open-source
50 articles tagged with open-source
Trail of Bits and OpenAI's Daybreak initiative produce 64 pull requests across 19 open-source projects in one week using
Trail of Bits launched Patch the Planet, a security initiative using OpenAI's GPT-5.5-Cyber model to find and fix bugs in critical open-source projects. The first week produced 64 pull requests and 51 issues across 19 projects including cURL, Python, PyPI, and Sigstore, with 37 patches already merged.
Baidu Releases Unlimited-OCR, a 3B Parameter Document Parsing Model Based on Deepseek-OCR
Baidu has released Unlimited-OCR, a 3 billion parameter model for optical character recognition and document parsing. The model supports single-page and multi-page document processing with a 32,768 token context window and runs on NVIDIA GPUs using bfloat16 precision.
Poolside releases Laguna M.1: 225B parameter MoE model scores 74.6% on SWE-bench Verified
Poolside has released Laguna M.1, a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token, designed for agentic coding tasks. The model scores 74.6% on SWE-bench Verified and 63.1% on SWE-bench Multilingual, and is released under the Apache 2.0 license.
Mistral releases Leanstral, open-source model with 6B active parameters for the Lean 4 proof assistant, under Apache 2.0
Mistral AI has released Leanstral, a sparse 120B model with 6B active parameters designed specifically for the Lean 4 proof assistant. The model is available under Apache 2.0 license with free API access and achieves a 26.3 FLTEval score at pass@2, outperforming Claude Sonnet 4.6 while costing $36 versus $549.
Mistral Releases Mistral 3 Family: 675B-Parameter Large 3 MoE and Three Edge Models Under Apache 2.0
Mistral has released Mistral 3, including Mistral Large 3—a sparse mixture-of-experts model with 41B active and 675B total parameters—and three Ministral 3 edge models (3B, 8B, 14B). All models are released under Apache 2.0 license with multimodal capabilities and are available today on multiple platforms.
Zhipu AI releases GLM-5.2 with 1M token context and 62.1% SWE-bench Pro score
Zhipu AI released GLM-5.2, a 753 billion parameter model with a 1 million token context window. The model scores 62.1% on SWE-bench Pro and introduces the IndexShare architecture, which reduces per-token FLOPs by 2.9× at 1M context length. It is released under the MIT license with no regional restrictions.
Z.AI releases GLM-5.2 with 1M token context, outperforms GPT-5.5 on long-horizon coding benchmarks
Z.AI has released GLM-5.2, an open-source model with a 1M-token context window under an MIT license. On FrontierSWE, a long-horizon coding benchmark, GLM-5.2 trails Claude Opus 4.8 by 1% while outperforming GPT-5.5 by 1%, and achieves 81.0 on Terminal-Bench 2.1 compared to Opus 4.8's 85.0.
GLM-5.2 Released with 1M Token Context and 753B Parameters Under MIT License
Zhipu AI has released GLM-5.2, a 753 billion parameter model featuring a 1 million token context window and MIT open-source license. The model scores 62.1% on SWE-bench Pro and 91.2% on GPQA-Diamond, with flexible reasoning effort levels for coding tasks.
Microsoft Releases FastContext-1.0: 4B-Parameter Repository Explorer Cuts Coding Agent Token Use by 60%
Microsoft released FastContext-1.0, a lightweight repository-exploration subagent for LLM coding agents, available in sizes from 4B to 30B parameters. When integrated with agents like GPT-5.4 and GLM-5.1, it reduced main-agent token consumption by up to 60% while improving end-to-end resolution rates by up to 5.5% on SWE-bench Pro.
Google DeepMind releases Gemma 4 12B: encoder-free multimodal model runs on 16GB RAM
Google DeepMind has released Gemma 4 12B, a 12-billion parameter multimodal model that runs locally on laptops with 16GB of RAM. The model eliminates separate vision and audio encoders, processing raw inputs directly through its language model backbone, and is released under an Apache 2.0 license.
AWS releases open-source test harness for evaluating Amazon Nova Sonic voice agents at scale
Amazon has released an open-source testing framework for Nova Sonic voice agents that automates multi-turn conversation evaluation without requiring human testers. The harness uses LLM-as-judge techniques to assess voice agents across six metrics including goal achievement, response accuracy, and tool usage, addressing a critical QA bottleneck in voice AI development.
Google DeepMind releases Gemma 4 12B Unified: encoder-free multimodal model with 256K context window
Google DeepMind has released Gemma 4 12B Unified, an encoder-free multimodal model that processes text, images, and audio through a single decoder-only transformer. The model features 11.95 billion parameters, a 256K token context window, and achieves 77.2% on MMLU Pro and 72.0% on LiveCodeBench v6.
ByteDance Open-Sources Bernini-R Video Diffusion Model With Semantic Planning Architecture
ByteDance released Bernini-R, an open-source video generation and editing model that combines an MLLM-based semantic planner with a DiT-based renderer. The model requires Hopper-class GPUs (H100/H800/H200) for optimal performance and supports multiple tasks including text-to-video, video editing, and reference-guided generation.
JetBrains Releases Mellum2-12B Reasoning Model with 131K Context and Mixture-of-Experts Architecture
JetBrains has released Mellum2-12B-A2.5B-Thinking, a reasoning-augmented assistant model with a 131,072-token context window and a 64-expert Mixture-of-Experts architecture that activates 8 experts per token. The model emits explicit chain-of-thought reasoning inside <think> blocks before providing final answers.
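A consumer of this kind of output usually has to separate the reasoning from the final answer. The following is a minimal sketch of that step, not JetBrains' own tooling: it assumes the blocks are closed with a matching `</think>` tag, which the summary above does not state, and `split_reasoning` is a hypothetical helper name.

```python
import re

# Assumption: reasoning is wrapped as <think> ... </think>; the closing
# tag is inferred, not confirmed by the release summary.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw model completion."""
    # Collect the contents of every think block, in order.
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    # Whatever remains once the blocks are removed is the final answer.
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


sample = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Text with no think block passes through unchanged as the answer, with an empty reasoning string.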
NVIDIA Releases Cosmos 3: 8B and 32B Omni-Models Combining Video Generation, Reasoning, and Action in Single Architecture
NVIDIA has released Cosmos 3, a unified omni-model that combines world generation, physical reasoning, and action generation in a single architecture. Available in 8B (Nano) and 32B (Super) parameter versions on Hugging Face, Cosmos 3 uses a Mixture-of-Transformers architecture to process text, image, video, audio, and action modalities without switching between separate models.
Mistral AI Releases Small 4: 119B Parameter Open-Source Model with 256K Context Under Apache 2.0
Mistral AI has released Mistral Small 4, a 119B total parameter mixture-of-experts model with 256K context window and native multimodal capabilities. The model uses 128 experts with 4 active per token (6B active parameters) and is released under the Apache 2.0 license, marking Mistral's first unified model combining reasoning, multimodal, and coding capabilities.
Mistral releases Leanstral, open-source model with 6B active parameters for Lean 4 formal proof verification
Mistral AI released Leanstral, the first open-source code agent designed specifically for Lean 4 formal proof verification. The model uses 6B active parameters in a sparse 120B architecture and is available under Apache 2.0 license with free API access.
Mistral Releases Mistral Large 3 with 675B Parameters and Three Ministral 3 Models Under Apache 2.0
Mistral AI has released Mistral 3, consisting of Mistral Large 3—a sparse mixture-of-experts model with 675B total parameters and 41B active parameters—and three Ministral 3 models at 3B, 8B, and 14B parameters. All models are released under the Apache 2.0 license with multimodal capabilities including image understanding.
Mistral AI Releases Voxtral: Apache 2.0 Speech Models with 32K Token Context at $0.001/Minute
Mistral AI released Voxtral, a family of open-source speech understanding models available in 24B and 3B parameter variants under Apache 2.0 license. The models support up to 32K token context (30 minutes of audio for transcription, 40 minutes for understanding) and are priced at $0.001 per minute via API—less than half the cost of comparable proprietary systems according to Mistral.
Mistral releases Devstral Medium and Small 1.1 with 61.6% SWE-Bench Verified score
Mistral AI has released two specialized coding models: Devstral Medium, which achieves 61.6% on SWE-Bench Verified, and Devstral Small 1.1, which scores 53.6% and is released under the Apache 2.0 license. The company claims Devstral Medium surpasses Gemini 2.5 Pro and GPT-4.1 at a quarter of the price.
Mistral AI Releases Magistral Reasoning Models: 24B Open-Source and Enterprise Versions Score 70.7% and 73.6% on AIME2024
Mistral AI has released Magistral, its first reasoning model line, in two versions: Magistral Small (24B parameters, Apache 2.0) and Magistral Medium (enterprise). Magistral Medium scored 73.6% on AIME2024 (90% with majority voting at 64 samples), while the open-source Small version achieved 70.7% (83.3% with voting).
Tencent Releases Hy-MT2 Translation Models: 1.8B, 7B, and 30B-A3B Support 33 Languages
Tencent released Hy-MT2, a family of multilingual translation models available in 1.8B, 7B, and 30B-A3B (MoE) sizes. All models support translation among 33 languages and follow translation instructions in multiple languages. The 1.8B model can be compressed to 440MB using 1.25-bit AngelSlim quantization.
NVIDIA Releases Nemotron-Labs Diffusion Models With 6.4× Faster Token Generation Than Autoregressive Decoding
NVIDIA has released Nemotron-Labs Diffusion, a family of diffusion language models at 3B, 8B, and 14B scales that generate multiple tokens in parallel rather than one at a time. In self-speculation mode, the 8B model produces 6.4× more tokens per forward pass than autoregressive models while maintaining comparable accuracy.
Tencent Releases Hy-MT2: 1.8B Translation Model Compressed to 440MB With 1.25-Bit Quantization
Tencent has open-sourced Hy-MT2, a family of multilingual translation models available in 1.8B, 7B, and 30B-A3B parameter sizes. The models support translation across 33 languages and include extreme quantization down to 1.25-bit, reducing the 1.8B model to 440MB storage while increasing inference speed by 1.5x.
Cohere Releases Command A+ Open Source Model with 25B Active Parameters, 128K Context
Cohere has released Command A+ as an open source model under Apache 2.0 license. The sparse mixture-of-experts architecture features 25 billion active parameters out of 218B total parameters, supports 128K input context length, and includes vision capabilities alongside tool use and reasoning features.
Cohere Releases Command A+: 218B-Parameter MoE Model With 4-Bit Quantization Runs on Single B200 GPU
Cohere has released Command A+, an open-source sparse mixture-of-experts model with 218 billion total parameters and 25 billion active parameters. The model features W4A4 quantization allowing deployment on a single NVIDIA B200 GPU, supports 128K input context, and includes built-in chain-of-thought reasoning with vision capabilities.
IBM Research launches Open Agent Leaderboard, showing same models achieve different results based on agent architecture
IBM Research has launched the Open Agent Leaderboard, the first open benchmark that evaluates complete AI agent systems rather than just underlying models. The leaderboard reveals that agents using identical models can achieve significantly different success rates and costs depending on system architecture, with failed runs costing 20-54% more than successful ones.
IBM Releases 97M-Parameter Granite Embedding Model With 60.3 MTEB Score, Highest Retrieval Quality Under 100M Parameters
IBM released two new multilingual embedding models under Apache 2.0: a 97M-parameter compact model scoring 60.3 on MTEB Multilingual Retrieval (highest in its size class) and a 311M full-size model scoring 65.2. Both support 200+ languages with enhanced retrieval for 52 languages, handle 32K-token context (64x increase over predecessors), and include code retrieval across 9 programming languages.
IBM Releases Granite Embedding 311M R2 With 32K Context, 200+ Language Support
IBM released Granite Embedding 311M Multilingual R2, a 311-million parameter dense embedding model with 32,768-token context length and support for 200+ languages. The model scores 64.0 on Multilingual MTEB Retrieval (18 tasks), an 11.8-point improvement over its predecessor, and ships with ONNX and OpenVINO models for production deployment.
IBM releases Apache 2.0 Granite 4.1 LLMs in 3B, 8B, and 30B sizes
IBM has released the Granite 4.1 family of language models under Apache 2.0 license. The models come in 3B, 8B, and 30B parameter sizes. Unsloth has released 21 GGUF quantized variants of the 3B model ranging from 1.2GB to 6.34GB.
IBM Releases Granite 4.1 30B With 131K Context Window and Enhanced Tool-Calling
IBM released Granite 4.1 30B, a 30-billion parameter instruction-following model with a 131,072 token context window. The model scores 80.16 on MMLU 5-shot and 88.41 on HumanEval pass@1, with enhanced tool-calling capabilities following OpenAI's function definition schema.
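For readers unfamiliar with that schema, the sketch below shows a tool declared in the OpenAI function-definition shape: a `function` object with a name, a description, and JSON Schema `parameters`. The `get_weather` tool and its fields are hypothetical, and how the definition is handed to Granite (chat template, serving API) is not covered by the summary above.

```python
import json

# Hypothetical tool, written in the OpenAI function-definition shape.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        # "parameters" is an ordinary JSON Schema object.
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Tool definitions are normally passed to the model as a list.
tools = [get_weather_tool]
print(json.dumps(tools, indent=2))
```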
IBM Releases Granite 4.1 8B with 131K Context Window at $0.05/M Input Tokens
IBM has released Granite 4.1 8B, an 8-billion-parameter decoder-only language model with a 131,072-token context window. The model supports 12 languages and costs $0.05 per million input tokens and $0.10 per million output tokens. It is available under the Apache 2.0 license.
IBM releases Granite 4.1-8B with 131K context window and enhanced tool-calling capabilities
IBM has released Granite 4.1-8B, an 8-billion parameter long-context model with a 131,072-token context window. The model achieves 85.37% on HumanEval and 73.84% on MMLU 5-shot, with enhanced tool-calling capabilities reaching 68.27% on BFCL v3. Released under Apache 2.0 license, it supports 12 languages.
Poolside releases Laguna XS.2: 33B parameter MoE coding model with 131K context window
Poolside has released Laguna XS.2, a 33B total parameter Mixture-of-Experts model with 3B activated parameters per token, designed for agentic coding. The model features a 131,072-token context window, scores 68.2% on SWE-bench Verified, and is available under Apache 2.0 license with free API access.
Xiaomi releases MiMo-V2.5: 310B parameter omnimodal model with 1M token context window
Xiaomi released MiMo-V2.5, a 310B total parameter sparse mixture-of-experts model that activates 15B parameters per token. The omnimodal model supports text, image, video, and audio understanding with a 1M token context window and was trained on 48T tokens using FP8 mixed precision.
Xiaomi Releases MiMo-V2.5-Pro: 1.02T Parameter MoE Model with 1M Context Window
Xiaomi has released MiMo-V2.5-Pro, an open-source Mixture-of-Experts model with 1.02 trillion total parameters and 42 billion active parameters. The model supports a context length of up to 1 million tokens, and Xiaomi reports scores of 99.6% on GSM8K and 86.2% on MATH.
Alibaba's Qwen Team Releases Qwen3.6 27B With 262K Context Window and Video Processing
Alibaba's Qwen Team has released Qwen3.6 27B, a 27-billion parameter multimodal language model with a 262,144-token context window. The model accepts text, image, and video inputs and includes a built-in thinking mode for extended reasoning, with pricing at $0.195 per million input tokens and $1.56 per million output tokens.
DeepSeek releases V4 preview, claims parity with GPT-4o and Claude 3.5 Sonnet
DeepSeek released a preview of its V4 model on April 24, 2026, claiming the open-source system matches leading closed-source models from Anthropic, Google, and OpenAI. The company emphasized improved coding capabilities and compatibility with domestic Huawei chips, but did not disclose training costs or hardware specifications.
DeepSeek releases V4 model preview with agent optimization, pricing undisclosed
DeepSeek released a preview of its V4 large language model on April 24, 2026, available in 'pro' and 'flash' versions. The Hangzhou-based company claims the open-source model achieves strong performance on agent-based tasks and has been optimized for tools like Anthropic's Claude Code and OpenClaw.
DeepSeek Releases V4-Pro-Base with 1.6 Trillion Parameters
DeepSeek has released DeepSeek-V4-Pro-Base, a 1.6 trillion parameter foundation model now available on Hugging Face. The base model uses BF16 precision for weights and includes support for F8_E4M3, I64, and F32 tensor types.
DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost
DeepSeek released two Mixture-of-Experts models: V4-Flash with 284B total parameters (13B activated) and V4-Pro with 1.6T parameters (49B activated). Both models support one million token context windows and use a hybrid attention architecture that requires only 27% of the inference FLOPs of DeepSeek-V3.2 at 1M token context.
DeepSeek Releases V4-Pro: 1.6T Parameter MoE Model with 1M Token Context
DeepSeek released two new Mixture-of-Experts models: DeepSeek-V4-Pro with 1.6 trillion parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting one million token context length. At 1M context, the models need only 27% of the inference FLOPs and 10% of the KV cache of DeepSeek-V3.2, through a hybrid attention architecture combining Compressed Sparse Attention and Heavily Compressed Attention.
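The total and activated parameter counts quoted above imply how sparse each model is per token. The arithmetic can be sketched as follows; `active_fraction` is a hypothetical helper, and the only inputs are the figures reported in the summary (in billions of parameters).

```python
def active_fraction(active_b: float, total_b: float) -> float:
    """Share of a MoE model's parameters that are activated per token."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


# (activated, total) in billions, as reported in the release summary.
models = {
    "DeepSeek-V4-Pro": (49.0, 1600.0),
    "DeepSeek-V4-Flash": (13.0, 284.0),
}

for name, (active_b, total_b) in models.items():
    share = active_fraction(active_b, total_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

This ratio describes parameter sparsity only; it is separate from the 27% FLOPs and 10% KV cache figures, which the summary attributes to the attention design.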
Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters
Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts model with 21 billion active parameters and a 256K context window. The model scores 76.28% on MATH and 34.86% on LiveCodeBench-v6, with particularly strong performance on coding agent tasks.
Open-weight models closing gap with frontier AI, but struggle looms in specialized domains
Open-weight AI models are narrowing the performance gap with closed frontier models in current benchmarks focused on coding and terminal tasks, but industry analysts predict they'll struggle to keep pace as the field shifts toward specialized knowledge work in accounting, law, and healthcare. The gap reduction masks a more complex dynamic where benchmark correlation with real-world performance is weakening.
NVIDIA Releases GR00T N1.7, 3B-Parameter Open-Source Humanoid Robot Model Trained on 20,854 Hours of Human Video
NVIDIA released GR00T N1.7, a 3-billion parameter open-source Vision-Language-Action model for humanoid robots with commercial licensing. The model was trained on 20,854 hours of human egocentric video data and demonstrates the first documented scaling law for robot dexterity, where increasing human video data from 1,000 to 20,000 hours more than doubles task completion rates.
Tencent Releases HY-World 2.0: Open-Source Multi-Modal Model Generates 3D Worlds from Text and Images
Tencent has released HY-World 2.0, an open-source multi-modal world model that generates navigable 3D environments from text prompts, single images, multi-view images, or video. The model produces editable 3D assets, including meshes and 3D Gaussian splats, that can be directly imported into game engines like Unity and Unreal Engine.
Baidu releases ERNIE-Image, an 8B parameter text-to-image model with strong text rendering capabilities
Baidu has released ERNIE-Image, an 8B parameter text-to-image generation model built on a single-stream Diffusion Transformer architecture. The model is designed for complex instruction following, text rendering, and structured image generation, and can run on consumer GPUs with 24GB VRAM.
UK AI Safety Institute confirms Claude Mythos finds more exploits as token spend increases
The UK's AI Safety Institute published an independent evaluation confirming Anthropic's Claude Mythos is highly effective at finding security vulnerabilities. The evaluation found a linear relationship between tokens spent and exploits discovered, which turns security into an economic arms race.
MiniMax releases M2.7, a 229B parameter model with self-evolving capabilities and agent teams
MiniMax has released MiniMax-M2.7, a 229-billion parameter model that participated in its own evolution during development. The model achieves a 66.6% medal rate on MLE Bench Lite and 56.22% on SWE-Pro, with native support for multi-agent collaboration and complex tool orchestration.
Google releases Gemma 4, open-source on-device AI with agentic tool use for phones
Google released Gemma 4, an open-source multimodal model that runs entirely on smartphones without sending data to the cloud. The E2B and E4B variants require just 6GB and 8GB of RAM respectively and can autonomously use tools like Wikipedia, maps, and QR code generators through built-in agent skills. The model is available free via the Google AI Edge Gallery app for Android and iOS.