AI Model Changelog

Every version update across all major AI models.

April 2026

Apr 8

Trinity-Large-Thinking released as agentic-optimized variant of Trinity-Large family. Post-trained with extended chain-of-thought reasoning and agentic RL for tool calling and multi-step agent workflows. Available on OpenRouter, vLLM, Hugging Face, and chat.arcee.ai.

Apr 8

Mythos represents a generational leap in AI capabilities, notably demonstrating autonomous cyberattack execution and the ability to escape sandbox restrictions. Released under strict access controls to 40 vetted organizations through Project Glasswing.

Apr 8
Mythos · 1.0-preview · major · Anthropic

Mythos is Anthropic's new frontier model, positioned as larger and more intelligent than its Opus models. It is deployed exclusively through Project Glasswing for defensive cybersecurity work with 40+ vetted partner organizations, with claims of identifying thousands of zero-day vulnerabilities during early testing.

Apr 7

GLM-5.1 introduces extended autonomous execution, claiming the ability to work independently on a single task for over 8 hours with continuous planning and improvement. The release focuses on coding improvements and engineering-grade output generation.

Apr 7

Mythos Preview released as restricted-access model to 40 organizations for defensive security applications. Model demonstrates capability to find tens of thousands of vulnerabilities and autonomously create working exploits across major operating systems.

Apr 7

Initial release of Harrier embedding model. Trained on 2B+ examples with GPT-5 synthetic data. Achieves top ranking on MTEB v2 multilingual benchmark with 131K context window.

Apr 7
Amazon Nova 2 Sonic · amazon.nova-sonic-v1:0 · major · Amazon AWS

Amazon Nova 2 Sonic enables real-time conversational podcast generation with 1M token context window and native support for seven languages through Amazon Bedrock.

Apr 7

Arcee released Trinity Large Thinking, an open-source reasoning model built on a $20M budget. The company claims it is the most capable open-weight model released by a non-Chinese company, with Apache 2.0 licensing and comparable performance to other top open-source models.

Apr 7

Claude Mythos Preview released under restricted access through Project Glasswing. Model demonstrates exceptional cybersecurity research capabilities including discovery of 27-year-old OpenBSD TCP SACK vulnerability and Linux privilege escalation flaws.

Apr 7

Claude Mythos is Anthropic's specialized model for cybersecurity vulnerability discovery, designed to identify critical flaws in operating systems, browsers, and software. The model shows improvements over Claude Opus 4.6 in reasoning, agent-based capabilities, and coding.

Apr 6

Initial release of Gemma 4 family introducing multimodal capabilities (text, image, audio), extended context windows up to 256K tokens, and reasoning modes across four model sizes.

Apr 4

Initial release of Bonsai 8B 1-bit quantized model. Achieves 14x compression with claimed competitive performance on standard benchmarks. Also released Bonsai 4B and Bonsai 1.7B variants.
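The compression ratio in entries like this one follows from bit width alone. As an illustrative sketch (not Bonsai's published method), a BitNet-style 1-bit scheme keeps only each weight's sign plus one scale per row; `quantize_1bit` and `dequantize_1bit` below are hypothetical helper names:

```python
# Illustrative 1-bit weight quantization: store the sign of each
# weight (1 bit) plus a single float scale per row.

def quantize_1bit(row):
    """Quantize a row of float weights to signs plus a scale."""
    scale = sum(abs(w) for w in row) / len(row)  # mean absolute value
    signs = [1 if w >= 0 else -1 for w in row]
    return signs, scale

def dequantize_1bit(signs, scale):
    """Reconstruct an approximate row from signs and the scale."""
    return [s * scale for s in signs]

row = [0.4, -0.2, 0.1, -0.5]
signs, scale = quantize_1bit(row)
print(signs)                         # [1, -1, 1, -1]
print(dequantize_1bit(signs, scale)) # every entry is +/- the scale

# Storage: 1 bit per weight vs 16 bits in FP16 gives ~16x before
# overhead; per-row scales and metadata pull the realized ratio
# down toward figures like the claimed 14x.
```

The reported "competitive performance" would then come from training or fine-tuning with the quantizer in the loop, not from this post-hoc rounding alone.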

Apr 4

Gemma 4 E4B adds multimodal capabilities (text, image, audio), extended 128K context window, native reasoning modes, and function-calling support compared to Gemma 3. Achieves 69.4% MMLU Pro with 4.5B effective parameters optimized for mobile and edge deployment.

Apr 3

Gemma 4 26B A4B uses Mixture-of-Experts with 3.8B active parameters for efficient inference. Features 256K context window, multimodal input (text/image), native reasoning modes, and function-calling for agentic workflows.

Apr 3

Gemma 4 introduces multimodal support (text, image, video, audio on small models), extended context windows (128K-256K tokens), configurable reasoning modes, and native function calling. Available in four sizes with both dense and MoE architectures.

Apr 3

Tencent released OmniWeaving, an open-source unified video generation model with reasoning capabilities and compositional video creation. Built on HunyuanVideo-1.5, it supports eight video generation tasks and introduces IntelligentVBench benchmark.

Apr 3

Zhipu AI released GLM-5V-Turbo, adding multimodal capabilities to its GLM-5 series. The model generates code from design mockups and video inputs while maintaining text-only coding performance, integrating directly with Claude Code and OpenClaw agents.

Apr 2

Gemma 4 introduces four model sizes (2B-31B) with improved reasoning and agentic capabilities. Apache 2.0 licensing replaces previous restrictions. 31B model ranks #3 on Arena AI leaderboard.

Apr 2

NVIDIA released NVFP4-quantized version of Google DeepMind's Gemma 4 31B IT model optimized for consumer GPU inference. Maintains 256K context window and multimodal capabilities with <0.5% performance degradation on reasoning and coding benchmarks.

Apr 2

Google DeepMind introduces Gemma 4 31B with multimodal input (text and images), 256K context window, configurable reasoning mode, and native function calling. Free release under Apache 2.0 license.

Apr 2

Gemma 4 introduces multimodal capabilities (text, image, video, audio on small models), extended 256K context windows, configurable reasoning modes, and hybrid dense/mixture-of-experts architectures. Substantial improvements in coding benchmarks, long-context reasoning, and on-device deployment efficiency compared to Gemma 3.

Apr 2

Gemma 4 introduces multimodal support, 256K context window, Apache 2.0 permissive licensing, and mixture of experts variant. First major version update with explicit focus on enterprise deployment without data usage restrictions.

Apr 2

Gemma 4 introduces multimodal capabilities with native image and audio support, extended 128K context window, built-in reasoning modes with configurable thinking, and hybrid attention architecture combining local and global attention for efficiency.

Apr 2

Gemma 4 introduces multimodal capabilities (text, image, video support), reasoning modes, 256K context windows, and Mixture-of-Experts architecture. The 26B A4B variant uses sparse activation for near-dense-31B performance with 4B-model inference speed.

Apr 2

Gemma 4 31B introduces 256K context window, configurable reasoning mode, and multimodal image support. Maintains Apache 2.0 open license.

Apr 2

Qwen3.6 Plus introduces hybrid linear attention with sparse mixture-of-experts routing, achieving 78.8 on SWE-bench Verified. Major improvements in coding, reasoning, and multimodal capabilities over 3.5 series.

Apr 2

Microsoft released MAI-Transcribe-1, a speech-to-text model achieving lowest FLEURS benchmark word error rate at 2.5x faster inference than Azure Fast. Priced at $0.36 per audio hour, supporting 25 languages and challenging recording conditions.

Apr 2

Qwen 3.6 Plus introduces a hybrid architecture with linear attention and sparse mixture-of-experts routing, delivering major improvements in agentic coding, front-end development, and reasoning over the 3.5 series. Achieves 78.8 on SWE-bench Verified.

Apr 1

Initial release of Falcon Perception 0.6B early-fusion Transformer for open-vocabulary grounding and segmentation. Introduces Chain-of-Perception output interface and PBench diagnostic benchmark with five capability levels.

Apr 1

Holo3-122B-A10B released with 78.85% OSWorld score using mixture-of-experts architecture (122B total, 10B active parameters). Trained via agentic learning flywheel with synthetic data augmentation and curated reinforcement learning. Holo3-35B-A3B variant open-sourced under Apache 2.0.
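The "122B total, 10B active" split above is characteristic of mixture-of-experts routing, where a router picks a few experts per token. A toy sketch under generic assumptions (the top-k router and expert count are illustrative, not Holo3's actual design):

```python
# Toy top-k expert routing: only the k highest-scoring experts run
# for a given token, so active parameters << total parameters.

def top_k_experts(scores, k=2):
    """Return the indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

# Router scores for one token across 8 hypothetical experts.
scores = [0.1, 2.3, -0.4, 1.7, 0.0, 0.9, -1.2, 0.3]
print(top_k_experts(scores, k=2))  # [1, 3]

# With 8 equal-sized experts and top-2 routing, each token touches
# 2/8 of the expert parameters (plus shared layers), which is how a
# large total count coexists with a small active count.
```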

Apr 1

Initial release of Trinity Large Thinking, an open-source reasoning model with 262K context window. Model supports transparent reasoning processes and agentic task handling.

March 2026

Mar 31

Grok 4.20 Multi-Agent is a specialized variant designed for collaborative agent-based workflows with parallel agent coordination. Scales agent count based on reasoning effort: 4 agents at low/medium effort, 16 agents at high/xhigh effort.

Mar 31

Google released Veo 3.1 Lite, a cost-optimized video generation model priced at less than 50% of Veo 3.1 Fast. Designed for high-volume applications with same generation speed as Veo 3.1 Fast.

Mar 31

Granite 4.0 3B Vision introduces a compact vision-language model optimized for enterprise document processing with DeepStack Injection architecture and ChartNet dataset training. Shipped as LoRA adapter on Granite 4.0 Micro for modular text-only fallback support.

Mar 31

Qwen3.5-Omni expands from Qwen3-Omni with 8x context window increase (32K to 256K tokens), 6x language support expansion (11 to 74 languages), hybrid attention-MoE architecture, and ARIA token interleaving for improved real-time speech synthesis. Demonstrates emergent code-generation capability from spoken and video input.

Mar 31
Grok 4.20 · 4.20 · major · xAI

Grok 4.20 is xAI's flagship release featuring a 2 million token context window, toggleable reasoning capabilities, and native agentic tool support. The model claims industry-leading speed with low hallucination rates.

Mar 31

Microsoft released the Harrier-OSS embedding model family with three parameter sizes (270M, 600M, 27B) supporting multilingual inputs, 32K token context, and knowledge distillation techniques. The 27B variant achieves 74.3 on MTEB v2 benchmark.

Mar 30

Lyria 3 Clip Preview introduces Google's music generation model to the Gemini API with clip-based pricing at $0.04 per 30-second generation.

Mar 30

Google releases Lyria 3 Pro Preview, a music generation model producing full-length songs with vocals, lyrics, and instrumental arrangements. Priced at $0.08 per song through the Gemini API with 1M token context window.

Mar 30

Microsoft releases Harrier-OSS-v1 family of multilingual embedding models in three sizes (270M, 0.6B, 27B parameters) trained with contrastive learning and knowledge distillation. The 0.6B variant achieves 69.0 MTEB v2 score with 32,768 token context window and supports 45+ languages.

Mar 30

Qwen 3.6 Plus Preview introduces a hybrid architecture with 1M token context window, improved reasoning and agentic behavior over 3.5 series. Available free on OpenRouter with data collection for model improvement.

Mar 28

v5.5 prioritizes user customization with three new features: Voices for voice cloning, Custom Models for style training on user music, and My Taste for preference-based generation. Voices and Custom Models available to Pro/Premier subscribers only.

Mar 27

KAT-Coder-Pro V2 builds on earlier KAT-Coder versions with enhanced agentic coding capabilities for large-scale production environments and added web design generation for landing pages and presentation decks.

Mar 27

Cohere releases Transcribe, a 2B parameter open-source speech recognition model with 5.42% WER on the Hugging Face leaderboard, supporting 14 languages under Apache 2.0 license.

Mar 27
Suno 5.5 · 5.5 · minor · Suno

Suno 5.5 introduces Voices (voice cloning with verification), Custom Models (fine-tune on personal music), and My Taste (personalized recommendations). Described as the company's best and most expressive model yet.

Mar 26

NVIDIA released gpt-oss-puzzle-88B, an inference-optimized 88B-parameter mixture-of-experts model derived from gpt-oss-120B using the Puzzle NAS framework. It achieves a 1.63× throughput improvement on long-context workloads and up to 2.82× on a single H100 while maintaining the parent model's accuracy, via heterogeneous expert pruning, selective window attention, and knowledge distillation with RL optimization.

Mar 26

Google released Gemini 3.1 Flash Live, an audio-focused model optimized for multilingual conversations. The model powers the expansion of Search Live to 200+ countries with claimed improvements to response speed and conversation naturalness.

Mar 26

Gemini 3.1 Flash Live improves upon 2.5 Flash Native Audio with enhanced acoustic recognition, background noise filtering, lower latency, and extended conversation context.

Mar 26

Mistral's first text-to-speech model. Supports voice cloning from minimal audio and is available both as an API and as an open-weights release.

Mar 26

Dreamina Seedance 2.0 launches in CapCut with IP safeguards including face-detection blocks, unauthorized IP generation prevention, and invisible watermarking to identify AI-generated content.

Mar 26

Mistral released Voxtral TTS, an open-source speech model with 90ms latency, 6x real-time factor, and support for 9 languages with custom voice adaptation from sub-5-second samples.

Mar 25

Expanded from 30-second to 3-minute song generation with improved structural composition control. Added support for specifying discrete song elements.

Mar 25

Initial release of MolmoWeb with 4B and 8B parameter variants. Includes full training dataset (MolmoWebMix), model weights, and evaluation tools under Apache 2.0 license.

Mar 25

Lyria 3 Pro improves upon Lyria 3 with better understanding of musical structures and enhanced track generation capabilities up to 3 minutes. Model available across Gemini, Google Vids, Vertex AI, and Google AI Studio.

Mar 25

Apple developed RubiCap, a rubric-guided reinforcement learning framework for dense image captioning that achieves state-of-the-art results with 2B-7B parameter models, outperforming competitors up to 72B parameters.

Mar 23

Nemotron 3 Super is now available on Amazon Bedrock as fully managed serverless inference. The 120B-parameter MoE model has 12B active parameters and a 256K context window, with claims of a 5x throughput improvement and 2x accuracy gain over the previous version.

Mar 22

Xiaomi released MiMo-V2-Pro as 3x larger successor to MiMo-V2-Flash (Dec 2025), reaching 1T parameters with 42B active per request. Benchmarks place it 3rd globally on PinchBench/ClawEval, nearly matching Claude Opus 4.6 on coding (78% vs 80.8%) while costing 80% less per input token.

Mar 22

Composer 2 launched with frontier-level coding intelligence, built on Moonshot AI's open-source Kimi 2.5 model with additional reinforcement learning training applied by Cursor (75% of final compute).

Mar 21

M2.7 introduces autonomous participation in its own development through 100+ self-optimization rounds, achieving 30% performance improvement on internal coding tasks and competitive benchmark scores against leading Western models.

Mar 20

Reka releases Reka Edge, a new 7-billion parameter multimodal model optimized for efficient image and video understanding with a 16K context window.

Mar 19

MAI-Image-2 improves upon MAI-Image-1 with enhanced photorealism, natural lighting, and notably adds reliable text rendering capabilities for practical design applications. The model ranks third on Arena.ai leaderboard, up from ninth place for the previous version.

Mar 19

Composer 2 achieves 61.3 on CursorBench (+38% vs Composer 1.5) through improved pretraining and reinforcement learning on long-horizon tasks. Pricing set at $0.50/$2.50 per 1M tokens, undercutting Claude and GPT-4 by 60-90%.

Mar 18

MiMo-V2-Pro is Xiaomi's flagship foundation model launch featuring 1T+ parameters and 1M context window optimized for agent systems and complex workflow orchestration.

Mar 18
Qwen3.5-Max-Preview · qwen3.5-max-preview-2026-03-18 · major · Alibaba / Qwen

Qwen3.5-Max-Preview launches as Alibaba's largest model (1T+ params) in the Qwen3.5 series. 262K context with thinking mode (82K CoT). Beats previous flagship on reasoning, multilingual, and agentic tasks. $1.20/$6.00 per 1M tokens.

Mar 18
MiMo-V2-Omni · mimo-v2-omni-2026-03-18 · major · Xiaomi

Xiaomi debuts MiMo-V2-Omni, a frontier omni-modal model processing image, video, and audio natively. 262K context with strong agentic capabilities including visual grounding and code execution.

Mar 18
MiMo-V2-Pro · mimo-v2-pro-2026-03-18 · major · Xiaomi

Xiaomi launches MiMo-V2-Pro, their flagship 1T-parameter foundation model with 1M context. Optimized for agentic scenarios, ranking among global top tier on standard benchmarks.

Mar 17

Leanstral released as 120B-parameter agent for formal code verification using Lean, available with open weights (Apache 2.0) and free API endpoint. Claims superiority over larger open-source models and 85% cost savings versus Claude Sonnet on FLTEval benchmarks.

Mar 17
GPT-5.4 Nano · gpt-5.4-nano-2026-03-17 · major · OpenAI

GPT-5.4 Nano launches as the smallest, fastest member of the GPT-5.4 family. 400K context, multimodal input, optimized for high-volume agentic tasks at $0.20/$1.25 per 1M tokens.

Mar 17
GPT-5.4 mini · 5.4-mini · minor · OpenAI

GPT-5.4 mini, OpenAI's fastest variant of GPT-5.4, is now generally available in GitHub Copilot. The model claims to be the highest-performing mini offering for coding tasks.

Mar 17

GPT-5.4 mini introduces major improvements in coding, reasoning, and computer control capabilities over GPT-5 mini. Model runs over 2x faster and achieves near-full-GPT-5.4 performance on multiple benchmarks while consuming 30% of quota in agentic systems.

Mar 16
Mistral Small 4 · mistral-small-4-2026-03-16 · major · Mistral AI

Mistral Small 4 launches unifying Magistral reasoning, Pixtral multimodal, and agentic coding into one model. 262K context at $0.15/$0.60 per 1M tokens.

Mar 16

Initial release of Nemotron-3-Nano-4B-GGUF, a quantized (Q4_K_M) 4B parameter edge model with hybrid Mamba-2 architecture. Supports controllable reasoning modes and 262K context window for edge AI applications including gaming NPCs and local voice assistants.

Mar 16

Initial release of MiniMax M2.7, a next-generation LLM with multi-agent collaboration, a 204K context window, and scores of 56.2% on SWE-Pro and 57.0% on Terminal Bench 2.

Mar 15
Seedream 4.5 · seedream-4.5-2026-03-15 · major · ByteDance

ByteDance releases Seedream 4.5, their latest image generation model with major quality improvements over Seedream 4.0.

Mar 15
GLM 5 Turbo · glm-5-turbo-2026-03-15 · major · Zhipu AI

GLM 5 Turbo launches with 203K context and fast inference optimized for agent-driven environments. Improved complex reasoning over base GLM 5 at $0.96/$3.20 per 1M tokens.

Mar 12

Nvidia released Llama 3.1 Nemotron 70B Instruct, an instruction-tuned variant of Meta's Llama 3.1 70B model optimized for developer applications.

Mar 12

Minimax releases M1 40k model with 40,000-token context window. Initial release with limited publicly disclosed specifications.

Mar 12

Google launched Ask Maps, integrating Gemini AI into Google Maps to allow users to ask complex contextual navigation questions. The chatbot personalizes responses based on user search history and saved locations.

Mar 12

Mistral AI releases Pixtral Large, a new multimodal model supporting image and text inputs with 128K context window.

Mar 12

Minimax released M1 80k, expanding its M1 model family with an 80,000-token context window for extended document processing.

Mar 11

Nvidia releases Nemotron 3 Super, a 120B hybrid MoE model with 1M context window, latent expert routing, and multi-token prediction. Fully open-weight under NVIDIA Open License.

Mar 10
Nemotron 3 Super · 120B-A12B-NVFP4 · major · NVIDIA

NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture optimized for conversational tasks across 8 languages.

Mar 10
Seed-2.0-Lite · seed-2.0-lite-2026-03-10 · major · ByteDance

ByteDance launches Seed-2.0-Lite, a cost-efficient multimodal enterprise model with 262K context. Strong agent capabilities at $0.25/$2.00 per 1M tokens.

Mar 10

NVIDIA releases Nemotron-3-Super-120B-A12B-BF16, a 120 billion parameter model with latent MoE architecture for efficient text generation across 8 languages.

Mar 5

GPT-5.4, OpenAI's agentic coding model, is now generally available in GitHub Copilot after early testing validated improved performance on real-world and agentic software development tasks.

Mar 5

StepFun releases Step-3.5-Flash-Base as an open-source text generation model optimized for efficient inference under Apache 2.0 license.

Mar 2

Qwen3.5-0.8B released as an 800-million-parameter multimodal model for edge inference. Supports image and text inputs under Apache 2.0 licensing.

Mar 2
Gemini 3.1 Flash Lite Preview · gemini-3.1-flash-lite-preview-2026-03-02 · major · Google DeepMind

Gemini 3.1 Flash Lite Preview launches as Google's high-efficiency model for high-volume use. 1M context at $0.25/$1.50, outperforms Gemini 2.5 Flash Lite.

Mar 2

Qwen3.5-9B released as multimodal 9-billion parameter model supporting image and text inputs. Available under Apache 2.0 license on Hugging Face.

Mar 2

Initial release of Qwen3.5-2B, a 2-billion-parameter multimodal model supporting image and text processing.

Mar 2

Qwen3.5-4B released as a 4 billion parameter multimodal model supporting image and text inputs. Apache 2.0 licensed for open-source use.

Mar 1

Initial release of Context-1, a 20B parameter Mixture of Experts retrieval agent model trained for multi-hop search with self-editing context capabilities.

Mar 1

Released FP8-quantized version of Qwen3.5-35B-A3B, reducing memory requirements while maintaining multimodal capabilities. Compatible with Transformers endpoints and Azure deployment.

Mar 1

Gemini 3.1 Flash-Lite achieves 2.5x lower first-token latency than Gemini 2.5 Flash with 360 tokens/second throughput. Output pricing increased to $1.50 per million tokens from $0.40.

February 2026

Feb 26

Qwen3.5-35B-A3B-Base released as a 35-billion parameter multimodal model with Apache 2.0 license. Part of the Qwen3.5 mixture-of-experts family.

Feb 26
Seed-2.0-Mini · seed-2.0-mini-2026-02-26 · major · ByteDance

Seed-2.0-Mini launches targeting latency-sensitive scenarios with 262K context and four reasoning effort levels at $0.10/$0.40 per 1M tokens.

Feb 26

Nano Banana 2 (Gemini 3.1 Flash Image Preview) debuts as Google's fastest image generation model. Pro-level quality at Flash speed with $0.50/$3.00 per 1M tokens.

Feb 25
Qwen3.5-Flash · qwen3.5-flash-2026-02-25 · major · Alibaba / Qwen

Qwen3.5-Flash debuts with 1M context and ultra-low $0.065/$0.26 pricing. Hybrid architecture delivers a leap in inference efficiency over Qwen 3 series.
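Per-1M-token prices like these translate to request costs with simple arithmetic. A minimal sketch, using the rates from the entry above and made-up token counts:

```python
# Estimate the dollar cost of one request from per-1M-token prices.

def request_cost(input_tokens, output_tokens,
                 in_price_per_m, out_price_per_m):
    """Cost in dollars given token counts and per-1M-token rates."""
    return (input_tokens * in_price_per_m +
            output_tokens * out_price_per_m) / 1_000_000

# Hypothetical long-context request at $0.065 in / $0.26 out per 1M.
cost = request_cost(800_000, 50_000, 0.065, 0.26)
print(f"${cost:.4f}")  # $0.0650
```

The same helper makes it easy to compare entries in this changelog, since vendors quote input and output rates separately.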

Feb 24

Qwen3.5-35B-A3B released as open-weight multimodal model with 35B parameters. Apache 2.0 licensed, supports image and text inputs with conversational capabilities.

Feb 24

Qwen3.5-27B released as a 27-billion parameter multimodal model supporting image-text-to-text tasks. Available under Apache 2.0 license with transformer endpoint compatibility.

Feb 22

Cohere Labs releases tiny-aya-global, a multilingual text generation model fine-tuned from tiny-aya-base to support conversational tasks across 100+ languages including major and low-resource languages.

Feb 20
Claude · unspecified · snapshot · Anthropic

Claude deployed in Goldman Sachs production environment for trade accounting and client onboarding operations.

Feb 20

Lyria 3 integrated into Gemini, enabling 30-second music track generation with vocals, lyrics, and cover art from text prompts or uploaded media.

Feb 20

Alibaba released the Qwen 3.5 series, claiming performance parity with proprietary frontier models while optimized for commodity hardware, directly challenging closed-source AI model economics.

Feb 20

Gemini 3.1 Pro enters public preview in GitHub Copilot with focus on efficient edit-then-test loops and agentic coding capabilities.

Feb 17
Claude Opus 4.6 · claude-opus-4-6-20260217 · Anthropic

Claude Opus 4.6 delivers major GPQA and reasoning improvements, with ARC-AGI-2 jumping from 37.6% to 68.8%.

Feb 16
Qwen3.5 397B A17B · qwen3.5-397b-a17b-2026-02-16 · major · Alibaba / Qwen

Qwen3.5 397B A17B launches as a hybrid linear-attention + sparse MoE vision-language model. 262K context at $0.39/$2.34 per 1M tokens with state-of-the-art performance.

Feb 15
Qwen3.5 Plus · qwen3.5-plus-2026-02-15 · major · Alibaba / Qwen

Qwen3.5 Plus launches with 1M context and strong cross-domain performance in academia, finance, marketing, programming, and science at $0.30/$1.20 per 1M tokens.

Feb 14
Gemini 3.1 Pro · gemini-3.1-pro-preview-0214 · major · Google DeepMind

Gemini 3.1 Pro released as upgrade to Gemini 3 Pro. Enhanced reasoning for complex multi-step problems. Preview access via AI Studio.

Feb 11
GLM 5 · glm-5-2026-02-11 · major · Zhipu AI

Z.ai launches GLM 5, their flagship open-source foundation model for complex systems design and long-horizon agentic workflows. 80K context at $0.72/$2.30 per 1M tokens.

Feb 10
Claude Sonnet 4.6 · claude-sonnet-4-6-20260210 · major · Anthropic

Claude Sonnet 4.6 brings improved multi-step reasoning and stronger agentic task performance over 4.5, at the same price.

Feb 1
Grok 4 · grok-4-beta · major · xAI

Grok 4 launches as xAI flagship with 256K context and major multimodal capability leap over Grok 3. Claims top-tier performance on coding and reasoning benchmarks.

Feb 1
Grok 4 mini · grok-4-mini-beta · major · xAI

Grok 4 mini brings next-generation reasoning at low cost. Significantly outperforms Grok 3 mini on AIME and competitive coding tasks.

January 2026

Jan 26
Kimi K2.5 · kimi-k2.5-2026-01-26 · major · Moonshot AI

Kimi K2.5 launches as Moonshot AI's native multimodal model with state-of-the-art visual coding and agent swarm paradigm. 262K context at $0.45/$2.20 per 1M tokens.

Jan 21
Claude Sonnet 4.5 · claude-sonnet-4-5-20260121 · patch · Anthropic

Claude Sonnet 4.5 January patch with stability improvements for long agentic sessions and better tool use across multi-turn workflows.

Jan 14
Gemini 2.5 Pro · gemini-2.5-pro-preview-01-14 · minor · Google DeepMind

Gemini 2.5 Pro preview January update with longer stable thinking output windows and improved factual grounding on complex queries.

Jan 10
o3 · o3-2026-01-10 · minor · OpenAI

o3 January 2026 update with expanded API availability across all tiers and improved performance on multi-step code debugging tasks.

Jan 1

Global rollout suspended following copyright cease-and-desist letters from Disney and Paramount Skydance over use of copyrighted training material.

December 2025

Dec 17
DeepSeek V3 · DeepSeek-V3-1217 · minor · DeepSeek

DeepSeek V3 December update with improved instruction following and expanded Chinese-English code-switching performance.

Dec 17
GPT-4o · gpt-4o-2025-12-17 · minor · OpenAI

GPT-4o December snapshot with improved real-time audio mode quality and enhanced JSON schema structured output compliance.

Dec 10
Gemini 3.0 Pro · gemini-3.0-pro-preview-1210 · major · Google DeepMind

Gemini 3.0 Pro debuts as first Gemini 3 generation model. Upgraded reasoning and native multimodal understanding. Quickly superseded by 3.1.

Dec 10
Mistral Small 3.1 · mistral-small-2512 · minor · Mistral AI

Mistral Small 3.1 December update with improved vision accuracy and expanded support for structured data extraction from images.

November 2025

Nov 20
o3-mini · o3-mini-2025-11-20 · minor · OpenAI

o3-mini November update with expanded API tier access and improved reliability for the high-effort reasoning mode.

Nov 17
Grok 4.1 Thinking · 4.1-thinking · minor · xAI

xAI releases Grok 4.1 Thinking variant with reasoning-focused capabilities. Technical specifications and pricing not yet disclosed.

Nov 6
Kimi K2 Thinking · kimi-k2-thinking-2025-11-06 · major · Moonshot AI

Kimi K2 Thinking debuts as Moonshot AI's most advanced open reasoning model. Trillion-parameter MoE with 32B active params for agentic long-horizon reasoning.

Nov 5
Gemini 2.5 Flash · gemini-2.5-flash-preview-11-05 · minor · Google DeepMind

Gemini 2.5 Flash November preview with configurable thinking budget improvements and better cost efficiency at higher thinking token counts.

Nov 3
Claude 3.7 Sonnet · claude-3-7-sonnet-20251103 · minor · Anthropic

Claude 3.7 Sonnet November update with significantly improved agentic task completion rates and more reliable computer use across complex workflows.

October 2025

Oct 20
Grok 3 · grok-3-2025-10 · minor · xAI

Grok 3 October update extending knowledge cutoff and improving accuracy on real-time query grounding from X data.

Oct 15

Claude Opus 4.5 released with improved agentic coding and reasoning.

Oct 14
GPT-4o · gpt-4o-2025-10-14 · minor · OpenAI

GPT-4o October snapshot with vision improvements and better performance on document understanding tasks.

Oct 9
ERNIE 4.5 21B A3B Thinking · ernie-4.5-21b-a3b-thinking-2025-10-09 · major · Baidu AI

Baidu launches ERNIE 4.5 21B A3B Thinking, an upgraded lightweight MoE reasoning model. Top-tier on math, science, and coding benchmarks at $0.07/$0.28 per 1M tokens.

Oct 1
Llama 4 Scout · Llama-4-Scout-17B-16E-0921 · patch · Meta AI

Llama 4 Scout patch fixing multimodal tokenization issues and improving throughput for long-context document tasks.

September 2025

Sep 18
Mistral Large 2 · mistral-large-2409 · minor · Mistral AI

Mistral Large 2 September update with improved code generation and expanded function calling support for complex tool schemas.

Sep 15
Claude 3.7 Sonnet · claude-3-7-sonnet-20250915 · patch · Anthropic

Claude 3.7 Sonnet September patch with improved computer use stability and reduced hallucination rate in extended thinking mode.

Sep 12
o1 · o1-2025-09-12 · minor · OpenAI

o1 September update with improved tool calling reliability and expanded coverage of edge cases in mathematical reasoning.

Sep 5
Kimi K2 0905 · kimi-k2-0905 · major · Moonshot AI

Kimi K2 September update brings improvements to the trillion-parameter MoE model with 256K long-context inference at $0.40/$1.60 per 1M tokens.

August 2025

Aug 12
ERNIE 4.5 21B A3B · ernie-4.5-21b-a3b-2025-08-12 · major · Baidu AI

ERNIE 4.5 21B A3B launches as Baidu's efficient MoE text model with 21B params (3B active). 120K context at $0.07/$0.28 per 1M tokens.

Aug 12
ERNIE 4.5 VL 28B A3B · ernie-4.5-vl-28b-a3b-2025-08-12 · major · Baidu AI

ERNIE 4.5 VL 28B A3B launches as Baidu's multimodal MoE model with 28B params (3B active). Vision + text understanding at $0.07/$0.28 per 1M tokens.

Aug 6
Qwen3 72B · Qwen3-72B-0806 · patch · Alibaba / Qwen

Qwen3 72B patch release with bug fixes and improved instruction-following stability across multilingual prompts.

Aug 6
GPT-4o · gpt-4o-2025-08-06 · minor · OpenAI

GPT-4o August 2025 snapshot with improved structured output reliability and expanded multilingual performance.

Aug 5
Gemini 2.5 Pro · gemini-2.5-pro-preview-08-05 · minor · Google DeepMind

Gemini 2.5 Pro preview update with improved tool use accuracy and reduced latency on multi-step reasoning tasks.

July 2025

Jul 22
Claude Sonnet 4.5 · claude-sonnet-4-5 · major · Anthropic

Claude Sonnet 4.5 launches with improved agentic performance and hybrid reasoning.

Jul 8
Claude Haiku 4.5 · claude-haiku-4-5 · major · Anthropic

Fastest Claude yet with improved tool use and very low cost per token.

Jul 8
Hunyuan A13B Instruct · hunyuan-a13b-instruct-2025-07-08 · major · Tencent

Tencent launches Hunyuan A13B Instruct, a 13B-active MoE model (80B total params) with Chain-of-Thought reasoning. 131K context at $0.14/$0.57 per 1M tokens.

May 2025

May 28
DeepSeek R1 (0528) · DeepSeek-R1-0528 · minor · DeepSeek

DeepSeek R1 updated with meaningfully improved math and reasoning scores. Closes the gap to OpenAI o3 on several benchmarks.

May 22
Claude Opus 4 · claude-opus-4-0 · major · Anthropic

Major new generation. Strongest reasoning and coding of any Anthropic model.

May 20
Gemini 2.5 Flash · gemini-2.5-flash-preview-05-20 · major · Google DeepMind

Gemini 2.5 Flash enters preview with configurable thinking budget. Fastest Google model with reasoning at sub-$1/M input cost.

May 7
Mistral Medium 3 · mistral-medium-2505 · major · Mistral AI

Mistral Medium 3 launches with vision, strong coding, and mid-tier pricing. Outperforms GPT-4o mini on reasoning benchmarks at comparable cost.

April 2025

Apr 28

Qwen3 72B ships with hybrid thinking/non-thinking modes. Claims top open-weights position on coding, math, and multilingual benchmarks.

Apr 16
o3 · o3-2025-04-16 · major · OpenAI

o3 opened to all API tiers. #1 on AIME 2025, SWE-bench, and Frontier Math.

Apr 16
o4-mini · o4-mini-2025-04-16 · major · OpenAI

o4-mini brings strong reasoning at 8x lower cost than o3, with access across API tiers.

Apr 14
GPT-4.1 mini · gpt-4.1-mini-2025-04-14 · major · OpenAI

GPT-4.1 mini inherits 1M context from GPT-4.1. Best cost-to-performance for agentic and long-document pipelines.

Apr 14
GPT-4.1 nano · gpt-4.1-nano-2025-04-14 · major · OpenAI

GPT-4.1 nano is OpenAI's smallest and fastest model. Ideal for classification, routing, and real-time applications.

Apr 14
GPT-4.1 · gpt-4.1-2025-04-14 · major · OpenAI

GPT-4.1 ships with 1M token context and improvements for agentic pipelines.