AI Model Changelog
Every version update across all major AI models.
April 2026
Trinity-Large-Thinking released as agentic-optimized variant of Trinity-Large family. Post-trained with extended chain-of-thought reasoning and agentic RL for tool calling and multi-step agent workflows. Available on OpenRouter, vLLM, Hugging Face, and chat.arcee.ai.
Mythos Preview released as a restricted-access model. Anthropic's new frontier model, positioned as larger and more intelligent than its Opus line, represents a generational leap in capability, notably demonstrating autonomous cyberattack execution and the ability to escape sandbox restrictions. It is deployed under strict access controls exclusively through Project Glasswing for defensive cybersecurity work with 40+ vetted partner organizations, with claims of identifying tens of thousands of vulnerabilities, including thousands of zero-days, and autonomously creating working exploits across major operating systems during early testing.
Initial release of Harrier embedding model. Trained on 2B+ examples with GPT-5 synthetic data. Achieves top ranking on MTEB v2 multilingual benchmark with 131K context window.
Amazon Nova 2 Sonic enables real-time conversational podcast generation with 1M token context window and native support for seven languages through Amazon Bedrock.
Arcee released Trinity Large Thinking, an open-source reasoning model built on a $20M budget. The company claims it is the most capable open-weight model released by a non-Chinese company, with Apache 2.0 licensing and comparable performance to other top open-source models.
Claude Mythos Preview released under restricted access through Project Glasswing. Anthropic's specialized model for cybersecurity vulnerability discovery targets critical flaws in operating systems, browsers, and software, improving on Claude Opus 4.6 in reasoning, agentic capabilities, and coding; early results include discovery of a 27-year-old OpenBSD TCP SACK vulnerability and Linux privilege-escalation flaws.
Initial release of Gemma 4 family introducing multimodal capabilities (text, image, audio), extended context windows up to 256K tokens, and reasoning modes across four model sizes.
Initial release of Bonsai 8B 1-bit quantized model. Achieves 14x compression with claimed competitive performance on standard benchmarks. Also released Bonsai 4B and Bonsai 1.7B variants.
Gemma 4 E4B adds multimodal capabilities (text, image, audio), extended 128K context window, native reasoning modes, and function-calling support compared to Gemma 3. Achieves 69.4% MMLU Pro with 4.5B effective parameters optimized for mobile and edge deployment.
Gemma 4 26B A4B uses Mixture-of-Experts with 3.8B active parameters for efficient inference. Features 256K context window, multimodal input (text/image), native reasoning modes, and function-calling for agentic workflows.
Tencent released OmniWeaving, an open-source unified video generation model with reasoning capabilities and compositional video creation. Built on HunyuanVideo-1.5, it supports eight video generation tasks and introduces IntelligentVBench benchmark.
Zhipu AI released GLM-5V-Turbo, adding multimodal capabilities to its GLM-5 series. The model generates code from design mockups and video inputs while maintaining text-only coding performance, integrating directly with Claude Code and OpenClaw agents.
Gemma 4 introduces four model sizes (2B-31B) with improved reasoning and agentic capabilities. Apache 2.0 licensing replaces previous restrictions. 31B model ranks #3 on Arena AI leaderboard.
NVIDIA released NVFP4-quantized version of Google DeepMind's Gemma 4 31B IT model optimized for consumer GPU inference. Maintains 256K context window and multimodal capabilities with <0.5% performance degradation on reasoning and coding benchmarks.
Google DeepMind releases Gemma 4 31B, the flagship of the family: multimodal input (text and image), a 256K context window, configurable reasoning mode, and native function calling, free under the Apache 2.0 license.
Across the family, Gemma 4 introduces multimodal capabilities (text, image, video, and audio on the small models), context windows of 128K-256K tokens, configurable reasoning modes, native function calling, and hybrid dense and mixture-of-experts architectures whose attention layers combine local and global attention for efficiency. The 26B A4B variant uses sparse activation to approach dense-31B performance at 4B-model inference speed, and Apache 2.0 licensing removes prior data-usage restrictions for enterprise deployment. Substantial improvements over Gemma 3 in coding benchmarks, long-context reasoning, and on-device efficiency.
Qwen 3.6 Plus introduces a hybrid architecture with linear attention and sparse mixture-of-experts routing, delivering major improvements in agentic coding, front-end development, reasoning, and multimodal capabilities over the 3.5 series. Achieves 78.8 on SWE-bench Verified.
Microsoft released MAI-Transcribe-1, a speech-to-text model achieving the lowest word error rate on the FLEURS benchmark at 2.5x faster inference than Azure Fast. Priced at $0.36 per audio hour, it supports 25 languages and challenging recording conditions.
Initial release of Falcon Perception 0.6B early-fusion Transformer for open-vocabulary grounding and segmentation. Introduces Chain-of-Perception output interface and PBench diagnostic benchmark with five capability levels.
Holo3-122B-A10B released with 78.85% OSWorld score using mixture-of-experts architecture (122B total, 10B active parameters). Trained via agentic learning flywheel with synthetic data augmentation and curated reinforcement learning. Holo3-35B-A3B variant open-sourced under Apache 2.0.
Initial release of Trinity Large Thinking, an open-source reasoning model with 262K context window. Model supports transparent reasoning processes and agentic task handling.
March 2026
Grok 4.20 Multi-Agent is a specialized variant designed for collaborative agent-based workflows with parallel agent coordination. Scales agent count based on reasoning effort: 4 agents at low/medium effort, 16 agents at high/xhigh effort.
Google released Veo 3.1 Lite, a cost-optimized video generation model priced at less than half of Veo 3.1 Fast while matching its generation speed, designed for high-volume applications.
Granite 4.0 3B Vision introduces a compact vision-language model optimized for enterprise document processing with DeepStack Injection architecture and ChartNet dataset training. Shipped as LoRA adapter on Granite 4.0 Micro for modular text-only fallback support.
Qwen3.5-Omni expands from Qwen3-Omni with 8x context window increase (32K to 256K tokens), 6x language support expansion (11 to 74 languages), hybrid attention-MoE architecture, and ARIA token interleaving for improved real-time speech synthesis. Demonstrates emergent code-generation capability from spoken and video input.
Microsoft releases the Harrier-OSS-v1 family of open multilingual embedding models in three sizes (270M, 600M, 27B), trained with contrastive learning and knowledge distillation. All sizes support 45+ languages with a 32,768-token context window; the 600M variant scores 69.0 and the 27B variant 74.3 on the MTEB v2 benchmark.
Lyria 3 Clip Preview brings Google's music generation model to the Gemini API with clip-based pricing at $0.04 per 30-second generation.
Google releases Lyria 3 Pro Preview, a music generation model producing full-length songs with vocals, lyrics, and instrumental arrangements. Priced at $0.08 per song through the Gemini API with a 1M token context window.
Qwen 3.6 Plus Preview introduces a hybrid architecture with 1M token context window, improved reasoning and agentic behavior over 3.5 series. Available free on OpenRouter with data collection for model improvement.
KAT-Coder-Pro V2 builds on earlier KAT-Coder versions with enhanced agentic coding capabilities for large-scale production environments and added web design generation for landing pages and presentation decks.
Cohere releases Transcribe, a 2B parameter open-source speech recognition model with 5.42% WER on the Hugging Face leaderboard, supporting 14 languages under Apache 2.0 license.
NVIDIA released gpt-oss-puzzle-88B, an inference-optimized 88B-parameter mixture-of-experts model derived from gpt-oss-120B using the Puzzle NAS framework. Achieves a 1.63× throughput improvement on long-context workloads and up to 2.82× on a single H100 while maintaining parent accuracy through heterogeneous expert pruning, selective window attention, and knowledge distillation with RL optimization.
Google released Gemini 3.1 Flash Live, an audio-focused model optimized for multilingual conversations, improving on 2.5 Flash Native Audio with enhanced acoustic recognition, background-noise filtering, lower latency, and extended conversation context. It powers the expansion of Search Live to 200+ countries with claimed gains in response speed and conversational naturalness.
Mistral released Voxtral TTS, its first text-to-speech model, with 90ms latency, a 6x real-time factor, and support for 9 languages with custom voice cloning from sub-5-second audio samples. Available as both an API and open weights.
Dreamina Seedance 2.0 launches in CapCut with IP safeguards including face-detection blocks, unauthorized IP-generation prevention, and invisible watermarking to identify AI-generated content.
Lyria 3 Pro improves upon Lyria 3 with better understanding of musical structure, expanding generation from 30-second clips to 3-minute tracks with control over discrete song elements. Available across Gemini, Google Vids, Vertex AI, and Google AI Studio.
Initial release of MolmoWeb with 4B and 8B parameter variants. Includes the full training dataset (MolmoWebMix), model weights, and evaluation tools under the Apache 2.0 license.
Apple developed RubiCap, a rubric-guided reinforcement learning framework for dense image captioning that achieves state-of-the-art results with 2B-7B parameter models, outperforming competitors up to 72B parameters.
Nemotron 3 Super now available on Amazon Bedrock as fully managed serverless inference. 120B parameter MoE model with 12B active parameters, 256K context, claims 5x throughput improvement and 2x accuracy gain over previous version.
Xiaomi released MiMo-V2-Pro as 3x larger successor to MiMo-V2-Flash (Dec 2025), reaching 1T parameters with 42B active per request. Benchmarks place it 3rd globally on PinchBench/ClawEval, nearly matching Claude Opus 4.6 on coding (78% vs 80.8%) while costing 80% less per input token.
Composer 2 launches with frontier-level coding intelligence, built on Moonshot AI's open-source Kimi 2.5 model with additional reinforcement-learning training on long-horizon tasks applied by Cursor (75% of final compute). It scores 61.3 on CursorBench (+38% over Composer 1.5) and is priced at $0.50/$2.50 per 1M tokens, undercutting Claude and GPT-4 by 60-90%.
MiniMax M2.7 introduces autonomous participation in its own development through 100+ self-optimization rounds, achieving a 30% performance improvement on internal coding tasks and competitive benchmark scores against leading Western models.
MAI-Image-2 improves upon MAI-Image-1 with enhanced photorealism, natural lighting, and reliable text rendering for practical design applications. The model ranks third on the Arena.ai leaderboard, up from ninth for the previous version.
Xiaomi launches MiMo-V2-Pro, its flagship foundation model with 1T+ parameters and a 1M-token context window, optimized for agent systems and complex workflow orchestration; it ranks among the global top tier on standard benchmarks.
Qwen3.5-Max-Preview launches as Alibaba's largest model (1T+ parameters) in the Qwen3.5 series. 262K context with thinking mode (82K CoT). Beats the previous flagship on reasoning, multilingual, and agentic tasks. $1.20/$6.00 per 1M tokens.
Xiaomi debuts MiMo-V2-Omni, a frontier omni-modal model processing image, video, and audio natively. 262K context with strong agentic capabilities including visual grounding and code execution.
Leanstral released as 120B-parameter agent for formal code verification using Lean, available with open weights (Apache 2.0) and free API endpoint. Claims superiority over larger open-source models and 85% cost savings versus Claude Sonnet on FLTEval benchmarks.
GPT-5.4 Nano launches as the smallest, fastest member of the GPT-5.4 family. 400K context, multimodal input, optimized for high-volume agentic tasks at $0.20/$1.25 per 1M tokens.
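The per-1M-token prices quoted throughout this changelog translate to request costs with simple arithmetic; a minimal sketch, using the GPT-5.4 Nano rates above with hypothetical token counts:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars for one request, given per-1M-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Hypothetical request: 50K-token prompt, 4K-token reply at $0.20/$1.25 per 1M.
cost = request_cost(50_000, 4_000, 0.20, 1.25)
print(f"${cost:.4f}")  # input $0.0100 + output $0.0050 -> $0.0150
```

The same helper applies to any entry here that lists input/output pricing per 1M tokens.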
GPT-5.4 mini, OpenAI's fastest variant of GPT-5.4 and now generally available in GitHub Copilot, is claimed to be the highest-performing mini offering for coding tasks. It brings major improvements in coding, reasoning, and computer control over GPT-5 mini, runs over 2x faster, and approaches full GPT-5.4 performance on multiple benchmarks while consuming only 30% of quota in agentic systems.
Mistral Small 4 launches unifying Magistral reasoning, Pixtral multimodal, and agentic coding into one model. 262K context at $0.15/$0.60 per 1M tokens.
Initial release of Nemotron-3-Nano-4B-GGUF, a quantized (Q4_K_M) 4B parameter edge model with hybrid Mamba-2 architecture. Supports controllable reasoning modes and 262K context window for edge AI applications including gaming NPCs and local voice assistants.
Initial release of MiniMax M2.7, a next-generation LLM with multi-agent collaboration and a 204K context window. Scores 56.2% on SWE-Pro and 57.0% on Terminal Bench 2.
ByteDance releases Seedream 4.5, their latest image generation model with major quality improvements over Seedream 4.0.
GLM 5 Turbo launches with 203K context and fast inference optimized for agent-driven environments. Improved complex reasoning over base GLM 5 at $0.96/$3.20 per 1M tokens.
Nvidia released Llama 3.1 Nemotron 70B Instruct, an instruction-tuned variant of Meta's Llama 3.1 70B model optimized for developer applications.
Minimax releases M1 40k model with 40,000-token context window. Initial release with limited publicly disclosed specifications.
Google launched Ask Maps, integrating Gemini AI into Google Maps to allow users to ask complex contextual navigation questions. The chatbot personalizes responses based on user search history and saved locations.
Mistral AI releases Pixtral Large, a new multimodal model supporting image and text inputs with 128K context window.
Minimax released M1 80k, expanding its M1 model family with an 80,000-token context window for extended document processing.
NVIDIA releases Nemotron-3-Super-120B-A12B, a 120B-parameter hybrid MoE model (12B active) with a 1M-token context window, latent expert routing, and multi-token prediction, optimized for conversational and text-generation tasks across 8 languages. Fully open-weight under the NVIDIA Open License, with BF16 weights available.
ByteDance launches Seed-2.0-Lite, a cost-efficient multimodal enterprise model with 262K context. Strong agent capabilities at $0.25/$2.00 per 1M tokens.
StepFun releases Step-3.5-Flash-Base as an open-source text generation model optimized for efficient inference under Apache 2.0 license.
Qwen3.5-0.8B released as an 800-million-parameter multimodal model for edge inference. Supports image and text inputs under Apache 2.0 licensing.
Gemini 3.1 Flash Lite Preview launches as Google's high-efficiency model for high-volume use. 1M context at $0.25/$1.50, outperforms Gemini 2.5 Flash Lite.
Qwen3.5-9B released as multimodal 9-billion parameter model supporting image and text inputs. Available under Apache 2.0 license on Hugging Face.
Initial release of Qwen3.5-2B, a 2-billion-parameter multimodal model supporting image and text processing.
Qwen3.5-4B released as a 4 billion parameter multimodal model supporting image and text inputs. Apache 2.0 licensed for open-source use.
Initial release of Context-1, a 20B parameter Mixture of Experts retrieval agent model trained for multi-hop search with self-editing context capabilities.
Released FP8-quantized version of Qwen3.5-35B-A3B, reducing memory requirements while maintaining multimodal capabilities. Compatible with Transformers endpoints and Azure deployment.
Gemini 3.1 Flash-Lite achieves 2.5x lower first-token latency than Gemini 2.5 Flash, with 360 tokens/second throughput. Output pricing rises to $1.50 per million tokens from $0.40.
February 2026
Qwen3.5-35B-A3B-Base released as a 35-billion parameter multimodal model with Apache 2.0 license. Part of the Qwen3.5 mixture-of-experts family.
Seed-2.0-Mini launches targeting latency-sensitive scenarios with 262K context and four reasoning effort levels at $0.10/$0.40 per 1M tokens.
Nano Banana 2 (Gemini 3.1 Flash Image Preview) debuts as Google's fastest image generation model. Pro-level quality at Flash speed with $0.50/$3.00 per 1M tokens.
Qwen3.5-Flash debuts with 1M context and ultra-low $0.065/$0.26 pricing. Hybrid architecture delivers a leap in inference efficiency over Qwen 3 series.
Qwen3.5-35B-A3B released as open-weight multimodal model with 35B parameters. Apache 2.0 licensed, supports image and text inputs with conversational capabilities.
Qwen3.5-27B released as a 27-billion parameter multimodal model supporting image-text-to-text tasks. Available under Apache 2.0 license with transformer endpoint compatibility.
Cohere Labs releases tiny-aya-global, a multilingual text generation model fine-tuned from tiny-aya-base to support conversational tasks across 100+ languages including major and low-resource languages.
Lyria 3 integrated into Gemini, enabling 30-second music track generation with vocals, lyrics, and cover art from text prompts or uploaded media.
Alibaba released Qwen 3.5 series claiming performance parity with proprietary frontier models while optimized for commodity hardware, directly challenging closed-source AI model economics.
Gemini 3.1 Pro enters public preview in GitHub Copilot with focus on efficient edit-then-test loops and agentic coding capabilities.
Claude Opus 4.6 released with major GPQA and reasoning improvements; its ARC-AGI-2 score jumps from 37.6% to 68.8%.
Qwen3.5 397B A17B launches as a hybrid linear-attention + sparse MoE vision-language model. 262K context at $0.39/$2.34 per 1M tokens with state-of-the-art performance.
Qwen3.5 Plus launches with 1M context and strong cross-domain performance in academia, finance, marketing, programming, and science at $0.30/$1.20 per 1M tokens.
Gemini 3.1 Pro released as upgrade to Gemini 3 Pro. Enhanced reasoning for complex multi-step problems. Preview access via AI Studio.
Claude Sonnet 4.6 brings improved multi-step reasoning and stronger agentic task performance over 4.5, at the same price.
Grok 4 mini brings next-generation reasoning at low cost. Significantly outperforms Grok 3 mini on AIME and competitive coding tasks.
January 2026
Kimi K2.5 launches as Moonshot AI's native multimodal model with state-of-the-art visual coding and agent swarm paradigm. 262K context at $0.45/$2.20 per 1M tokens.
Claude Sonnet 4.5 January patch with stability improvements for long agentic sessions and better tool use across multi-turn workflows.
Gemini 2.5 Pro preview January update with longer stable thinking output windows and improved factual grounding on complex queries.
Global rollout suspended following copyright cease-and-desist letters from Disney and Paramount Skydance over use of copyrighted training material.
December 2025
DeepSeek V3 December update with improved instruction following and expanded Chinese-English code-switching performance.
Gemini 3.0 Pro debuts as first Gemini 3 generation model. Upgraded reasoning and native multimodal understanding. Quickly superseded by 3.1.
Mistral Small 3.1 December update with improved vision accuracy and expanded support for structured data extraction from images.
November 2025
xAI releases Grok 4.1 Thinking variant with reasoning-focused capabilities. Technical specifications and pricing not yet disclosed.
Kimi K2 Thinking debuts as Moonshot AI's most advanced open reasoning model. Trillion-parameter MoE with 32B active params for agentic long-horizon reasoning.
Gemini 2.5 Flash November preview with configurable thinking budget improvements and better cost efficiency at higher thinking token counts.
Claude 3.7 Sonnet November update with significantly improved agentic task completion rates and more reliable computer use across complex workflows.
October 2025
Claude Opus 4.5 released with improved agentic coding and reasoning.
Baidu launches ERNIE 4.5 21B A3B Thinking, an upgraded lightweight MoE reasoning model. Top-tier on math, science, and coding benchmarks at $0.07/$0.28 per 1M tokens.
Llama 4 Scout patch fixing multimodal tokenization issues and improving throughput for long-context document tasks.
September 2025
Mistral Large 2 September update with improved code generation and expanded function calling support for complex tool schemas.
Claude 3.7 Sonnet September patch with improved computer use stability and reduced hallucination rate in extended thinking mode.
Kimi K2 September update brings improvements to the trillion-parameter MoE model with 256K long-context inference at $0.40/$1.60 per 1M tokens.
August 2025
ERNIE 4.5 21B A3B launches as Baidu's efficient MoE text model with 21B params (3B active). 120K context at $0.07/$0.28 per 1M tokens.
ERNIE 4.5 VL 28B A3B launches as Baidu's multimodal MoE model with 28B params (3B active). Vision + text understanding at $0.07/$0.28 per 1M tokens.
Qwen3 72B patch release with bug fixes and improved instruction-following stability across multilingual prompts.
Gemini 2.5 Pro preview update with improved tool use accuracy and reduced latency on multi-step reasoning tasks.
July 2025
Claude Sonnet 4.5 with improved agentic performance and hybrid reasoning.
Fastest Claude yet with improved tool use and very low cost per token.
Tencent launches Hunyuan A13B Instruct, a 13B-active MoE model (80B total params) with Chain-of-Thought reasoning. 131K context at $0.14/$0.57 per 1M tokens.
May 2025
DeepSeek R1 updated with meaningfully improved math and reasoning scores. Closes the gap to OpenAI o3 on several benchmarks.
Major new generation. Strongest reasoning and coding of any Anthropic model.
Gemini 2.5 Flash enters preview with configurable thinking budget. Fastest Google model with reasoning at sub-$1/M input cost.
Mistral Medium 3 launches with vision, strong coding, and mid-tier pricing. Outperforms GPT-4o mini on reasoning benchmarks at comparable cost.
April 2025
Qwen3 72B ships with hybrid thinking/non-thinking modes. Claims top open-weights position on coding, math, and multilingual benchmarks.
GPT-4.1 mini inherits 1M context from GPT-4.1. Best cost-to-performance for agentic and long-document pipelines.
GPT-4.1 nano is OpenAI's smallest and fastest model. Ideal for classification, routing, and real-time applications.