AI Model Changelog
Every version update across all major AI models.
May 2026
Initial release of a diffusion language model family trained on 1.3T pretraining tokens and 45B fine-tuning tokens. Supports autoregressive, diffusion, and self-speculation generation modes with up to 6.4× speedup over traditional AR models.
Gemini 3.5 Flash now powers Google's AI Mode search, adding support for multimodal inputs including images, video files, and Chrome tabs with improved intent anticipation.
Hy-MT2 represents a major version release with new 1.8B, 7B, and 30B-A3B model sizes supporting 33 languages, extreme quantization via AngelSlim, and the IFMTBench benchmark for instruction-following evaluation.
Flagship release of Qwen3.7 series with 1M token context window and agent-first design. Notable improvements in coding and agentic performance over prior Qwen generations with explicit prompt caching support.
Gemini 3.5 Flash rolls out globally across all Google AI subscription tiers with claimed improvements in speed and understanding for agentic and coding tasks.
Initial release of Command A+ open source model featuring 25B active parameters in a 218B parameter sparse mixture-of-experts architecture with vision, tool use, and reasoning capabilities.
Initial release of Grok Build 0.1, xAI's first coding-specialized model with 256K context window designed for agentic workflows and CLI integration.
Gemini 3.5 Flash is Google's first model in the 3.5 series, claiming improved agentic and coding capabilities at reduced cost compared to frontier models. Features enhanced safety measures with reasoning checks before responses.
OlmoEarth v1.1 reduces compute costs by up to 3x through token sequence length reduction, collapsing Sentinel-2 resolution bands into single tokens while maintaining similar benchmark performance to v1.
Initial release of Qianfan-OCR-Fast, a specialized OCR model with 66K context window and domain-specific training for improved document processing performance.
Complete architecture rebuild from XLM-RoBERTa to ModernBERT, expanding context from 512 to 32K tokens (64x increase) and adding code retrieval. The 97M model achieves 60.3 on MTEB Multilingual Retrieval (+12.2 over R1), highest in its size class.
Initial release of DeepSeek V4 Flash, an efficiency-optimized MoE model with 284B total parameters, 13B active parameters, and 1M-token context window available free through OpenRouter.
First release of Gemini Omni family supporting multimodal video generation with conversational editing. Speech and audio editing capabilities withheld pending safety testing.
Initial release of Perceptron Mk1, a vision-language model specializing in video understanding and spatial annotation with optional reasoning capabilities.
Fast-mode variant of Claude Opus 4.7 with identical capabilities but prioritized output speed at 6x premium pricing ($30/$150 per 1M tokens vs standard rates).
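The pricing in the entry above can be turned into a rough cost estimate. The sketch below assumes the $30/$150 pair means input and output price per 1M tokens (the usual order for such pairs, but an assumption here) and that the 6x premium applies equally to both, which would imply standard rates of $5/$25. The function and constant names are illustrative only.

```python
# Prices from the entry, assumed to be (input, output) USD per 1M tokens.
FAST_INPUT_PER_M = 30.0
FAST_OUTPUT_PER_M = 150.0
PREMIUM = 6.0  # "6x premium" from the entry, assumed to apply to both prices


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Return the USD cost of one request at the given per-1M-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def implied_standard_rates() -> tuple[float, float]:
    """Standard prices implied by dividing the fast-mode prices by the premium."""
    return FAST_INPUT_PER_M / PREMIUM, FAST_OUTPUT_PER_M / PREMIUM


if __name__ == "__main__":
    std_in, std_out = implied_standard_rates()
    # Example request: 200K input tokens, 20K output tokens (made-up sizes).
    fast = request_cost(200_000, 20_000, FAST_INPUT_PER_M, FAST_OUTPUT_PER_M)
    standard = request_cost(200_000, 20_000, std_in, std_out)
    print(f"fast: ${fast:.2f}  standard (implied): ${standard:.2f}")
```

Because the premium is modeled as a flat multiplier, the fast-mode cost of any request is six times the implied standard cost, regardless of the input/output mix.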
Initial release of Trinity Large Thinking as a free, open source reasoning model with 262K context window and focus on agentic workloads.
Gemma 4 E4B assistant introduces Multi-Token Prediction architecture for speculative decoding, achieving up to 2x inference speedup. Features 4.5B effective parameters with Per-Layer Embeddings optimized for on-device deployment.
Initial release of Ring-2.6-1T, a 1T parameter thinking model with 63B active parameters, featuring adaptive reasoning and 262K context window optimized for agent workflows.
First OpenAI voice model with GPT-5-class reasoning capabilities, designed for live voice interactions with tool calling and natural conversation flow.
Google launches Flash Lite variant of Gemini 3.1 at 50% cost reduction with maintained 1M context window and four-level reasoning system.
GPT-5.5-Cyber is a variant of GPT-5.5 with relaxed safeguards specifically for vetted cybersecurity teams. The model is trained to be more permissive on security-related tasks compared to the standard GPT-5.5 release.
OpenAI released GPT-5.5-Cyber, a version of GPT-5.5 with reduced guardrails for vetted cybersecurity defenders. The model can automate security workflows while blocking credential theft and malware creation.
Initial release of CoBuddy code generation model with 131K context window, native tool calling, and reasoning support, available for free on OpenRouter.
Initial release of Multi-Token Prediction assistant model for Gemma 4 26B A4B, enabling up to 2x inference speedup through speculative decoding while maintaining identical output quality.
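To illustrate why speculative decoding can leave the output unchanged, here is a minimal sketch of the generic greedy variant using toy next-token functions. It is not Gemma's Multi-Token Prediction implementation: the toy models, function names, and the draft length `k` are all assumptions made for illustration, and the sketch does not reproduce the claimed 2x speedup.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive decoding: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target checks them."""
    out: List[int] = []
    while len(out) < n:
        # 1. The cheap draft model guesses the next k tokens.
        ctx = list(prompt) + out
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(ctx + proposal))
        # 2. The target verifies each guess. A real system would score all k
        #    positions in one batched forward pass; this loop is sequential.
        for guess in proposal:
            verified = target(ctx)
            out.append(verified)
            ctx.append(verified)
            if verified != guess or len(out) == n:
                break  # on a mismatch, keep the target's token and drop the rest
    return out


def toy_target(seq: List[int]) -> int:
    """Hypothetical 'large' model: a fixed arithmetic rule over the last token."""
    return (3 * seq[-1] + 1) % 7


def toy_draft(seq: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except after token 5."""
    return 0 if seq[-1] == 5 else toy_target(seq)


if __name__ == "__main__":
    same = speculative_decode(toy_target, toy_draft, [1], 20) == greedy_decode(toy_target, [1], 20)
    print("outputs identical:", same)
```

Every emitted token comes from the target model, so the result matches plain greedy decoding even when the draft is wrong; the draft only affects how many positions can be confirmed per verification step.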
Major release introducing Gemma 4 family with four model sizes (E2B, E4B, 26B A4B MoE, 31B dense), Multi-Token Prediction drafters for 2x speedup, extended context windows up to 256K, enhanced multimodal capabilities, and improved reasoning performance.
Granite Speech 4.1 2B introduces dual-head CTC encoder, frame importance sampling, improved multilingual ASR accuracy, and punctuation/truecasing across all languages. Two new variants add speaker attribution with timestamps and non-autoregressive architecture.
GPT-5.5 Instant replaced GPT-5.3 Instant as the default ChatGPT model with significant accuracy improvements and more concise response formatting. OpenAI claims 52.5% fewer hallucinations on high-stakes prompts and reduced use of emojis and unnecessary formatting.
IBM released Granite 4.1 family in 3B, 8B, and 30B sizes under Apache 2.0 license. Unsloth released 21 GGUF quantized variants of the 3B model.
April 2026
Granite 4.1 8B is a new release in IBM's Granite 4.1 family, offering an 8B-parameter dense decoder-only model with enterprise-focused capabilities. Released under Apache 2.0 license with 131K context window and multilingual support.
R2 upgrades architecture from XLM-RoBERTa to ModernBERT, extends context from 512 to 32,768 tokens, expands vocabulary to 262K tokens, and adds Matryoshka dimension reduction support. Performance improves by 11.8 points on MTEB Retrieval.
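Matryoshka-style embeddings are trained so that a leading slice of the vector still works as a smaller embedding. A minimal sketch of how a client might use that property follows, assuming the common practice of truncating and then re-normalizing; the function name and the example dimensions are hypothetical, and the entry does not say which output sizes R2 supports.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0, 0.5]          # made-up 4-dimensional embedding
    small = truncate_embedding(full, 2)   # keep only the first 2 dimensions
    print(small)
```

Re-normalizing after truncation keeps cosine similarity and dot-product scores comparable across items stored at the reduced size.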
Granite 4.1 30B introduces enhanced tool-calling, improved instruction following through updated SFT-RL pipeline, and 131K context window. Released under Apache 2.0 license with competitive performance on code and reasoning benchmarks.
Mistral Medium 3.5 replaces Medium 3.1 and Magistral with a unified 128B model that combines instruction-following, reasoning, and coding. Introduces configurable reasoning effort and a vision encoder trained from scratch.
Initial release of Nemotron 3 Nano Omni, a 31B parameter (30B A3B) multimodal MoE model supporting video, audio, image, and text with 256K context, reasoning mode, and enterprise-focused capabilities.
Initial release of Owl Alpha, OpenRouter's first foundation model designed specifically for agentic workloads with native tool use and 1M+ context window.
Initial release of a 13-billion-parameter vintage language model trained exclusively on public domain texts published before 1931.
Initial release of Nemotron-3-Nano-Omni-30B-A3B, a multimodal MoE model with 31B parameters combining video, audio, image, and text understanding with reasoning capabilities.
Initial release of Nemotron 3 Nano Omni, a multimodal MoE model with 30B total parameters (3B active) combining video, audio, image, and text understanding in a single inference pass with 131K token context.
Initial release of Nemotron 3 Nano Omni, a 30B-parameter multimodal MoE model designed as a perception sub-agent for enterprise systems. Features hybrid Transformer-Mamba architecture with specialized video processing and extended reasoning capabilities.
First omni-modal release in Nemotron 3 line, adding audio and video capabilities to previous vision-language model. Uses new 30B-A3B MoE architecture with hybrid Mamba-Transformer design.
Initial release of Poolside's flagship coding agent model, optimized for software engineering with 128K context and free API access.
Second-generation XS size model with enhanced tool calling and reasoning capabilities, quantized to fp8 for production efficiency.
Initial release of Nemotron 3 Nano Omni, a 31B-parameter MoE multimodal model with video, audio, image, and text understanding, 256K context window, and dedicated reasoning mode with chain-of-thought capabilities.
Laguna XS.2 introduces a 33B parameter MoE architecture with 3B activated parameters per token, featuring mixed sliding window and global attention layers, native reasoning support, and optimization for local deployment.
MiMo-V2.5-Pro introduces 1.02T total parameters (up from 310B in V2.5) with 42B active parameters, extends context to 1M tokens, and adds hybrid attention architecture with sliding window and global attention patterns. The model achieves 99.6% on GSM8K and maintains coherence at extreme context lengths.
Initial release of GPT Mini Latest with 400,000 token context window and auto-redirect to newest GPT Mini family version.
Updated Qwen3.5 Plus with 1M token context window and tiered pricing above 256K tokens.
Qwen3.6 27B introduces video processing capabilities alongside existing text and image support, with a 262K context window and built-in thinking mode for agentic coding and reasoning tasks.
Qwen3.6 Flash introduces multimodal support for text, images, and video with a 1M token context window. Features tiered pricing structure with base rates of $0.25/$1.50 per 1M tokens for prompts under 256K tokens.
Google released Gemini Flash Latest as a dynamic router that automatically redirects to the newest Gemini Flash model, featuring 1,048,576 token context and reasoning capabilities.
New sparse MoE model with 35B total parameters but only 3B active per token, featuring 262K context window, multimodal support, and integrated reasoning mode.
Initial release of Qwen3.6 Max Preview, a proprietary 1 trillion parameter sparse MoE model with 262K context window and integrated thinking mode for agentic workflows.
Moonshot AI introduced a router endpoint that automatically redirects to the most current model in the Kimi family, featuring a 262,144 token context window.
Google released a dynamic router endpoint that automatically redirects to the latest Gemini Pro model, featuring 1,048,576-token context and reasoning capabilities.
GPT-5.5 Pro introduces a 1M+ token context window optimized for complex reasoning tasks, agentic coding, and multi-step workflows with multimodal support.
Major release with 1.6T total parameters, 1M token context window, and substantial efficiency improvements over V3.2. DeepSeek claims near-frontier performance at a fraction of the cost.
Major version release with claimed competitive performance against leading US models. DeepSeek emphasizes improved coding capabilities and domestic chip compatibility.
DeepSeek V4 introduces two model sizes with major architectural improvements: hybrid attention mechanisms reduce memory usage by 9.5x-13.7x while supporting 1M token contexts, and mixed FP4/FP8 precision further reduces memory footprint. Added Huawei Ascend NPU support.
DeepSeek-V4-Flash is a new 284B-parameter MoE model with 13B activated parameters, featuring hybrid attention architecture that reduces inference costs by 73% at million-token context lengths. Introduces three reasoning effort modes and achieves competitive performance with frontier models on coding and mathematical reasoning.
Major architectural update from V3.2 with 1.6 trillion total parameters (49B active), 1M token context window, and improved efficiency. Largest open-weight model by parameter count.
Initial release of Ling-2.6-1T, a 1 trillion parameter instruct model with 262K context window and fast-thinking architecture designed for cost-efficient agent deployments.
Initial preview release of Hy3, Tencent's Mixture-of-Experts model with configurable reasoning modes designed for production agentic workflows.
Initial preview release of Tencent's Hy3 model featuring 262K context, three-tier reasoning system, and free availability through OpenRouter.
Qwen3.6-27B adds extended context to 1M+ tokens, introduces thinking preservation feature, and shows 2-5% improvements in coding agent benchmarks over Qwen3.5-27B. FP8 quantization enables efficient deployment with claimed near-identical performance to full precision.
Initial release of Trinity Large Preview, a 400B-parameter sparse MoE model with 13B active parameters per token, featuring up to 512K context window support and optimization for agentic workflows.
Gemma 4 E2B demonstrated running as a vision-language agent on NVIDIA Jetson Orin Nano Super (8GB), autonomously deciding when to access webcam based on conversational context with no hardcoded triggers.
Qwen3.6-27B is a 27B dense model that claims flagship-level coding performance surpassing the 397B Qwen3.5-397B-A17B. Available at 55.6GB full size or 16.8GB quantized.
Initial release of Privacy Filter, a 1.5B-parameter bidirectional token classifier for detecting 8 PII categories with 128K context window and Apache 2.0 license.
Major update to OpenAI's image generation capabilities with significantly improved text rendering and UI element generation accuracy.
Major release upgrading from previous ChatGPT image generation with 2K resolution support, dual-mode generation (Instant/Thinking), improved text rendering, and web-aware context.
Major update to OpenAI's image generation model with significantly improved quality, maximum resolution of 3840x2160 pixels, and pricing at $30 per million output tokens.
ChatGPT Images 2.0 adds thinking mode integration, enabling the model to reason about visual tasks, gather contextual data, and generate complex layouts combining text and images with improved aspect ratio control.
GPT-5.4 Image 2 combines GPT-5.4 reasoning capabilities with image generation from GPT Image 2, supporting text, image, and file inputs with a 272K token context window.
Initial release of Ling-2.6-flash with 104B total parameters (7.4B active), featuring 262K context window and optimized for agent applications with fast response times.
Initial release of Pareto Code Router with dynamic model selection based on min_coding_score parameter (0-1 scale) and 200K context window.
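One way a threshold parameter like min_coding_score could drive model selection is sketched below. The selection rule (cheapest candidate at or above the threshold), the candidate names, scores, and prices are all assumptions for illustration; the entry does not describe how the router actually chooses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str            # hypothetical model identifier
    coding_score: float  # hypothetical quality score on a 0-1 scale
    price: float         # hypothetical USD per 1M tokens


def pick_model(candidates: List[Candidate], min_coding_score: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose score meets the threshold, or None."""
    if not 0.0 <= min_coding_score <= 1.0:
        raise ValueError("min_coding_score must be in the range 0-1")
    eligible = [c for c in candidates if c.coding_score >= min_coding_score]
    return min(eligible, key=lambda c: c.price, default=None)


if __name__ == "__main__":
    pool = [
        Candidate("small-coder", 0.55, 0.20),
        Candidate("mid-coder", 0.72, 0.90),
        Candidate("big-coder", 0.91, 4.00),
    ]
    choice = pick_model(pool, min_coding_score=0.7)
    print(choice.name if choice else "no model meets the threshold")
```

Raising the threshold trades cost for quality: a request with a low min_coding_score falls through to the cheapest model, while a high one narrows the pool to the strongest candidates.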
First release of Kimi K2.6 introduces 1T-parameter MoE architecture with agent swarm capabilities, 256K context, and competitive performance on SWE-Bench and agentic benchmarks.
Initial release of Qianfan-OCR-Fast, a specialized OCR model with 65K context window and free pricing. Claims performance improvements over Qianfan-OCR.
First open-weight release of Qwen3.6 series, featuring improved agentic coding capabilities, thinking preservation for context retention, and FP8 quantization. Built on community feedback to prioritize stability and real-world utility.
GR00T N1.7 upgrades to Cosmos-Reason2-2B VLM backbone and adds EgoScale pre-training on 20,854 hours of human egocentric video, improving dexterity and generalization over N1.6.
Python SDK v0.96.0 adds Claude Opus 4.7 support and introduces token budgets for cost management and user profiles for personalized interactions.
Opus 4.7 introduces significantly more aggressive safety guardrails that automatically detect and block requests related to cybersecurity uses, resulting in a 10-15x increase in reported false positive refusals.
Major release introducing multi-modal 3D world generation capabilities, replacing video-based outputs with persistent 3D assets. Centered on WorldMirror 2.0, a 1.2B parameter model for reconstruction.
Initial preview release of Gemini 3.1 Flash TTS, Google's first prompt-controlled text-to-speech model that accepts theatrical-style direction for voice characteristics, accents, and delivery style.
Fine-tuned variant of GPT-5.4 specifically built for defensive cybersecurity work with reduced restrictions on security-related tasks. Initial access limited to verified security professionals through Trusted Access for Cyber program.
Gemini 3.1 Flash TTS introduces audio tags for granular control over vocal style, pace, and delivery through natural language commands. The model achieves an Elo score of 1,211 and includes mandatory SynthID watermarking.
Initial release of Trinity-Large-Thinking, a 400B parameter open-weight reasoning model with 256 experts in a mixture-of-experts architecture (13B active per token), optimized for agent tasks. Apache 2.0 licensed, trained on 17 trillion tokens over 33 days on 2,048 Nvidia B300 GPUs.
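The training figures in the entry above imply an average throughput that is easy to work out. The sketch below assumes all 2,048 GPUs ran for the full 33 days on exactly the 17 trillion tokens quoted, so the result is a back-of-the-envelope average rather than a measured number; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def avg_tokens_per_gpu_second(total_tokens: float, days: float, gpus: int) -> float:
    """Average tokens processed per GPU per second over the whole run."""
    return total_tokens / (days * SECONDS_PER_DAY * gpus)


def gpu_hours(days: float, gpus: int) -> float:
    """Total GPU-hours, assuming every GPU is busy for the entire run."""
    return days * 24 * gpus


if __name__ == "__main__":
    rate = avg_tokens_per_gpu_second(17e12, 33, 2048)
    print(f"about {rate:,.0f} tokens per GPU-second")
    print(f"about {gpu_hours(33, 2048):,.0f} GPU-hours")
```

Under those assumptions the run works out to a little over 1.6 million GPU-hours and roughly 2,900 tokens per GPU-second.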
Refreshed version of LFM2-VL-450M with updated LFM2.5-350M backbone. Adds bounding box prediction and function calling capabilities while improving performance across vision and language benchmarks.
Released HY-Embodied-0.5 suite with MoT-2B (2.2B active parameters, 4B total) and 32B variants. MoT-2B trained on 200B+ tokens of embodied data outperforms similarly-sized competitors on 22 embodied benchmarks.
Amazon Bedrock now enables supervised fine-tuning, reinforcement fine-tuning, and model distillation for Nova 2 Lite. Fine-tuned models deploy on-demand at standard inference pricing without provisioned capacity.
Claude Mythos Preview demonstrates unprecedented capability in autonomous vulnerability discovery and exploitation, finding decades-old bugs in OpenBSD, FFmpeg, and FreeBSD. Deployment restricted to 11-partner coalition for defensive cybersecurity use only.
Mythos is Anthropic's new frontier model, positioned as larger and more intelligent than its Opus models. It is deployed exclusively through Project Glasswing for defensive cybersecurity work with 40+ vetted partner organizations, with claims of identifying thousands of zero-day vulnerabilities during early testing.
Meta's first major new model after a $14.3B infrastructure rebuild. Shifts from the open-source Llama strategy to a fully proprietary model available only via API to select partners.
Claude Mythos Preview released under restricted access through Project Glasswing. Model demonstrates exceptional cybersecurity research capabilities including discovery of 27-year-old OpenBSD TCP SACK vulnerability and Linux privilege escalation flaws.
Arcee released Trinity Large Thinking, an open-source reasoning model built on a $20M budget. The company claims it is the most capable open-weight model released by a non-Chinese company, with Apache 2.0 licensing and comparable performance to other top open-source models.
Claude Mythos is Anthropic's specialized model for cybersecurity vulnerability discovery, designed to identify critical flaws in operating systems, browsers, and software. The model shows improvements over Claude Opus 4.6 in reasoning, agent-based capabilities, and coding.
Mythos Preview released as restricted-access model to 40 organizations for defensive security applications. Model demonstrates capability to find tens of thousands of vulnerabilities and autonomously create working exploits across major operating systems.
Initial release of Harrier embedding model. Trained on 2B+ examples with GPT-5 synthetic data. Achieves top ranking on MTEB v2 multilingual benchmark with 131K context window.
Amazon Nova 2 Sonic enables real-time conversational podcast generation with 1M token context window and native support for seven languages through Amazon Bedrock.
Initial release of Bonsai 8B 1-bit quantized model. Achieves 14x compression with claimed competitive performance on standard benchmarks. Also released Bonsai 4B and Bonsai 1.7B variants.
Gemma 4 E4B adds multimodal capabilities (text, image, audio), extended 128K context window, native reasoning modes, and function-calling support compared to Gemma 3. Achieves 69.4% MMLU Pro with 4.5B effective parameters optimized for mobile and edge deployment.
Tencent released OmniWeaving, an open-source unified video generation model with reasoning capabilities and compositional video creation. Built on HunyuanVideo-1.5, it supports eight video generation tasks and introduces IntelligentVBench benchmark.
Zhipu AI released GLM-5V-Turbo, adding multimodal capabilities to its GLM-5 series. The model generates code from design mockups and video inputs while maintaining text-only coding performance, integrating directly with Claude Code and OpenClaw agents.
Gemma 4 26B A4B uses Mixture-of-Experts with 3.8B active parameters for efficient inference. Features 256K context window, multimodal input (text/image), native reasoning modes, and function-calling for agentic workflows.
Gemma 4 introduces multimodal support (text, image, video, audio on small models), extended context windows (128K-256K tokens), configurable reasoning modes, and native function calling. Available in four sizes with both dense and MoE architectures.
Gemma 4 31B introduces 256K context window, configurable reasoning mode, and multimodal image support. Maintains Apache 2.0 open license.
Qwen 3.6 Plus introduces a hybrid architecture with linear attention and sparse mixture-of-experts routing, delivering major improvements in agentic coding, front-end development, and reasoning over the 3.5 series. Achieves 78.8 on SWE-bench Verified.
Gemma 4 introduces four model sizes (2B-31B) with improved reasoning and agentic capabilities. Apache 2.0 licensing replaces previous restrictions. 31B model ranks #3 on Arena AI leaderboard.
Microsoft released MAI-Transcribe-1, a speech-to-text model that achieves the lowest word error rate on the FLEURS benchmark while running inference 2.5x faster than Azure Fast. Priced at $0.36 per audio hour, it supports 25 languages and challenging recording conditions.
Qwen3.6 Plus introduces hybrid linear attention with sparse mixture-of-experts routing, achieving 78.8 on SWE-bench Verified. Major improvements in coding, reasoning, and multimodal capabilities over 3.5 series.
Gemma 4 introduces multimodal capabilities (text, image, video support), reasoning modes, 256K context windows, and Mixture-of-Experts architecture. The 26B A4B variant uses sparse activation for near-dense-31B performance with 4B-model inference speed.
Gemma 4 introduces multimodal capabilities with native image and audio support, extended 128K context window, built-in reasoning modes with configurable thinking, and hybrid attention architecture combining local and global attention for efficiency.
Gemma 4 introduces multimodal support, 256K context window, Apache 2.0 permissive licensing, and mixture of experts variant. First major version update with explicit focus on enterprise deployment without data usage restrictions.
Gemma 4 introduces multimodal capabilities (text, image, video, audio on small models), extended 256K context windows, configurable reasoning modes, and hybrid dense/mixture-of-experts architectures. Substantial improvements in coding benchmarks, long-context reasoning, and on-device deployment efficiency compared to Gemma 3.
Google DeepMind introduces Gemma 4 31B with multimodal input (text and images), 256K context window, configurable reasoning mode, and native function calling. Free release under Apache 2.0 license.
NVIDIA released NVFP4-quantized version of Google DeepMind's Gemma 4 31B IT model optimized for consumer GPU inference. Maintains 256K context window and multimodal capabilities with <0.5% performance degradation on reasoning and coding benchmarks.
Meta's first closed-source AI model with planned paid developer access, marking a strategic shift from the open-source Llama series. Released from Meta Superintelligence Labs under new AI leadership.
Holo3-122B-A10B released with 78.85% OSWorld score using mixture-of-experts architecture (122B total, 10B active parameters). Trained via agentic learning flywheel with synthetic data augmentation and curated reinforcement learning. Holo3-35B-A3B variant open-sourced under Apache 2.0.
Initial release of Trinity Large Thinking, an open-source reasoning model with 262K context window. Model supports transparent reasoning processes and agentic task handling.
Initial release of Falcon Perception 0.6B early-fusion Transformer for open-vocabulary grounding and segmentation. Introduces Chain-of-Perception output interface and PBench diagnostic benchmark with five capability levels.
March 2026
Grok 4.20 Multi-Agent is a specialized variant designed for collaborative agent-based workflows with parallel agent coordination. Scales agent count based on reasoning effort: 4 agents at low/medium effort, 16 agents at high/xhigh effort.
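The effort-to-agent scaling described above amounts to a small lookup table. The sketch below encodes it directly from the entry; the function name and the use of plain strings for effort levels are assumptions, not xAI's actual API.

```python
# Agent counts per reasoning-effort level, as stated in the entry.
AGENTS_BY_EFFORT = {"low": 4, "medium": 4, "high": 16, "xhigh": 16}


def agent_count(effort: str) -> int:
    """Return how many parallel agents the entry says run at a given effort level."""
    try:
        return AGENTS_BY_EFFORT[effort.lower()]
    except KeyError:
        raise ValueError(f"unknown reasoning effort: {effort!r}") from None


if __name__ == "__main__":
    for level in ("low", "medium", "high", "xhigh"):
        print(level, agent_count(level))
```

Moving from medium to high effort therefore quadruples the number of agents, which is worth keeping in mind when estimating token usage for a request.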
Microsoft released the Harrier-OSS embedding model family with three parameter sizes (270M, 600M, 27B) supporting multilingual inputs, 32K token context, and knowledge distillation techniques. The 27B variant achieves 74.3 on MTEB v2 benchmark.
Qwen3.5-Omni expands from Qwen3-Omni with 8x context window increase (32K to 256K tokens), 6x language support expansion (11 to 74 languages), hybrid attention-MoE architecture, and ARIA token interleaving for improved real-time speech synthesis. Demonstrates emergent code-generation capability from spoken and video input.
Google released Veo 3.1 Lite, a cost-optimized video generation model priced at less than 50% of Veo 3.1 Fast. Designed for high-volume applications with same generation speed as Veo 3.1 Fast.
Granite 4.0 3B Vision introduces a compact vision-language model optimized for enterprise document processing with DeepStack Injection architecture and ChartNet dataset training. Shipped as LoRA adapter on Granite 4.0 Micro for modular text-only fallback support.
Google releases Lyria 3 Pro Preview, a music generation model producing full-length songs with vocals, lyrics, and instrumental arrangements. Priced at $0.08 per song through the Gemini API with 1M token context window.
Microsoft releases Harrier-OSS-v1 family of multilingual embedding models in three sizes (270M, 0.6B, 27B parameters) trained with contrastive learning and knowledge distillation. The 0.6B variant achieves 69.0 MTEB v2 score with 32,768 token context window and supports 45+ languages.
Lyria 3 Clip Preview introduces Google's music generation model to the Gemini API with clip-based pricing at $0.04 per 30-second generation.
Qwen 3.6 Plus Preview introduces a hybrid architecture with 1M token context window, improved reasoning and agentic behavior over 3.5 series. Available free on OpenRouter with data collection for model improvement.
KAT-Coder-Pro V2 builds on earlier KAT-Coder versions with enhanced agentic coding capabilities for large-scale production environments and added web design generation for landing pages and presentation decks.
Cohere releases Transcribe, a 2B parameter open-source speech recognition model with 5.42% WER on the Hugging Face leaderboard, supporting 14 languages under Apache 2.0 license.
Mistral released Voxtral TTS, an open-source speech model with 90ms latency, 6x real-time factor, and support for 9 languages with custom voice adaptation from sub-5-second samples.
Dreamina Seedance 2.0 launches in CapCut with IP safeguards including face-detection blocks, unauthorized IP generation prevention, and invisible watermarking to identify AI-generated content.
First Mistral text-to-speech model. Supports voice cloning from minimal audio, available as both API and open-weights version.
Google released Gemini 3.1 Flash Live, an audio-focused model optimized for multilingual conversations. The model powers the expansion of Search Live to 200+ countries with claimed improvements to response speed and conversation naturalness.
Gemini 3.1 Flash Live improves upon 2.5 Flash Native Audio with enhanced acoustic recognition, background noise filtering, lower latency, and extended conversation context.
NVIDIA released gpt-oss-puzzle-88B, an inference-optimized 88B-parameter mixture-of-experts model derived from gpt-oss-120B using the Puzzle NAS framework. Achieves 1.63× throughput improvement on long-context and up to 2.82× on single H100s while maintaining parent accuracy through heterogeneous expert pruning, selective window attention, and knowledge distillation with RL optimization.
Lyria 3 Pro improves upon Lyria 3 with better understanding of musical structures and enhanced track generation capabilities up to 3 minutes. Model available across Gemini, Google Vids, Vertex AI, and Google AI Studio.
Expanded from 30-second to 3-minute song generation with improved structural composition control. Added support for specifying discrete song elements.
Apple developed RubiCap, a rubric-guided reinforcement learning framework for dense image captioning that achieves state-of-the-art results with 2B-7B parameter models, outperforming competitors up to 72B parameters.
Initial release of MolmoWeb with 4B and 8B parameter variants. Includes full training dataset (MolmoWebMix), model weights, and evaluation tools under Apache 2.0 license.
Nemotron 3 Super is now available on Amazon Bedrock as fully managed serverless inference. It is a 120B parameter MoE model with 12B active parameters and 256K context, with a claimed 5x throughput improvement and 2x accuracy gain over the previous version.
Xiaomi released MiMo-V2-Pro as 3x larger successor to MiMo-V2-Flash (Dec 2025), reaching 1T parameters with 42B active per request. Benchmarks place it 3rd globally on PinchBench/ClawEval, nearly matching Claude Opus 4.6 on coding (78% vs 80.8%) while costing 80% less per input token.
Composer 2 launched with frontier-level coding intelligence, built on Moonshot AI's open-source Kimi 2.5 model with additional reinforcement learning training applied by Cursor (75% of final compute).
M2.7 introduces autonomous participation in its own development through 100+ self-optimization rounds, achieving 30% performance improvement on internal coding tasks and competitive benchmark scores against leading Western models.
Composer 2 achieves 61.3 on CursorBench (+38% vs Composer 1.5) through improved pretraining and reinforcement learning on long-horizon tasks. Pricing set at $0.50/$2.50 per 1M tokens, undercutting Claude and GPT-4 by 60-90%.
MAI-Image-2 improves upon MAI-Image-1 with enhanced photorealism, natural lighting, and notably adds reliable text rendering capabilities for practical design applications. The model ranks third on Arena.ai leaderboard, up from ninth place for the previous version.
MiMo-V2-Pro is Xiaomi's flagship foundation model launch featuring 1T+ parameters and 1M context window optimized for agent systems and complex workflow orchestration.
Xiaomi launches MiMo-V2-Pro, their flagship 1T-parameter foundation model with 1M context. Optimized for agentic scenarios, ranking among global top tier on standard benchmarks.
Xiaomi debuts MiMo-V2-Omni, a frontier omni-modal model processing image, video, and audio natively. 262K context with strong agentic capabilities including visual grounding and code execution.
Qwen3.5-Max-Preview launches as Alibaba's largest model (1T+ params) in the Qwen3.5 series. 262K context with thinking mode (82K CoT). Beats previous flagship on reasoning, multilingual, and agentic tasks. $1.20/$6.00 per 1M tokens.
GPT-5.4 Nano launches as the smallest, fastest member of the GPT-5.4 family. 400K context, multimodal input, optimized for high-volume agentic tasks at $0.20/$1.25 per 1M tokens.
GPT-5.4 mini introduces major improvements in coding, reasoning, and computer control capabilities over GPT-5 mini. Model runs over 2x faster and achieves near-full-GPT-5.4 performance on multiple benchmarks while consuming 30% of quota in agentic systems.
GPT-5.4 mini, OpenAI's fastest variant of GPT-5.4, is now generally available in GitHub Copilot. The model claims to be the highest-performing mini offering for coding tasks.
Leanstral released as 120B-parameter agent for formal code verification using Lean, available with open weights (Apache 2.0) and free API endpoint. Claims superiority over larger open-source models and 85% cost savings versus Claude Sonnet on FLTEval benchmarks.
Initial release of MiniMax M2.7, a next-gen LLM with multi-agent collaboration and 204K context. Reported scores: 56.2% on SWE-Pro and 57.0% on Terminal Bench 2.
Initial release of Nemotron-3-Nano-4B-GGUF, a quantized (Q4_K_M) 4B parameter edge model with hybrid Mamba-2 architecture. Supports controllable reasoning modes and 262K context window for edge AI applications including gaming NPCs and local voice assistants.
Mistral Small 4 launches unifying Magistral reasoning, Pixtral multimodal, and agentic coding into one model. 262K context at $0.15/$0.60 per 1M tokens.
ByteDance releases Seedream 4.5, their latest image generation model with major quality improvements over Seedream 4.0.
GLM 5 Turbo launches with 203K context and fast inference optimized for agent-driven environments. Improved complex reasoning over base GLM 5 at $0.96/$3.20 per 1M tokens.
NVIDIA released Llama 3.1 Nemotron 70B Instruct, an instruction-tuned variant of Meta's Llama 3.1 70B model optimized for developer applications.
MiniMax released M1 80k, expanding its M1 model family with an 80,000-token context window for extended document processing.
Mistral AI releases Pixtral Large, a new multimodal model supporting image and text inputs with 128K context window.
MiniMax releases the M1 40k model with a 40,000-token context window. Initial release with limited publicly disclosed specifications.
Google launched Ask Maps, integrating Gemini AI into Google Maps to allow users to ask complex contextual navigation questions. The chatbot personalizes responses based on user search history and saved locations.
NVIDIA releases Nemotron 3 Super, a 120B hybrid MoE model with 1M context window, latent expert routing, and multi-token prediction. Fully open-weight under NVIDIA Open License.
ByteDance launches Seed-2.0-Lite, a cost-efficient multimodal enterprise model with 262K context. Strong agent capabilities at $0.25/$2.00 per 1M tokens.
NVIDIA releases Nemotron-3-Super-120B-A12B-BF16, a 120 billion parameter model with latent MoE architecture for efficient text generation across 8 languages.
NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture optimized for conversational tasks across 8 languages.
StepFun releases Step-3.5-Flash-Base as an open-source text generation model optimized for efficient inference under Apache 2.0 license.
Qwen3.5-4B released as a 4 billion parameter multimodal model supporting image and text inputs. Apache 2.0 licensed for open-source use.
Initial release of Qwen3.5-2B, a 2-billion-parameter multimodal model supporting image and text processing.
Qwen3.5-0.8B released as an 800-million-parameter multimodal model for edge inference. Supports image and text inputs under Apache 2.0 licensing.
Gemini 3.1 Flash Lite Preview launches as Google's high-efficiency model for high-volume use. 1M context at $0.25/$1.50 per 1M tokens. Outperforms Gemini 2.5 Flash Lite.
Qwen3.5-9B released as a multimodal 9-billion-parameter model supporting image and text inputs. Available under Apache 2.0 license on Hugging Face.
Released FP8-quantized version of Qwen3.5-35B-A3B, reducing memory requirements while maintaining multimodal capabilities. Compatible with Transformers endpoints and Azure deployment.
Gemini 3.1 Flash-Lite achieves 2.5x faster first-token latency than Gemini 2.5 Flash with 360 tokens/second throughput. Output pricing increased to $1.50 per million tokens from $0.40.
Initial release of Context-1, a 20B parameter Mixture of Experts retrieval agent model trained for multi-hop search with self-editing context capabilities.
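Many entries in this changelog quote prices as a pair such as $0.15/$0.60 per 1M tokens. The following is a minimal sketch of turning such a pair into a per-request cost. It assumes the first figure is the input-token price and the second is the output-token price, a reading consistent with the Gemini 3.1 Flash-Lite entries, which list $0.25/$1.50 and separately give $1.50 as the output price. The function name and the token counts are illustrative, not taken from any vendor documentation.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request, given prices per 1M tokens.

    Assumption: prices are quoted as input/output, e.g. $0.15/$0.60.
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical workload: 200K input tokens and 20K output tokens,
# priced with the Mistral Small 4 figures listed above ($0.15/$0.60).
cost = request_cost(200_000, 20_000, 0.15, 0.60)
print(f"${cost:.4f}")  # about $0.042
```

Real bills may also depend on caching discounts, long-context surcharges, or reasoning tokens, none of which this sketch models.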
February 2026
Nano Banana 2 (Gemini 3.1 Flash Image Preview) debuts as Google's fastest image generation model. Pro-level quality at Flash speed, priced at $0.50/$3.00 per 1M tokens.
Qwen3.5-35B-A3B-Base released as a 35-billion parameter multimodal model with Apache 2.0 license. Part of the Qwen3.5 mixture-of-experts family.
Seed-2.0-Mini launches targeting latency-sensitive scenarios with 262K context and four reasoning effort levels at $0.10/$0.40 per 1M tokens.
Qwen3.5-Flash debuts with 1M context and ultra-low $0.065/$0.26 pricing. Hybrid architecture delivers a leap in inference efficiency over Qwen 3 series.
Qwen3.5-35B-A3B released as open-weight multimodal model with 35B parameters. Apache 2.0 licensed, supports image and text inputs with conversational capabilities.
Qwen3.5-27B released as a 27-billion parameter multimodal model supporting image-text-to-text tasks. Available under Apache 2.0 license with transformer endpoint compatibility.
Cohere Labs releases tiny-aya-global, a multilingual text generation model fine-tuned from tiny-aya-base to support conversational tasks across 100+ languages including major and low-resource languages.
Gemini 3.1 Pro enters public preview in GitHub Copilot with focus on efficient edit-then-test loops and agentic coding capabilities.
Lyria 3 integrated into Gemini, enabling 30-second music track generation with vocals, lyrics, and cover art from text prompts or uploaded media.
Alibaba released the Qwen 3.5 series, claiming performance parity with proprietary frontier models while being optimized for commodity hardware, directly challenging closed-source AI model economics.
Claude Opus 4.6 brings major GPQA and reasoning improvements, with ARC-AGI-2 jumping from 37.6% to 68.8%.
GLM-5.1 introduces sustained agentic reasoning over hundreds of iterations with improved performance on SWE-Bench Pro (58.4%, +3.3pp vs. GLM-5) and NL2Repo (42.7%, +6.8pp vs. GLM-5). The model maintains productivity across longer problem-solving sessions through iterative experimentation and strategy revision.
Qwen3.5 397B A17B launches as a hybrid linear-attention + sparse MoE vision-language model. 262K context at $0.39/$2.34 per 1M tokens with state-of-the-art performance.
Qwen3.5 Plus launches with 1M context and strong cross-domain performance in academia, finance, marketing, programming, and science at $0.30/$1.20 per 1M tokens.
Gemini 3.1 Pro released as upgrade to Gemini 3 Pro. Enhanced reasoning for complex multi-step problems. Preview access via AI Studio.
Claude Sonnet 4.6 brings improved multi-step reasoning and stronger agentic task performance over 4.5, at the same price.
Grok 4 mini brings next-generation reasoning at low cost. Significantly outperforms Grok 3 mini on AIME and competitive coding tasks.
January 2026
Kimi K2.5 launches as Moonshot AI's native multimodal model with state-of-the-art visual coding and agent swarm paradigm. 262K context at $0.45/$2.20 per 1M tokens.
Claude Sonnet 4.5 January patch with stability improvements for long agentic sessions and better tool use across multi-turn workflows.
Gemini 2.5 Pro preview January update with longer stable thinking output windows and improved factual grounding on complex queries.
Global rollout suspended following copyright cease-and-desist letters from Disney and Paramount Skydance over use of copyrighted training material.
December 2025
DeepSeek V3 December update with improved instruction following and expanded Chinese-English code-switching performance.
Gemini 3.0 Pro debuts as first Gemini 3 generation model. Upgraded reasoning and native multimodal understanding. Quickly superseded by 3.1.
Mistral Small 3.1 December update with improved vision accuracy and expanded support for structured data extraction from images.
November 2025
xAI releases Grok 4.1 Thinking variant with reasoning-focused capabilities. Technical specifications and pricing not yet disclosed.
Kimi K2 Thinking debuts as Moonshot AI's most advanced open reasoning model. Trillion-parameter MoE with 32B active params for agentic long-horizon reasoning.
Gemini 2.5 Flash November preview with configurable thinking budget improvements and better cost efficiency at higher thinking token counts.
Claude 3.7 Sonnet November update with significantly improved agentic task completion rates and more reliable computer use across complex workflows.
October 2025
Added audio embedding support to Nova Multimodal Embeddings. Model now processes audio content alongside text, images, documents, and video with four embedding dimension options (3,072, 1,024, 384, 256) using Matryoshka Representation Learning.
Claude Opus 4.5 brings improved agentic coding and reasoning.
Baidu launches ERNIE 4.5 21B A3B Thinking, an upgraded lightweight MoE reasoning model. Top-tier on math, science, and coding benchmarks at $0.07/$0.28 per 1M tokens.
Llama 4 Scout patch fixing multimodal tokenization issues and improving throughput for long-context document tasks.
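The Nova Multimodal Embeddings entry above lists four embedding sizes (3,072, 1,024, 384, 256) produced with Matryoshka Representation Learning. The general idea behind that technique is that the leading coordinates of a full-size vector form a usable smaller embedding on their own. The sketch below illustrates this with client-side truncation followed by L2 re-normalization. Whether the Nova API expects callers to truncate themselves or returns the smaller size directly is not stated in the entry, so treat the helper as a hypothetical illustration of the concept rather than the service's interface.

```python
import math

# Dimension options as listed in the changelog entry.
NOVA_DIMS = (3072, 1024, 384, 256)


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and rescale to unit length.

    Hypothetical helper: it shows the Matryoshka idea of nested prefixes
    and is not part of any published SDK.
    """
    if dim > len(vec):
        raise ValueError("dim exceeds the length of the embedding")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


# Toy example with a 3-dimensional "embedding" cut down to 2 dimensions.
small = truncate_embedding([3.0, 4.0, 12.0], 2)
print(small)
```

Smaller prefixes trade some retrieval quality for lower storage and faster similarity search, which is why several sizes are offered.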
September 2025
Mistral Large 2 September update with improved code generation and expanded function calling support for complex tool schemas.
Claude 3.7 Sonnet September patch with improved computer use stability and reduced hallucination rate in extended thinking mode.
Kimi K2 September update brings improvements to the trillion-parameter MoE model with 256K long-context inference at $0.40/$1.60 per 1M tokens.
August 2025
ERNIE 4.5 VL 28B A3B launches as Baidu's multimodal MoE model with 28B params (3B active). Vision + text understanding at $0.07/$0.28 per 1M tokens.
ERNIE 4.5 21B A3B launches as Baidu's efficient MoE text model with 21B params (3B active). 120K context at $0.07/$0.28 per 1M tokens.
Qwen3 72B patch release with bug fixes and improved instruction-following stability across multilingual prompts.
Gemini 2.5 Pro preview update with improved tool use accuracy and reduced latency on multi-step reasoning tasks.
July 2025
Claude Sonnet 4.5 with improved agentic performance and hybrid reasoning.
Fastest Claude yet with improved tool use and very low cost per token.
Tencent launches Hunyuan A13B Instruct, a 13B-active MoE model (80B total params) with Chain-of-Thought reasoning. 131K context at $0.14/$0.57 per 1M tokens.
May 2025
DeepSeek R1 updated with meaningfully improved math and reasoning scores. Closes the gap to OpenAI o3 on several benchmarks.