AI Model Changelog

Every version update across all major AI models.

May 2026

May 23

Initial release of diffusion language model family trained on 1.3T pretraining tokens and 45B fine-tuning tokens. Supports autoregressive, diffusion, and self-speculation generation modes with up to 6.4× speedup over traditional AR models.

May 23

Hy-MT2 is the second major version of Tencent's translation model family. It introduces three model sizes (1.8B, 7B, 30B-A3B MoE) with 33-language support and enhanced instruction-following capabilities.

May 22

Gemini 3.5 Flash now powers Google's AI Mode search, adding support for multimodal inputs including images, video files, and Chrome tabs with improved intent anticipation.

May 22

Hy-MT2 represents a major version release with new 1.8B, 7B, and 30B-A3B model sizes supporting 33 languages, extreme quantization via AngelSlim, and the IFMTBench benchmark for instruction-following evaluation.

May 21

Flagship release of Qwen3.7 series with 1M token context window and agent-first design. Notable improvements in coding and agentic performance over prior Qwen generations with explicit prompt caching support.

May 21

Gemini 3.5 Flash rolls out globally across all Google AI subscription tiers with claimed improvements in speed and understanding for agentic and coding tasks.

May 21
Command A+05-2026snapshotCohere

Initial release of Command A+ open source model featuring 25B active parameters in a 218B parameter sparse mixture-of-experts architecture with vision, tool use, and reasoning capabilities.

May 20

Initial release of Grok Build 0.1, xAI's first coding-specialized model with 256K context window designed for agentic workflows and CLI integration.

May 19

Gemini 3.5 Flash is Google's first model in the 3.5 series, claiming improved agentic and coding capabilities at reduced cost compared to frontier models. Features enhanced safety measures with reasoning checks before responses.

May 19

OlmoEarth v1.1 reduces compute costs by up to 3x through token sequence length reduction, collapsing Sentinel-2 resolution bands into single tokens while maintaining similar benchmark performance to v1.

May 19

Initial release of Lance, a 3B-parameter unified multimodal model supporting image and video generation, editing, and understanding tasks. Trained from scratch using 128 A100 GPUs with staged multi-task training recipe.

May 15

Initial release of Microsoft's first agentic small language model for computer use, trained in 2.5 days on 64 H100s using synthetic data.

May 14

Initial release of Qianfan-OCR-Fast, a specialized OCR model with 66K context window and domain-specific training for improved document processing performance.

May 14

Complete architecture rebuild from XLM-RoBERTa to ModernBERT, expanding context from 512 to 32K tokens (64x increase) and adding code retrieval. The 97M model achieves 60.3 on MTEB Multilingual Retrieval (+12.2 over R1), highest in its size class.

May 13

Initial release of DeepSeek V4 Flash, an efficiency-optimized MoE model with 284B total parameters, 13B active parameters, and 1M-token context window available free through OpenRouter.

May 13

First release of Gemini Omni family supporting multimodal video generation with conversational editing. Speech and audio editing capabilities withheld pending safety testing.

May 12

Initial release of Perceptron Mk1, a vision-language model specializing in video understanding and spatial annotation with optional reasoning capabilities.

May 12

Fast-mode variant of Claude Opus 4.7 with identical capabilities but prioritized output speed at 6x premium pricing ($30/$150 per 1M tokens vs standard rates).

May 11

Initial release of Trinity Large Thinking as a free, open source reasoning model with 262K context window and focus on agentic workloads.

May 10

Gemma 4 E4B assistant introduces Multi-Token Prediction architecture for speculative decoding, achieving up to 2x inference speedup. Features 4.5B effective parameters with Per-Layer Embeddings optimized for on-device deployment.

May 8

Initial release of Ring-2.6-1T, a 1T parameter thinking model with 63B active parameters, featuring adaptive reasoning and 262K context window optimized for agent workflows.

May 8
EMO1.0majorAi2

Initial release of EMO, a mixture-of-experts model pretrained for emergent modularity through document-level expert routing constraints.

May 7

First OpenAI voice model with GPT-5-class reasoning capabilities, designed for live voice interactions with tool calling and natural conversation flow.

May 7

Google launches Flash Lite variant of Gemini 3.1 at 50% cost reduction with maintained 1M context window and four-level reasoning system.

May 7

Initial release of ZAYA1-8B, a post-trained reasoning MoE model with 760M active and 8.4B total parameters achieving frontier-level mathematical performance.

May 7
GPT-5.5-Cyber5.5-cyberminorOpenAI

GPT-5.5-Cyber is a variant of GPT-5.5 with relaxed safeguards specifically for vetted cybersecurity teams. The model is trained to be more permissive on security-related tasks compared to the standard GPT-5.5 release.

May 7
GPT-5.5-Cyber5.5-cyberminorOpenAI

OpenAI released GPT-5.5-Cyber, a version of GPT-5.5 with reduced guardrails for vetted cybersecurity defenders. The model can automate security workflows while blocking credential theft and malware creation.

May 6

Initial release of CoBuddy code generation model with 131K context window, native tool calling, and reasoning support, available for free on OpenRouter.

May 6

Initial release of Multi-Token Prediction assistant model for Gemma 4 26B A4B, enabling up to 2x inference speedup through speculative decoding while maintaining identical output quality.

May 6

Major release introducing Gemma 4 family with four model sizes (E2B, E4B, 26B A4B MoE, 31B dense), Multi-Token Prediction drafters for 2x speedup, extended context windows up to 256K, enhanced multimodal capabilities, and improved reasoning performance.

May 5

Granite Speech 4.1 2B introduces dual-head CTC encoder, frame importance sampling, improved multilingual ASR accuracy, and punctuation/truecasing across all languages. Two new variants add speaker attribution with timestamps and non-autoregressive architecture.

May 5

GPT-5.5 Instant replaced GPT-5.3 Instant as the default ChatGPT model with significant accuracy improvements and more concise response formatting. OpenAI claims 52.5% fewer hallucinations on high-stakes prompts and reduced use of emojis and unnecessary formatting.

May 1

IBM released Granite 4.1 family in 3B, 8B, and 30B sizes under Apache 2.0 license. Unsloth released 21 GGUF quantized variants of the 3B model.

April 2026

Apr 30

Granite 4.1 8B is a new release in IBM's Granite 4.1 family, offering an 8B-parameter dense decoder-only model with enterprise-focused capabilities. Released under Apache 2.0 license with 131K context window and multilingual support.

Apr 30
Grok 4.34.3minorxAI

Grok 4.3 introduces a reasoning model with 1 million token context window, multimodal input support, and always-on reasoning capabilities. Tiered pricing applies for requests exceeding 200K total tokens.

Apr 29

R2 upgrades architecture from XLM-RoBERTa to ModernBERT, extends context from 512 to 32,768 tokens, expands vocabulary to 262K tokens, and adds Matryoshka dimension reduction support. Performance improves by 11.8 points on MTEB Retrieval.

Apr 29

Granite 4.1 30B introduces enhanced tool-calling, improved instruction following through updated SFT-RL pipeline, and 131K context window. Released under Apache 2.0 license with competitive performance on code and reasoning benchmarks.

Apr 29

Mistral Medium 3.5 replaces Medium 3.1 and Magistral with a unified 128B model that combines instruction-following, reasoning, and coding. Introduces configurable reasoning effort and a vision encoder trained from scratch.

Apr 29

First retrained base model since GPT-4.5, co-designed with NVIDIA GB200/GB300 systems. Significant improvements in agentic benchmarks, particularly Terminal-Bench (82.7%) and long-context retrieval (74.0% on MRCR v2).

Apr 28

Initial release of Nemotron 3 Nano Omni, a 31B parameter (30B A3B) multimodal MoE model supporting video, audio, image, and text with 256K context, reasoning mode, and enterprise-focused capabilities.

Apr 28

Initial release of Owl Alpha, OpenRouter's first foundation model designed specifically for agentic workloads with native tool use and 1M+ context window.

Apr 28

Initial release of 13-billion-parameter vintage language model trained exclusively on public domain texts published before 1931.

Apr 28
Nemotron-3-Nano-Omni-30B-A3BNemotron-3-Nano-Omni-30B-A3B-ReasoningmajorNVIDIA

Initial release of Nemotron-3-Nano-Omni-30B-A3B, a multimodal MoE model with 31B parameters combining video, audio, image, and text understanding with reasoning capabilities.

Apr 28

Initial release of Nemotron 3 Nano Omni, a multimodal MoE model with 30B total parameters (3B active) combining video, audio, image, and text understanding in a single inference pass with 131K token context.

Apr 28

Initial release of Nemotron 3 Nano Omni, a 30B-parameter multimodal MoE model designed as a perception sub-agent for enterprise systems. Features hybrid Transformer-Mamba architecture with specialized video processing and extended reasoning capabilities.

Apr 28

First omni-modal release in Nemotron 3 line, adding audio and video capabilities to previous vision-language model. Uses new 30B-A3B MoE architecture with hybrid Mamba-Transformer design.

Apr 28

Initial release of Poolside's flagship coding agent model, optimized for software engineering with 128K context and free API access.

Apr 28

Second-generation XS size model with enhanced tool calling and reasoning capabilities, quantized to fp8 for production efficiency.

Apr 28

MiMo-V2.5 introduces omnimodal capabilities with dedicated vision (729M params) and audio (261M params) encoders built on the MiMo-V2-Flash backbone. The model supports 1M token context and was trained on 48T tokens with agentic RL and multi-teacher distillation.

Apr 28

Initial release of Nemotron 3 Nano Omni, a 31B-parameter MoE multimodal model with video, audio, image, and text understanding, 256K context window, and dedicated reasoning mode with chain-of-thought capabilities.

Apr 28

Laguna XS.2 introduces a 33B parameter MoE architecture with 3B activated parameters per token, featuring mixed sliding window and global attention layers, native reasoning support, and optimization for local deployment.

Apr 27
MiMo-V2.5-Prov2.5-promajorXiaomi

MiMo-V2.5-Pro introduces 1.02T total parameters (up from 310B in V2.5) with 42B active parameters, extends context to 1M tokens, and adds hybrid attention architecture with sliding window and global attention patterns. The model achieves 99.6% on GSM8K and maintains coherence at extreme context lengths.

Apr 27

Initial release of GPT Mini Latest with 400,000 token context window and auto-redirect to newest GPT Mini family version.

Apr 27

Updated Qwen3.5 Plus with 1M token context window and tiered pricing above 256K tokens.

Apr 27

Qwen3.6 27B introduces video processing capabilities alongside existing text and image support, with a 262K context window and built-in thinking mode for agentic coding and reasoning tasks.

Apr 27

Qwen3.6 Flash introduces multimodal support for text, images, and video with a 1M token context window. Features tiered pricing structure with base rates of $0.25/$1.50 per 1M tokens for prompts under 256K tokens.

Apr 27

Google released Gemini Flash Latest as a dynamic router that automatically redirects to the newest Gemini Flash model, featuring 1,048,576 token context and reasoning capabilities.

Apr 27

New sparse MoE model with 35B total parameters but only 3B active per token, featuring 262K context window, multimodal support, and integrated reasoning mode.

Apr 27
Qwen3.6 Max Preview3.6-max-previewsnapshotAlibaba / Qwen

Initial release of Qwen3.6 Max Preview, a proprietary 1 trillion parameter sparse MoE model with 262K context window and integrated thinking mode for agentic workflows.

Apr 27

Moonshot AI introduced a router endpoint that automatically redirects to the most current model in the Kimi family, featuring a 262,144 token context window.

Apr 27

Google released a dynamic router endpoint that automatically redirects to the latest Gemini Pro model, featuring 1,048,576-token context and reasoning capabilities.

Apr 24
GPT-5.5 Pro5.5-prominorOpenAI

GPT-5.5 Pro introduces a 1M+ token context window optimized for complex reasoning tasks, agentic coding, and multi-step workflows with multimodal support.

Apr 24
DeepSeek V4 Prov4-pro-previewmajorDeepSeek

Major release with 1.6T total parameters, 1M token context window, and substantial efficiency improvements over V3.2. DeepSeek claims near-frontier performance at a fraction of the cost.

Apr 24

Major version release with claimed competitive performance against leading US models. DeepSeek emphasizes improved coding capabilities and domestic chip compatibility.

Apr 24

DeepSeek V4 introduces two model sizes with major architectural improvements: hybrid attention mechanisms reduce memory usage by 9.5x-13.7x while supporting 1M token contexts, and mixed FP4/FP8 precision further reduces memory footprint. Added Huawei Ascend NPU support.

Apr 24

DeepSeek-V4-Flash is a new 284B-parameter MoE model with 13B activated parameters, featuring hybrid attention architecture that reduces inference costs by 73% at million-token context lengths. Introduces three reasoning effort modes and achieves competitive performance with frontier models on coding and mathematical reasoning.

Apr 24

Major architectural update from V3.2 with 1.6 trillion total parameters (49B active), 1M token context window, and improved efficiency. Largest open-weight model by parameter count.

Apr 23

Initial release of Ling-2.6-1T, a 1 trillion parameter instruct model with 262K context window and fast-thinking architecture designed for cost-efficient agent deployments.

Apr 22
Hy3 PreviewpreviewsnapshotTencent

Initial preview release of Hy3, Tencent's Mixture-of-Experts model with configurable reasoning modes designed for production agentic workflows.

Apr 22

Initial preview release of Tencent's Hy3 model featuring 262K context, three-tier reasoning system, and free availability through OpenRouter.

Apr 22

Qwen3.6-27B adds extended context to 1M+ tokens, introduces thinking preservation feature, and shows 2-5% improvements in coding agent benchmarks over Qwen3.5-27B. FP8 quantization enables efficient deployment with claimed near-identical performance to full precision.

Apr 22

Initial release of Trinity Large Preview, a 400B-parameter sparse MoE model with 13B active parameters per token, featuring up to 512K context window support and optimization for agentic workflows.

Apr 22

Gemma 4 E2B demonstrated running as a vision-language agent on NVIDIA Jetson Orin Nano Super (8GB), autonomously deciding when to access webcam based on conversational context with no hardcoded triggers.

Apr 22

Qwen3.6-27B is a 27B dense model that claims flagship-level coding performance surpassing the 397B Qwen3.5-397B-A17B. Available at 55.6GB full size or 16.8GB quantized.

Apr 22

Initial release of Privacy Filter, a 1.5B-parameter bidirectional token classifier for detecting 8 PII categories with 128K context window and Apache 2.0 license.

Apr 21

Major update to OpenAI's image generation capabilities with significantly improved text rendering and UI element generation accuracy.

Apr 21

Major release upgrading from previous ChatGPT image generation with 2K resolution support, dual-mode generation (Instant/Thinking), improved text rendering, and web-aware context.

Apr 21

Major update to OpenAI's image generation model with significantly improved quality, maximum resolution of 3840x2160 pixels, and pricing at $30 per million output tokens.

Apr 21

ChatGPT Images 2.0 adds thinking mode integration, enabling the model to reason about visual tasks, gather contextual data, and generate complex layouts combining text and images with improved aspect ratio control.

Apr 21
GPT-5.4 Image 25.4-image-2minorOpenAI

GPT-5.4 Image 2 combines GPT-5.4 reasoning capabilities with image generation from GPT Image 2, supporting text, image, and file inputs with a 272K token context window.

Apr 21

Initial release of Ling-2.6-flash with 104B total parameters (7.4B active), featuring 262K context window and optimized for agent applications with fast response times.

Apr 21

Initial release of Pareto Code Router with dynamic model selection based on min_coding_score parameter (0-1 scale) and 200K context window.

Apr 20

First release of Kimi K2.6 introduces 1T-parameter MoE architecture with agent swarm capabilities, 256K context, and competitive performance on SWE-Bench and agentic benchmarks.

Apr 20

Initial release of Qianfan-OCR-Fast, a specialized OCR model with 65K context window and free pricing. Claims performance improvements over Qianfan-OCR.

Apr 17

First open-weight release of Qwen3.6 series, featuring improved agentic coding capabilities, thinking preservation for context retention, and FP8 quantization. Built on community feedback to prioritize stability and real-world utility.

Apr 17

GR00T N1.7 upgrades to Cosmos-Reason2-2B VLM backbone and adds EgoScale pre-training on 20,854 hours of human egocentric video, improving dexterity and generalization over N1.6.

Apr 16

Python SDK v0.96.0 adds Claude Opus 4-7 support, introduces token budgets for cost management, and user profiles for personalized interactions.

Apr 16

Opus 4.7 introduces significantly more aggressive safety guardrails that automatically detect and block requests related to cybersecurity uses, resulting in a 10-15x increase in reported false positive refusals.

Apr 15

Major release introducing multi-modal 3D world generation capabilities, replacing video-based outputs with persistent 3D assets. Centered on WorldMirror 2.0, a 1.2B parameter model for reconstruction.

Apr 15
Gemini 3.1 Flash TTS3.1-flash-tts-previewmajorGoogle DeepMind

Initial preview release of Gemini 3.1 Flash TTS, Google's first prompt-controlled text-to-speech model that accepts theatrical-style direction for voice characteristics, accents, and delivery style.

Apr 15
GPT-5.4-Cyber5.4-cyberminorOpenAI

Fine-tuned variant of GPT-5.4 specifically built for defensive cybersecurity work with reduced restrictions on security-related tasks. Initial access limited to verified security professionals through Trusted Access for Cyber program.

Apr 15

Gemini 3.1 Flash TTS introduces audio tags for granular control over vocal style, pace, and delivery through natural language commands. The model achieves an Elo score of 1,211 and includes mandatory SynthID watermarking.

Apr 12

Anthropic released Mythos, a security-focused AI model designed to identify and exploit zero-day vulnerabilities. The model remains unreleased publicly due to security concerns.

Apr 12

Initial release of Trinity-Large-Thinking, a 400B parameter open-weight reasoning model with 256 mixture-of-experts (13B active per token) optimized for agent tasks. Apache 2.0 licensed, trained on 17 trillion tokens over 33 days on 2,048 Nvidia B300 GPUs.

Apr 11

Refreshed version of LFM2-VL-450M with updated LFM2.5-350M backbone. Adds bounding box prediction and function calling capabilities while improving performance across vision and language benchmarks.

Apr 9

GLM-5.1 introduces iterative strategy refinement capabilities enabling hundreds of iterations on complex coding tasks. The model achieves 58.4% on SWE-Bench Pro and 6.1x performance improvements on vector database optimization through autonomous approach revision.

Apr 9

Released HY-Embodied-0.5 suite with MoT-2B (2.2B active parameters, 4B total) and 32B variants. MoT-2B trained on 200B+ tokens of embodied data outperforms similarly-sized competitors on 22 embodied benchmarks.

Apr 8

Amazon Bedrock now enables supervised fine-tuning, reinforcement fine-tuning, and model distillation for Nova 2 Lite. Fine-tuned models deploy on-demand at standard inference pricing without provisioned capacity.

Apr 8

Claude Mythos Preview demonstrates unprecedented capability in autonomous vulnerability discovery and exploitation, finding decades-old bugs in OpenBSD, FFmpeg, and FreeBSD. Deployment restricted to 11-partner coalition for defensive cybersecurity use only.

Apr 8
MythosPreviewmajorAnthropic

Anthropic released Mythos as a limited preview with major cybersecurity capabilities and implications. The model has both offensive and defensive cyber applications.

Apr 8
Mythos1.0-previewmajorAnthropic

Mythos is Anthropic's new frontier model, positioned as larger and more intelligent than its Opus models. It is deployed exclusively through Project Glasswing for defensive cybersecurity work with 40+ vetted partner organizations, with claims of identifying thousands of zero-day vulnerabilities during early testing.

Apr 8

Meta's first major new model after $14.3B infrastructure rebuild. Shifts from open-source Llama strategy to fully proprietary model available only via API to select partners.

Apr 7

Claude Mythos Preview released under restricted access through Project Glasswing. Model demonstrates exceptional cybersecurity research capabilities including discovery of 27-year-old OpenBSD TCP SACK vulnerability and Linux privilege escalation flaws.

Apr 7

GLM-5.1 introduces extended autonomous execution capability, claiming ability to work independently on single tasks for over 8 hours with continuous planning and improvement. Focus on coding capability improvements and engineering-grade output generation.

Apr 7

Arcee released Trinity Large Thinking, an open-source reasoning model built on a $20M budget. The company claims it is the most capable open-weight model released by a non-Chinese company, with Apache 2.0 licensing and comparable performance to other top open-source models.

Apr 7

Claude Mythos is Anthropic's specialized model for cybersecurity vulnerability discovery, designed to identify critical flaws in operating systems, browsers, and software. The model shows improvements over Claude Opus 4.6 in reasoning, agent-based capabilities, and coding.

Apr 7

Mythos Preview released as restricted-access model to 40 organizations for defensive security applications. Model demonstrates capability to find tens of thousands of vulnerabilities and autonomously create working exploits across major operating systems.

Apr 7

Initial release of Harrier embedding model. Trained on 2B+ examples with GPT-5 synthetic data. Achieves top ranking on MTEB v2 multilingual benchmark with 131K context window.

Apr 7
Amazon Nova 2 Sonicamazon.nova-sonic-v1:0majorAmazon AWS

Amazon Nova 2 Sonic enables real-time conversational podcast generation with 1M token context window and native support for seven languages through Amazon Bedrock.

Apr 4

Initial release of Bonsai 8B 1-bit quantized model. Achieves 14x compression with claimed competitive performance on standard benchmarks. Also released Bonsai 4B and Bonsai 1.7B variants.

Apr 4

Gemma 4 E4B adds multimodal capabilities (text, image, audio), extended 128K context window, native reasoning modes, and function-calling support compared to Gemma 3. Achieves 69.4% MMLU Pro with 4.5B effective parameters optimized for mobile and edge deployment.

Apr 3

Tencent released OmniWeaving, an open-source unified video generation model with reasoning capabilities and compositional video creation. Built on HunyuanVideo-1.5, it supports eight video generation tasks and introduces IntelligentVBench benchmark.

Apr 3

Zhipu AI released GLM-5V-Turbo, adding multimodal capabilities to its GLM-5 series. The model generates code from design mockups and video inputs while maintaining text-only coding performance, integrating directly with Claude Code and OpenClaw agents.

Apr 3

Gemma 4 26B A4B uses Mixture-of-Experts with 3.8B active parameters for efficient inference. Features 256K context window, multimodal input (text/image), native reasoning modes, and function-calling for agentic workflows.

Apr 3

Gemma 4 introduces multimodal support (text, image, video, audio on small models), extended context windows (128K-256K tokens), configurable reasoning modes, and native function calling. Available in four sizes with both dense and MoE architectures.

Apr 2

Gemma 4 31B introduces 256K context window, configurable reasoning mode, and multimodal image support. Maintains Apache 2.0 open license.

Apr 2

Qwen 3.6 Plus introduces a hybrid architecture with linear attention and sparse mixture-of-experts routing, delivering major improvements in agentic coding, front-end development, and reasoning over the 3.5 series. Achieves 78.8 on SWE-bench Verified.

Apr 2

Gemma 4 introduces four model sizes (2B-31B) with improved reasoning and agentic capabilities. Apache 2.0 licensing replaces previous restrictions. 31B model ranks #3 on Arena AI leaderboard.

Apr 2

Microsoft released MAI-Transcribe-1, a speech-to-text model achieving lowest FLEURS benchmark word error rate at 2.5x faster inference than Azure Fast. Priced at $0.36 per audio hour, supporting 25 languages and challenging recording conditions.

Apr 2

Qwen3.6 Plus introduces hybrid linear attention with sparse mixture-of-experts routing, achieving 78.8 on SWE-bench Verified. Major improvements in coding, reasoning, and multimodal capabilities over 3.5 series.

Apr 2

Gemma 4 introduces multimodal capabilities (text, image, video support), reasoning modes, 256K context windows, and Mixture-of-Experts architecture. The 26B A4B variant uses sparse activation for near-dense-31B performance with 4B-model inference speed.

Apr 2

Gemma 4 introduces multimodal capabilities with native image and audio support, extended 128K context window, built-in reasoning modes with configurable thinking, and hybrid attention architecture combining local and global attention for efficiency.

Apr 2

Gemma 4 introduces multimodal support, 256K context window, Apache 2.0 permissive licensing, and mixture of experts variant. First major version update with explicit focus on enterprise deployment without data usage restrictions.

Apr 2

Gemma 4 introduces multimodal capabilities (text, image, video, audio on small models), extended 256K context windows, configurable reasoning modes, and hybrid dense/mixture-of-experts architectures. Substantial improvements in coding benchmarks, long-context reasoning, and on-device deployment efficiency compared to Gemma 3.

Apr 2

Google DeepMind introduces Gemma 4 31B with multimodal input (text and images), 256K context window, configurable reasoning mode, and native function calling. Free release under Apache 2.0 license.

Apr 2

NVIDIA released NVFP4-quantized version of Google DeepMind's Gemma 4 31B IT model optimized for consumer GPU inference. Maintains 256K context window and multimodal capabilities with <0.5% performance degradation on reasoning and coding benchmarks.

Apr 1

Meta's first closed-source AI model with planned paid developer access, marking a strategic shift from the open-source Llama series. Released from Meta Superintelligence Labs under new AI leadership.

Apr 1

Holo3-122B-A10B released with 78.85% OSWorld score using mixture-of-experts architecture (122B total, 10B active parameters). Trained via agentic learning flywheel with synthetic data augmentation and curated reinforcement learning. Holo3-35B-A3B variant open-sourced under Apache 2.0.

Apr 1

Initial release of Trinity Large Thinking, an open-source reasoning model with 262K context window. Model supports transparent reasoning processes and agentic task handling.

Apr 1

Initial release of Falcon Perception 0.6B early-fusion Transformer for open-vocabulary grounding and segmentation. Introduces Chain-of-Perception output interface and PBench diagnostic benchmark with five capability levels.

March 2026

Mar 31

Grok 4.20 Multi-Agent is a specialized variant designed for collaborative agent-based workflows with parallel agent coordination. Scales agent count based on reasoning effort: 4 agents at low/medium effort, 16 agents at high/xhigh effort.

Mar 31

Microsoft released the Harrier-OSS embedding model family with three parameter sizes (270M, 600M, 27B) supporting multilingual inputs, 32K token context, and knowledge distillation techniques. The 27B variant achieves 74.3 on MTEB v2 benchmark.

Mar 31

Qwen3.5-Omni expands from Qwen3-Omni with 8x context window increase (32K to 256K tokens), 6x language support expansion (11 to 74 languages), hybrid attention-MoE architecture, and ARIA token interleaving for improved real-time speech synthesis. Demonstrates emergent code-generation capability from spoken and video input.

Mar 31

Google released Veo 3.1 Lite, a cost-optimized video generation model priced at less than 50% of Veo 3.1 Fast. Designed for high-volume applications with same generation speed as Veo 3.1 Fast.

Mar 31

Granite 4.0 3B Vision introduces a compact vision-language model optimized for enterprise document processing with DeepStack Injection architecture and ChartNet dataset training. Shipped as LoRA adapter on Granite 4.0 Micro for modular text-only fallback support.

Mar 31
Grok 4.204.20majorxAI

Grok 4.20 is xAI's flagship release featuring a 2 million token context window, toggleable reasoning capabilities, and native agentic tool support. The model claims industry-leading speed with low hallucination rates.

Mar 30

Google releases Lyria 3 Pro Preview, a music generation model producing full-length songs with vocals, lyrics, and instrumental arrangements. Priced at $0.08 per song through the Gemini API with 1M token context window.

Mar 30

Microsoft releases Harrier-OSS-v1 family of multilingual embedding models in three sizes (270M, 0.6B, 27B parameters) trained with contrastive learning and knowledge distillation. The 0.6B variant achieves 69.0 MTEB v2 score with 32,768 token context window and supports 45+ languages.

Mar 30

Lyria 3 Clip Preview introduces Google's music generation model to the Gemini API with clip-based pricing at $0.04 per 30-second generation.

Mar 30

Qwen 3.6 Plus Preview introduces a hybrid architecture with 1M token context window, improved reasoning and agentic behavior over 3.5 series. Available free on OpenRouter with data collection for model improvement.

Mar 28

v5.5 prioritizes user customization with three new features: Voices for voice cloning, Custom Models for style training on user music, and My Taste for preference-based generation. Voices and Custom Models available to Pro/Premier subscribers only.

Mar 27
Suno 5.55.5minorSuno

Suno 5.5 introduces Voices (voice cloning with verification), Custom Models (fine-tune on personal music), and My Taste (personalized recommendations). Described as the company's best and most expressive model yet.

Mar 27

KAT-Coder-Pro V2 builds on earlier KAT-Coder versions with enhanced agentic coding capabilities for large-scale production environments and added web design generation for landing pages and presentation decks.

Mar 27

Cohere releases Transcribe, a 2B parameter open-source speech recognition model with 5.42% WER on the Hugging Face leaderboard, supporting 14 languages under Apache 2.0 license.

Mar 26

Mistral released Voxtral TTS, an open-source speech model with 90ms latency, 6x real-time factor, and support for 9 languages with custom voice adaptation from sub-5-second samples.

Mar 26

Dreamina Seedance 2.0 launches in CapCut with IP safeguards including face-detection blocks, unauthorized IP generation prevention, and invisible watermarking to identify AI-generated content.

Mar 26

First Mistral text-to-speech model. Supports voice cloning from minimal audio, available as both API and open-weights version.

Mar 26

Google released Gemini 3.1 Flash Live, an audio-focused model optimized for multilingual conversations. The model powers the expansion of Search Live to 200+ countries with claimed improvements to response speed and conversation naturalness.

Mar 26

Gemini 3.1 Flash Live improves upon 2.5 Flash Native Audio with enhanced acoustic recognition, background noise filtering, lower latency, and extended conversation context.

Mar 26

NVIDIA released gpt-oss-puzzle-88B, an inference-optimized 88B-parameter mixture-of-experts model derived from gpt-oss-120B using the Puzzle NAS framework. Achieves 1.63× throughput improvement on long-context and up to 2.82× on single H100s while maintaining parent accuracy through heterogeneous expert pruning, selective window attention, and knowledge distillation with RL optimization.

Mar 25

Lyria 3 Pro improves upon Lyria 3 with better understanding of musical structures and enhanced track generation capabilities up to 3 minutes. Model available across Gemini, Google Vids, Vertex AI, and Google AI Studio.

Mar 25

Expanded from 30-second to 3-minute song generation with improved structural composition control. Added support for specifying discrete song elements.

Mar 25

Apple developed RubiCap, a rubric-guided reinforcement learning framework for dense image captioning that achieves state-of-the-art results with 2B-7B parameter models, outperforming competitors up to 72B parameters.

Mar 25

Initial release of MolmoWeb with 4B and 8B parameter variants. Includes full training dataset (MolmoWebMix), model weights, and evaluation tools under Apache 2.0 license.

Mar 23

Nemotron 3 Super now available on Amazon Bedrock as fully managed serverless inference. 120B parameter MoE model with 12B active parameters, 256K context, claims 5x throughput improvement and 2x accuracy gain over previous version.

Mar 22

Xiaomi released MiMo-V2-Pro as 3x larger successor to MiMo-V2-Flash (Dec 2025), reaching 1T parameters with 42B active per request. Benchmarks place it 3rd globally on PinchBench/ClawEval, nearly matching Claude Opus 4.6 on coding (78% vs 80.8%) while costing 80% less per input token.

Mar 22

Composer 2 launched with frontier-level coding intelligence, built on Moonshot AI's open-source Kimi 2.5 model with additional reinforcement learning training applied by Cursor (75% of final compute).

Mar 21

M2.7 introduces autonomous participation in its own development through 100+ self-optimization rounds, achieving 30% performance improvement on internal coding tasks and competitive benchmark scores against leading Western models.

Mar 20

Reka releases Reka Edge, a new 7-billion parameter multimodal model optimized for efficient image and video understanding with a 16K context window.

Mar 19

Composer 2 achieves 61.3 on CursorBench (+38% vs Composer 1.5) through improved pretraining and reinforcement learning on long-horizon tasks. Pricing set at $0.50/$2.50 per 1M tokens, undercutting Claude and GPT-4 by 60-90%.

Mar 19

MAI-Image-2 improves upon MAI-Image-1 with enhanced photorealism, natural lighting, and notably adds reliable text rendering capabilities for practical design applications. The model ranks third on Arena.ai leaderboard, up from ninth place for the previous version.

Mar 18

MiMo-V2-Pro is Xiaomi's flagship foundation model launch featuring 1T+ parameters and 1M context window optimized for agent systems and complex workflow orchestration.

Mar 18
MiMo-V2-Promimo-v2-pro-2026-03-18majorXiaomi

Xiaomi launches MiMo-V2-Pro, their flagship 1T-parameter foundation model with 1M context. Optimized for agentic scenarios, ranking among global top tier on standard benchmarks.

Mar 18
MiMo-V2-Omnimimo-v2-omni-2026-03-18majorXiaomi

Xiaomi debuts MiMo-V2-Omni, a frontier omni-modal model processing image, video, and audio natively. 262K context with strong agentic capabilities including visual grounding and code execution.

Mar 18
Qwen3.5-Max-Previewqwen3.5-max-preview-2026-03-18majorAlibaba / Qwen

Qwen3.5-Max-Preview launches as Alibaba's largest model (1T+ params) in the Qwen3.5 series. 262K context with thinking mode (82K CoT). Beats previous flagship on reasoning, multilingual, and agentic tasks. $1.20/$6.00 per 1M tokens.

Mar 17
GPT-5.4 Nanogpt-5.4-nano-2026-03-17majorOpenAI

GPT-5.4 Nano launches as the smallest, fastest member of the GPT-5.4 family. 400K context, multimodal input, optimized for high-volume agentic tasks at $0.20/$1.25 per 1M tokens.

Mar 17

GPT-5.4 mini introduces major improvements in coding, reasoning, and computer control capabilities over GPT-5 mini. Model runs over 2x faster and achieves near-full-GPT-5.4 performance on multiple benchmarks while consuming 30% of quota in agentic systems.

Mar 17
GPT-5.4 mini5.4-miniminorOpenAI

GPT-5.4 mini, OpenAI's fastest variant of GPT-5.4, is now generally available in GitHub Copilot. The model claims to be the highest-performing mini offering for coding tasks.

Mar 17

Leanstral released as 120B-parameter agent for formal code verification using Lean, available with open weights (Apache 2.0) and free API endpoint. Claims superiority over larger open-source models and 85% cost savings versus Claude Sonnet on FLTEval benchmarks.

Mar 16

Initial release of MiniMax M2.7 — next-gen LLM with multi-agent collaboration, 204K context, SWE-Pro 56.2%, Terminal Bench 2 57.0%

Mar 16

Initial release of Nemotron-3-Nano-4B-GGUF, a quantized (Q4_K_M) 4B parameter edge model with hybrid Mamba-2 architecture. Supports controllable reasoning modes and 262K context window for edge AI applications including gaming NPCs and local voice assistants.

Mar 16
Mistral Small 4mistral-small-4-2026-03-16majorMistral AI

Mistral Small 4 launches unifying Magistral reasoning, Pixtral multimodal, and agentic coding into one model. 262K context at $0.15/$0.60 per 1M tokens.

Mar 15
Seedream 4.5seedream-4.5-2026-03-15majorByteDance

ByteDance releases Seedream 4.5, their latest image generation model with major quality improvements over Seedream 4.0.

Mar 15
GLM 5 Turboglm-5-turbo-2026-03-15majorZhipu AI

GLM 5 Turbo launches with 203K context and fast inference optimized for agent-driven environments. Improved complex reasoning over base GLM 5 at $0.96/$3.20 per 1M tokens.

Mar 12

Nvidia released Llama 3.1 Nemotron 70B Instruct, an instruction-tuned variant of Meta's Llama 3.1 70B model optimized for developer applications.

Mar 12

Minimax released M1 80k, expanding its M1 model family with an 80,000-token context window for extended document processing.

Mar 12

Mistral AI releases Pixtral Large, a new multimodal model supporting image and text inputs with 128K context window.

Mar 12

Minimax releases M1 40k model with 40,000-token context window. Initial release with limited publicly disclosed specifications.

Mar 12

Google launched Ask Maps, integrating Gemini AI into Google Maps to allow users to ask complex contextual navigation questions. The chatbot personalizes responses based on user search history and saved locations.

Mar 11

Nvidia releases Nemotron 3 Super, a 120B hybrid MoE model with 1M context window, latent expert routing, and multi-token prediction. Fully open-weight under NVIDIA Open License.

Mar 10
Seed-2.0-Liteseed-2.0-lite-2026-03-10majorByteDance

ByteDance launches Seed-2.0-Lite, a cost-efficient multimodal enterprise model with 262K context. Strong agent capabilities at $0.25/$2.00 per 1M tokens.

Mar 10

NVIDIA releases Nemotron-3-Super-120B-A12B-BF16, a 120 billion parameter model with latent MoE architecture for efficient text generation across 8 languages.

Mar 10
Nemotron 3 Super120B-A12B-NVFP4majorNVIDIA

NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture optimized for conversational tasks across 8 languages.

Mar 5

StepFun releases Step-3.5-Flash-Base as an open-source text generation model optimized for efficient inference under Apache 2.0 license.

Mar 5

GPT-5.4, OpenAI's agentic coding model, is now generally available in GitHub Copilot after early testing validated improved performance on real-world and agentic software development tasks.

Mar 2

Qwen3.5-4B released as a 4 billion parameter multimodal model supporting image and text inputs. Apache 2.0 licensed for open-source use.

Mar 2

Initial release of Qwen3.5-2B, a 2-billion-parameter multimodal model supporting image and text processing.

Mar 2

Qwen3.5-0.8B released as an 800-million-parameter multimodal model for edge inference. Supports image and text inputs under Apache 2.0 licensing.

Mar 2
Gemini 3.1 Flash Lite Previewgemini-3.1-flash-lite-preview-2026-03-02majorGoogle DeepMind

Gemini 3.1 Flash Lite Preview launches as Google's high-efficiency model for high-volume use. 1M context at $0.25/$1.50, outperforms Gemini 2.5 Flash Lite.

Mar 2

Qwen3.5-9B released as multimodal 9-billion parameter model supporting image and text inputs. Available under Apache 2.0 license on Hugging Face.

Mar 1

Released FP8-quantized version of Qwen3.5-35B-A3B, reducing memory requirements while maintaining multimodal capabilities. Compatible with Transformers endpoints and Azure deployment.

Mar 1

Gemini 3.1 Flash-Lite achieves 2.5x faster first-token latency than Gemini 2.5 Flash with 360 tokens/second throughput. Output pricing increased to $1.50 per million tokens from $0.40.

Mar 1

Initial release of Context-1, a 20B parameter Mixture of Experts retrieval agent model trained for multi-hop search with self-editing context capabilities.

February 2026

Feb 26

Nano Banana 2 (Gemini 3.1 Flash Image Preview) debuts as Google's fastest image generation model. Pro-level quality at Flash speed with $0.50/$3.00 per 1M tokens.

Feb 26

Qwen3.5-35B-A3B-Base released as a 35-billion parameter multimodal model with Apache 2.0 license. Part of the Qwen3.5 mixture-of-experts family.

Feb 26
Seed-2.0-Miniseed-2.0-mini-2026-02-26majorByteDance

Seed-2.0-Mini launches targeting latency-sensitive scenarios with 262K context and four reasoning effort levels at $0.10/$0.40 per 1M tokens.

Feb 25
Qwen3.5-Flashqwen3.5-flash-2026-02-25majorAlibaba / Qwen

Qwen3.5-Flash debuts with 1M context and ultra-low $0.065/$0.26 pricing. Hybrid architecture delivers a leap in inference efficiency over Qwen 3 series.

Feb 24

Qwen3.5-35B-A3B released as open-weight multimodal model with 35B parameters. Apache 2.0 licensed, supports image and text inputs with conversational capabilities.

Feb 24

Qwen3.5-27B released as a 27-billion parameter multimodal model supporting image-text-to-text tasks. Available under Apache 2.0 license with transformer endpoint compatibility.

Feb 22

Cohere Labs releases tiny-aya-global, a multilingual text generation model fine-tuned from tiny-aya-base to support conversational tasks across 100+ languages including major and low-resource languages.

Feb 20

Gemini 3.1 Pro enters public preview in GitHub Copilot with focus on efficient edit-then-test loops and agentic coding capabilities.

Feb 20
ClaudeunspecifiedsnapshotAnthropic

Claude deployed in Goldman Sachs production environment for trade accounting and client onboarding operations.

Feb 20

Lyria 3 integrated into Gemini, enabling 30-second music track generation with vocals, lyrics, and cover art from text prompts or uploaded media.

Feb 20

Alibaba released Qwen 3.5 series claiming performance parity with proprietary frontier models while optimized for commodity hardware, directly challenging closed-source AI model economics.

Feb 17
Claude Opus 4.6claude-opus-4-6-20260217Anthropic

Claude Opus 4.6 — major GPQA and reasoning improvements; ARC-AGI-2 jump from 37.6% to 68.8%.

Feb 17

GLM-5.1 introduces sustained agentic reasoning over hundreds of iterations with improved performance on SWE-Bench Pro (58.4%, +3.3pp vs. GLM-5) and NL2Repo (42.7%, +6.8pp vs. GLM-5). The model maintains productivity across longer problem-solving sessions through iterative experimentation and strategy revision.

Feb 16
Qwen3.5 397B A17Bqwen3.5-397b-a17b-2026-02-16majorAlibaba / Qwen

Qwen3.5 397B A17B launches as a hybrid linear-attention + sparse MoE vision-language model. 262K context at $0.39/$2.34 per 1M tokens with state-of-the-art performance.

Feb 15
Qwen3.5 Plusqwen3.5-plus-2026-02-15majorAlibaba / Qwen

Qwen3.5 Plus launches with 1M context and strong cross-domain performance in academia, finance, marketing, programming, and science at $0.30/$1.20 per 1M tokens.

Feb 14
Gemini 3.1 Progemini-3.1-pro-preview-0214majorGoogle DeepMind

Gemini 3.1 Pro released as upgrade to Gemini 3 Pro. Enhanced reasoning for complex multi-step problems. Preview access via AI Studio.

Feb 11
GLM 5glm-5-2026-02-11majorZhipu AI

Z.ai launches GLM 5, their flagship open-source foundation model for complex systems design and long-horizon agentic workflows. 80K context at $0.72/$2.30 per 1M tokens.

Feb 10
Claude Sonnet 4.6claude-sonnet-4-6-20260210majorAnthropic

Claude Sonnet 4.6 brings improved multi-step reasoning and stronger agentic task performance over 4.5, at the same price.

Feb 1
Grok 4grok-4-betamajorxAI

Grok 4 launches as xAI flagship with 256K context and major multimodal capability leap over Grok 3. Claims top-tier performance on coding and reasoning benchmarks.

Feb 1
Grok 4 minigrok-4-mini-betamajorxAI

Grok 4 mini brings next-generation reasoning at low cost. Significantly outperforms Grok 3 mini on AIME and competitive coding tasks.

January 2026

Jan 26
Kimi K2.5kimi-k2.5-2026-01-26majorMoonshot AI

Kimi K2.5 launches as Moonshot AI's native multimodal model with state-of-the-art visual coding and agent swarm paradigm. 262K context at $0.45/$2.20 per 1M tokens.

Jan 21
Claude Sonnet 4.5claude-sonnet-4-5-20260121patchAnthropic

Claude Sonnet 4.5 January patch with stability improvements for long agentic sessions and better tool use across multi-turn workflows.

Jan 14
Gemini 2.5 Progemini-2.5-pro-preview-01-14minorGoogle DeepMind

Gemini 2.5 Pro preview January update with longer stable thinking output windows and improved factual grounding on complex queries.

Jan 10
o3o3-2026-01-10minorOpenAI

o3 January 2026 update with expanded API availability across all tiers and improved performance on multi-step code debugging tasks.

Jan 1

Global rollout suspended following copyright cease-and-desist letters from Disney and Paramount Skydance over use of copyrighted training material.

December 2025

Dec 17
DeepSeek V3DeepSeek-V3-1217minorDeepSeek

DeepSeek V3 December update with improved instruction following and expanded Chinese-English code-switching performance.

Dec 17
GPT-4ogpt-4o-2025-12-17minorOpenAI

GPT-4o December snapshot with improved real-time audio mode quality and enhanced JSON schema structured output compliance.

Dec 10
Gemini 3.0 Progemini-3.0-pro-preview-1210majorGoogle DeepMind

Gemini 3.0 Pro debuts as first Gemini 3 generation model. Upgraded reasoning and native multimodal understanding. Quickly superseded by 3.1.

Dec 10
Mistral Small 3.1mistral-small-2512minorMistral AI

Mistral Small 3.1 December update with improved vision accuracy and expanded support for structured data extraction from images.

November 2025

Nov 20
o3-minio3-mini-2025-11-20minorOpenAI

o3-mini November update with expanded API tier access and improved reliability for the high-effort reasoning mode.

Nov 17
Grok 4.1 Thinking4.1-thinkingminorxAI

xAI releases Grok 4.1 Thinking variant with reasoning-focused capabilities. Technical specifications and pricing not yet disclosed.

Nov 6
Kimi K2 Thinkingkimi-k2-thinking-2025-11-06majorMoonshot AI

Kimi K2 Thinking debuts as Moonshot AI's most advanced open reasoning model. Trillion-parameter MoE with 32B active params for agentic long-horizon reasoning.

Nov 5
Gemini 2.5 Flashgemini-2.5-flash-preview-11-05minorGoogle DeepMind

Gemini 2.5 Flash November preview with configurable thinking budget improvements and better cost efficiency at higher thinking token counts.

Nov 3
Claude 3.7 Sonnetclaude-3-7-sonnet-20251103minorAnthropic

Claude 3.7 Sonnet November update with significantly improved agentic task completion rates and more reliable computer use across complex workflows.

October 2025

Oct 28
Amazon Nova Multimodal Embeddingsamazon.nova-2-multimodal-embeddings-v1:0minorAmazon AWS

Added audio embedding support to Nova Multimodal Embeddings. Model now processes audio content alongside text, images, documents, and video with four embedding dimension options (3,072, 1,024, 384, 256) using Matryoshka Representation Learning.

Oct 20
Grok 3grok-3-2025-10minorxAI

Grok 3 October update extending knowledge cutoff and improving accuracy on real-time query grounding from X data.

Oct 15

Claude Opus 4.5 — improved agentic coding and reasoning.

Oct 14
GPT-4ogpt-4o-2025-10-14minorOpenAI

GPT-4o October snapshot with vision improvements and better performance on document understanding tasks.

Oct 9
ERNIE 4.5 21B A3B Thinkingernie-4.5-21b-a3b-thinking-2025-10-09majorBaidu AI

Baidu launches ERNIE 4.5 21B A3B Thinking, an upgraded lightweight MoE reasoning model. Top-tier on math, science, and coding benchmarks at $0.07/$0.28 per 1M tokens.

Oct 1
Llama 4 ScoutLlama-4-Scout-17B-16E-0921patchMeta AI

Llama 4 Scout patch fixing multimodal tokenization issues and improving throughput for long-context document tasks.

September 2025

Sep 18
Mistral Large 2mistral-large-2409minorMistral AI

Mistral Large 2 September update with improved code generation and expanded function calling support for complex tool schemas.

Sep 15
Claude 3.7 Sonnetclaude-3-7-sonnet-20250915patchAnthropic

Claude 3.7 Sonnet September patch with improved computer use stability and reduced hallucination rate in extended thinking mode.

Sep 12
o1o1-2025-09-12minorOpenAI

o1 September update with improved tool calling reliability and expanded coverage of edge cases in mathematical reasoning.

Sep 5
Kimi K2 0905kimi-k2-0905majorMoonshot AI

Kimi K2 September update brings improvements to the trillion-parameter MoE model with 256K long-context inference at $0.40/$1.60 per 1M tokens.

August 2025

Aug 12
ERNIE 4.5 VL 28B A3Bernie-4.5-vl-28b-a3b-2025-08-12majorBaidu AI

ERNIE 4.5 VL 28B A3B launches as Baidu's multimodal MoE model with 28B params (3B active). Vision + text understanding at $0.07/$0.28 per 1M tokens.

Aug 12
ERNIE 4.5 21B A3Bernie-4.5-21b-a3b-2025-08-12majorBaidu AI

ERNIE 4.5 21B A3B launches as Baidu's efficient MoE text model with 21B params (3B active). 120K context at $0.07/$0.28 per 1M tokens.

Aug 6
GPT-4ogpt-4o-2025-08-06minorOpenAI

GPT-4o August 2025 snapshot with improved structured output reliability and expanded multilingual performance.

Aug 6
Qwen3 72BQwen3-72B-0806patchAlibaba / Qwen

Qwen3 72B patch release with bug fixes and improved instruction-following stability across multilingual prompts.

Aug 5
Gemini 2.5 Progemini-2.5-pro-preview-08-05minorGoogle DeepMind

Gemini 2.5 Pro preview update with improved tool use accuracy and reduced latency on multi-step reasoning tasks.

July 2025

Jul 22
Claude Sonnet 4.5claude-sonnet-4-5majorAnthropic

Claude Sonnet 4.5 with improved agentic performance and hybrid reasoning.

Jul 8
Claude Haiku 4.5claude-haiku-4-5majorAnthropic

Fastest Claude yet with improved tool use and very low cost per token.

Jul 8
Hunyuan A13B Instructhunyuan-a13b-instruct-2025-07-08majorTencent

Tencent launches Hunyuan A13B Instruct, a 13B-active MoE model (80B total params) with Chain-of-Thought reasoning. 131K context at $0.14/$0.57 per 1M tokens.

May 2025

May 28
DeepSeek R1 (0528)DeepSeek-R1-0528minorDeepSeek

DeepSeek R1 updated with meaningfully improved math and reasoning scores. Closes the gap to OpenAI o3 on several benchmarks.