
MiniMax Releases M3: 428B-Parameter Multimodal Model with 1M Context Window and 15× Decode Speedup

TL;DR

MiniMax has released M3, a multimodal model with approximately 428 billion parameters and 23 billion activated parameters. The model supports a 1 million token context window and uses MiniMax Sparse Attention to achieve 9× prefill and 15× decode speedups compared to its predecessor M2.

June 12, 2026 · 3:06 PM

MiniMax M3 — Quick Specs

Context window: 1M tokens

Input: $0.30 per 1M tokens

Output: $1.20 per 1M tokens



MiniMax has released M3, a multimodal model with approximately 428 billion total parameters, of which about 23 billion are activated per token. The model supports a 1 million token context window and uses MiniMax Sparse Attention (MSA) to achieve 9× prefill and 15× decode speedups compared to its predecessor M2. MiniMax says this brings per-token compute down to 1/20 of M2's.
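To put the parameter counts in perspective, here is a rough back-of-the-envelope sketch. It assumes the rounded figures quoted above (428B total, 23B activated) and common byte widths per parameter. The function names are illustrative, and the memory estimate covers weights only, ignoring KV cache and activations.

```python
# Rounded figures from the article; the exact counts may differ.
TOTAL_PARAMS = 428e9
ACTIVE_PARAMS = 23e9


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the weights that take part in computing one token."""
    return active_params / total_params


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")  # about 5.4%
    # Byte widths are assumptions: 2 bytes for bf16/fp16, 1 for 8-bit, 0.5 for 4-bit.
    for label, width in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        print(f"{label} weights: {weight_memory_gb(TOTAL_PARAMS, width):.0f} GB")
```

Only about 5% of the weights are used for any single token, but a local deployment typically still has to hold all of them, which is why the total count matters for hardware sizing and the activated count matters for per-token compute.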

Technical Specifications

M3 uses native multimodal training from the first step, processing text, image, and video inputs through mixed-modality training rather than adapting a text-only model. The model employs MiniMax Sparse Attention, which the company claims dramatically reduces attention compute and memory footprint compared to Grouped Query Attention (GQA) while preserving model quality.

The model features two operating modes: a "thinking" mode for complex reasoning and agentic tasks, and a "non-thinking" mode for latency-sensitive scenarios like chat and code completion. According to MiniMax, M3 achieves frontier-level performance across long-horizon agentic benchmarks.

The Quick Specs list API pricing of $0.3 per 1M input tokens and $1.2 per 1M output tokens. The model is available through the MiniMax API and for local deployment via Hugging Face.
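For a sense of scale, the sketch below estimates per-request cost from the rates listed in the Quick Specs at the top of this article ($0.3 per 1M input tokens, $1.2 per 1M output tokens). The function name and the example token counts are illustrative assumptions, and the rates should be checked against MiniMax's current price list before relying on them.

```python
# Rates as listed in the article's Quick Specs (USD per 1M tokens).
INPUT_USD_PER_M = 0.3
OUTPUT_USD_PER_M = 1.2


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = INPUT_USD_PER_M,
    output_rate: float = OUTPUT_USD_PER_M,
) -> float:
    """Estimate the cost of one request from its token counts."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate


if __name__ == "__main__":
    # Hypothetical request: a full 1M-token context with a 2,000-token answer.
    print(f"${estimate_cost_usd(1_000_000, 2_000):.4f}")
```

At these rates, a request that fills the whole context window is dominated by input cost: roughly $0.30 for the prompt against a fraction of a cent for a short answer.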

Deployment Options

M3 can be deployed locally using three inference frameworks: SGLang, vLLM, and Transformers. MiniMax recommends specific inference parameters: temperature=1.0, top_p=0.95, and top_k=40.
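To make the recommended settings concrete, the following is a minimal, framework-independent sketch of what temperature, top_k, and top_p do to a next-token distribution. It is not the code used by SGLang, vLLM, or Transformers; the function names and the toy logits are assumptions, and the order in which top-k and top-p are applied can differ between frameworks.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=40, top_p=0.95):
    """Apply temperature, then top-k, then top-p; return renormalized probabilities."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    norm = sum(exps.values())
    ranked = sorted(
        ((tok, e / norm) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # Top-k: keep only the k most likely tokens.
    ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


def sample(logits, rng=random, **params):
    """Draw one token from the filtered distribution."""
    dist = filter_distribution(logits, **params)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    toy_logits = {"a": 5.0, "b": 1.0, "c": 0.0, "d": -3.0}  # hypothetical values
    print(filter_distribution(toy_logits))
    print(sample(toy_logits))
```

With temperature=1.0 the logits are left unscaled, so the recommended settings shape the output mainly by trimming the low-probability tail through top_k and top_p.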

API access is offered through MiniMax's own service, and Novita is listed as an additional inference provider on Hugging Face. Technical details are available in a research paper on arXiv (arXiv:2606.13392).

What This Means

M3's sparse attention architecture addresses a critical bottleneck in long-context models: compute cost at scale. The claimed 15× decode speedup at 1M tokens, if validated in independent benchmarks, would make M3 significantly more practical for production use cases requiring extended context.
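How much of the claimed speedup a workload would actually see depends on its mix of prefill and decode time. The sketch below combines the two vendor-claimed factors (9× prefill, 15× decode) for a given baseline split. The baseline timings are hypothetical, the function name is illustrative, and the calculation assumes the claimed factors apply uniformly, which independent benchmarks may not bear out.

```python
def combined_speedup(
    prefill_seconds: float,
    decode_seconds: float,
    prefill_factor: float = 9.0,
    decode_factor: float = 15.0,
) -> float:
    """End-to-end speedup when prefill and decode are each accelerated by their own factor."""
    baseline = prefill_seconds + decode_seconds
    accelerated = prefill_seconds / prefill_factor + decode_seconds / decode_factor
    return baseline / accelerated


if __name__ == "__main__":
    # Hypothetical baseline request: 90 s of prefill and 150 s of decode.
    print(combined_speedup(90.0, 150.0))  # 12.0
    # A prefill-heavy request sees a figure closer to the 9x prefill factor.
    print(round(combined_speedup(200.0, 10.0), 2))
```

The end-to-end figure always falls between the two factors, so prompt-heavy workloads such as long-document analysis would sit nearer 9× and generation-heavy workloads nearer 15×.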

The native multimodal training approach contrasts with the common industry practice of adapting text models for visual inputs. This choice suggests MiniMax is betting on deeper semantic integration across modalities, though real-world comparisons with models like GPT-4o or Gemini 1.5 Pro will determine whether the approach delivers measurable advantages. The emphasis on agentic capabilities and coding performance positions M3 as a competitor in the autonomous agent and development tools market.

Source: huggingface.co


