MiniMax M3

Name: MiniMax M3
Author: MiniMax

MiniMax · 🇨🇳 China

active

Context window: 1000K tokens

Version History

1.0 · major · June 12, 2026

Initial release of M3 with 428B parameters, native multimodal training, and MiniMax Sparse Attention enabling 1M context with 15× decode speedup over M2.

m3 · major · June 1, 2026

M3 introduces MiniMax Sparse Attention, which enables a 1M-token context at approximately 1/20th the compute cost of the previous generation. The model is trained natively on interleaved multimodal data and tuned with an interactive user simulator.

Benchmark Scores

MMMU: 78.1%

SWE-bench Verified: 80.5%

Coverage

model release

MiniMax Releases M3: 428B-Parameter Multimodal Model with 1M Context Window and 15× Decode Speedup

MiniMax has released M3, a multimodal model with approximately 428 billion total parameters, of which 23 billion are activated. The model supports a 1-million-token context window and uses MiniMax Sparse Attention to achieve a 9× prefill speedup and a 15× decode speedup over its predecessor, M2.

June 12, 2026 · 3:06 PM · 2 min read

MiniMax M3 multimodal

model release

MiniMax Launches M3 Model With 1M Context Window at $0.30 Per Million Input Tokens

MiniMax has released M3, a multimodal foundation model that accepts text, image, and video inputs and supports a 1-million-token context window. The model costs $0.30 per million input tokens and $1.20 per million output tokens and is available through OpenRouter.

June 1, 2026 · 1:05 AM · 2 min read

MiniMax M3 multimodal
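
As a rough illustration of the per-token pricing quoted above ($0.30 per million input tokens, $1.20 per million output tokens), the sketch below estimates the cost of a single request. The function name `estimate_cost_usd` and the example token counts are hypothetical, and the calculation assumes flat pricing with no tiering, caching discounts, or separate rates for image and video inputs, none of which the coverage specifies.

```python
from decimal import Decimal

# Prices quoted in the coverage above, in USD per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.30")
OUTPUT_PRICE_PER_MILLION = Decimal("1.20")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost of one request.

    Assumes flat per-token pricing (an assumption, not confirmed by the source).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a prompt that fills the full 1M-token window plus a 2,000-token reply.
    print(estimate_cost_usd(1_000_000, 2_000))
```

Under these assumptions, a prompt that fills the entire context window costs about $0.30 in input tokens alone, so for long-context requests the input side dominates the bill even though output tokens are priced four times higher.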