MiMo-V2.5

Xiaomi (🇨🇳 China)
Status: active
Context window: 1000K tokens

Version History

v2.5 (minor)

MiMo-V2.5 introduces omnimodal capabilities with dedicated vision (729M params) and audio (261M params) encoders built on the MiMo-V2-Flash backbone. The model supports a 1M-token context window and was trained on 48T tokens with agentic RL and multi-teacher distillation.

Benchmark Scores

SWE-bench Verified: 56.1%

Coverage

Model release (Xiaomi)

Xiaomi Launches MiMo-V2.5 With 1M Context Window at $0.40 per Million Input Tokens

Xiaomi released MiMo-V2.5 on April 22, 2026, a native omnimodal model with a 1,048,576-token context window. The model is priced at $0.40 per million input tokens and $2 per million output tokens, positioning it as a cost-efficient option for agentic applications that require multimodal perception such as image and video understanding.
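To make the quoted prices concrete, here is a minimal sketch of the per-request arithmetic. It assumes strictly linear per-token billing at the two rates above, with no caching discounts or tiered pricing (the source mentions none, so this is an assumption), and `estimate_cost_usd` is a hypothetical helper, not part of any Xiaomi SDK.

```python
# Rates quoted in the coverage summary (USD per million tokens).
INPUT_USD_PER_MILLION = 0.40
OUTPUT_USD_PER_MILLION = 2.00

# The quoted context window; 1,048,576 is exactly 2**20 tokens.
CONTEXT_WINDOW_TOKENS = 1_048_576


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: linear cost estimate, assuming no discounts or tiers."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Cost of filling the whole context window with input, with no output.
    print(f"{estimate_cost_usd(CONTEXT_WINDOW_TOKENS, 0):.4f}")  # prints 0.4194
```

Under these assumptions, a prompt that fills the entire context window costs roughly $0.42 in input tokens. Output is five times more expensive per token, so long generations can dominate the bill even when the prompt is large.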
