MiMo-V2.5

Name: MiMo-V2.5
Author: Xiaomi

Organization: Xiaomi (China)

Status: active


Context window: 1M tokens (1,048,576)

Version History

v2.5 (minor) · April 28, 2026

MiMo-V2.5 builds on the MiMo-V2-Flash backbone and adds omnimodal capabilities through dedicated vision (729M parameters) and audio (261M parameters) encoders. It supports a 1M-token context window and was trained on 48T tokens with agentic reinforcement learning (RL) and multi-teacher distillation.

Benchmark Scores


Arena Elo: 1433.0

SWE-bench Verified: 56.1%

Coverage

model release · Xiaomi

Xiaomi releases MiMo-V2.5: 310B parameter omnimodal model with 1M token context window

Xiaomi released MiMo-V2.5, a sparse mixture-of-experts (MoE) model with 310B total parameters, of which 15B are activated per token. The omnimodal model supports text, image, video, and audio understanding, offers a 1M-token context window, and was trained on 48T tokens using FP8 mixed precision.

April 28, 2026 · 1:06 AM · 2 min read

Tags: xiaomi, mimo, multimodal
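The gap between 310B total and 15B active parameters is what makes a sparse MoE model cheaper to run per token than a dense model of the same size. The sketch below puts numbers on that gap using only the figures in the release note. Two inputs are assumptions, not published specifications: the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass, and a hypothetical 1 byte per parameter (FP8) for weight storage at inference time. The helper name `moe_summary` is illustrative.

```python
# Figures from the release note: 310B total parameters, 15B active per token.
# ASSUMPTIONS: ~2 FLOPs per active parameter per forward token (rule of thumb),
# and 1 byte per parameter (FP8) for stored weights; neither is a published spec.
TOTAL_PARAMS = 310e9   # all experts combined
ACTIVE_PARAMS = 15e9   # parameters actually used for each token


def moe_summary(total: float, active: float, bytes_per_param: float = 1.0) -> dict:
    """Rough compute and memory figures for a sparse MoE model (hypothetical helper)."""
    return {
        # Share of the weights touched per token.
        "active_fraction": active / total,
        # Forward-pass compute scales with the *active* parameters.
        "forward_flops_per_token": 2 * active,
        # Every expert must still be resident, so memory scales with *total* parameters.
        "weight_memory_gb": total * bytes_per_param / 1e9,
    }


s = moe_summary(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"{s['active_fraction']:.1%} active, "
      f"{s['forward_flops_per_token']:.1e} FLOPs/token, "
      f"{s['weight_memory_gb']:.0f} GB of weights")
```

Under these assumptions, under 5% of the weights participate in each token. The full 310B parameters still have to be held in memory, which is the usual trade-off of the MoE design.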

model release · Xiaomi

Xiaomi Launches MiMo-V2.5 With 1M Context Window at $0.40 per Million Input Tokens

On April 22, 2026, Xiaomi released MiMo-V2.5, a native omnimodal model with a 1,048,576-token context window. It is priced at $0.40 per million input tokens and $2 per million output tokens, positioning it as a cost-efficient option for agentic applications that need multimodal perception, such as image and video understanding.

April 22, 2026 · 4:36 PM · 2 min read

Tags: xiaomi, mimo-v2-5, omnimodal
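The quoted prices translate directly into a per-request estimate. The sketch below assumes simple linear per-token billing at the listed rates. It does not model volume tiers, cache discounts, or any separate rates for image, video, or audio inputs, none of which are described here. The function name `estimate_cost` is hypothetical.

```python
# Listed prices in USD per million tokens.
# ASSUMPTION: flat linear billing; tiers, caching discounts, and modality-specific
# pricing (if any) are not modeled.
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 2.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts (hypothetical helper)."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: filling the full 1,048,576-token context and generating 4,096 tokens.
cost = estimate_cost(1_048_576, 4_096)
print(f"${cost:.4f}")
```

Output tokens are priced at five times the input rate. For long-context agentic workloads, the cost is therefore dominated by the prompt size unless the model generates a large amount of text.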