
Xiaomi launches MiMo-V2-Pro with 1T parameters, nearly matches Claude Opus 4.6 on coding at 80% lower cost

TL;DR

Xiaomi has released three AI models at once, designed together to form a complete agent platform. MiMo-V2-Pro, a Mixture-of-Experts model with 1 trillion total parameters and 42 billion active per token, scores 78% on SWE-bench Verified and 81 points on ClawEval. That is close to Claude Opus 4.6 (80.8% and 81.5), at $1 per million input tokens versus $5 for Opus.


MiMo-V2-Pro: Quick Specs

Context window: 1M tokens
Input: $1 per 1M tokens (contexts up to 256K)
Output: $3 per 1M tokens (contexts up to 256K)

Xiaomi has simultaneously released three AI models designed to power autonomous agents, robots, and voice applications, marking the company's push into full-stack agent infrastructure.

MiMo-V2-Pro: Trillion-Parameter LLM

The flagship MiMo-V2-Pro uses a Mixture-of-Experts architecture with over 1 trillion total parameters, of which 42 billion are active per token. That is roughly 3x the scale of its December 2025 predecessor, MiMo-V2-Flash. A hybrid attention mechanism keeps processing efficient at this scale and supports context windows of up to 1 million tokens. The model also predicts multiple tokens per step rather than one at a time, which speeds up generation.
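In a Mixture-of-Experts layer, a router sends each token to only a few experts, so most weights sit idle on any given step. This is how a model can hold over 1 trillion parameters while activating only about 42 billion, roughly 4%, per token. The toy sketch below illustrates top-k routing. The expert count (8), top-k (2), and dimensions are hypothetical, because the article does not disclose MiMo-V2-Pro's actual configuration.

```python
import math
import random

# Toy Mixture-of-Experts layer (illustrative only).
# NUM_EXPERTS, TOP_K and DIM are hypothetical values, not MiMo-V2-Pro's.
NUM_EXPERTS = 8
TOP_K = 2
DIM = 4

random.seed(0)


def make_matrix(rows: int, cols: int) -> list[list[float]]:
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


experts = [make_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]  # one linear map per expert
router = make_matrix(NUM_EXPERTS, DIM)                          # one routing vector per expert


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs: list[float]) -> list[float]:
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token: list[float]) -> tuple[list[float], list[int]]:
    """Route one token to its TOP_K highest-scoring experts and mix their outputs."""
    scores = matvec(router, token)  # one routing logit per expert
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gates = softmax([scores[i] for i in top])  # renormalize over the chosen experts only
    out = [0.0] * DIM
    for gate, i in zip(gates, top):
        y = matvec(experts[i], token)  # only the selected experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


def active_fraction(active_params: float, total_params: float) -> float:
    return active_params / total_params


out, chosen = moe_forward([0.5, -1.0, 0.25, 2.0])
print("experts used:", chosen)
print(f"active share: {active_fraction(42e9, 1e12):.1%}")  # 4.2%
```

The 4.2% figure is simply 42B divided by 1T. In real MoE models the active count also includes shared components such as attention layers and embeddings, so it is not the same as the fraction of experts selected.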

Performance Benchmarks:

  • SWE-bench Verified: 78% (Claude Opus 4.6: 80.8%, Claude Sonnet 4.6: 79.6%)
  • ClawEval (agent benchmark): 81 points, 3rd globally (Claude Opus 4.6: 81.5, GPT-5.2: 77)
  • PinchBench: Ranks 3rd globally, behind Claude Opus 4.6
  • Artificial Analysis Intelligence Index: 7th worldwide; third among Chinese models, behind GLM-5 and MiniMax-M2.7

Pricing: $1 per million input tokens and $3 per million output tokens for contexts up to 256,000 tokens. Cache writes are currently free. For comparison, Claude Sonnet 4.6 costs $3/$15 and Claude Opus 4.6 costs $5/$25 per million input/output tokens. This makes MiMo-V2-Pro 80% cheaper than Opus on input and 88% cheaper on output.
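The actual saving on a real workload depends on its mix of input and output tokens. The sketch below applies the list prices quoted above to a hypothetical agent session of 200K input and 20K output tokens. It ignores caching, batch discounts, and any higher tier for contexts beyond 256K, and the `Pricing` class and `savings` helper are illustrative names, not part of any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    name: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD cost of one workload at list prices (no caching or batch discounts)."""
        return (input_tokens * self.input_per_m + output_tokens * self.output_per_m) / 1_000_000


MIMO_V2_PRO = Pricing("MiMo-V2-Pro", 1.0, 3.0)  # list price for contexts up to 256K
SONNET_4_6 = Pricing("Claude Sonnet 4.6", 3.0, 15.0)
OPUS_4_6 = Pricing("Claude Opus 4.6", 5.0, 25.0)


def savings(cheap: Pricing, pricey: Pricing, inp: int, out: int) -> float:
    """Fractional saving of `cheap` relative to `pricey` for the same workload."""
    return 1 - cheap.cost(inp, out) / pricey.cost(inp, out)


# Hypothetical agent session: 200K input tokens, 20K output tokens.
inp, out = 200_000, 20_000
for p in (MIMO_V2_PRO, SONNET_4_6, OPUS_4_6):
    print(f"{p.name}: ${p.cost(inp, out):.2f}")  # $0.26, $0.90, $1.50
print(f"Saving vs Opus: {savings(MIMO_V2_PRO, OPUS_4_6, inp, out):.0%}")  # 83%
```

Because MiMo-V2-Pro's price ratio to Opus is 1/5 on input and 3/25 on output, the blended saving for any workload falls between 80% (input only) and 88% (output only).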

Before its official launch, MiMo-V2-Pro ran anonymously on OpenRouter under the codename "Hunter Alpha." It topped the daily rankings there for several days and processed over 1 trillion tokens, and many users initially assumed it was a new DeepSeek model.

MiMo-V2-Omni: Multimodal Agent Model

MiMo-V2-Omni integrates image, video, and audio encoders into a shared backbone. The model natively supports structured tool calls, function execution, and autonomous UI navigation.
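A "structured tool call" is a machine-readable request, consisting of a tool name plus arguments, that surrounding software executes on the model's behalf. The article does not document MiMo-V2-Omni's tool-call format. This sketch therefore assumes an OpenAI-style JSON object with a `name` field and JSON-encoded `arguments`. The `click` and `type_text` tools are hypothetical stand-ins for the UI actions that a framework such as OpenClaw would carry out.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical UI tools; a real agent framework would drive an actual browser or OS.
def click(x: int, y: int) -> str:
    return f"clicked at ({x}, {y})"


def type_text(text: str) -> str:
    return f"typed {text!r}"


TOOLS: Dict[str, Callable[..., Any]] = {"click": click, "type_text": type_text}


def dispatch(tool_call_json: str) -> Any:
    """Execute one tool call of the ASSUMED form {"name": ..., "arguments": "<JSON object>"}."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    args = json.loads(call["arguments"])  # arguments arrive as a JSON-encoded string
    return fn(**args)


# What a model might emit under the assumed format.
model_output = json.dumps({"name": "click", "arguments": json.dumps({"x": 120, "y": 48})})
print(dispatch(model_output))  # clicked at (120, 48)
```

Keeping a whitelist of callable tools, rather than executing arbitrary names, is what lets the host application decide which actions the model can actually trigger.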

Benchmark Performance:

  • MMMU-Pro (image): 76.8 (beats Claude Opus 4.6's 73.9)
  • Audio: Beats Gemini 3 Pro on audio benchmarks; handles more than 10 hours of continuous audio
  • Video: Falls short of Gemini 3 Pro
  • ClawEval (agent tasks): 54.8 (Claude Opus 4.6: 66.3, GPT-5.2: 59.6)
  • MM-BrowserComp: Outperforms both Gemini 3 Pro and GPT-5.2 on web navigation

Demos showed the model autonomously reviewing dashcam footage to flag hazards, navigating e-commerce sites (Xiaohongshu, JD.com), haggling with customer service, completing purchases, and creating and debugging multimedia content, all without human intervention. OpenClaw carries out the actual clicks and file operations.

MiMo-V2-TTS: Emotional Speech Synthesis

MiMo-V2-TTS, trained on over 100 million hours of speech data, uses parallel discrete units for finer control over sound, rhythm, and emotion. Users describe the voice they want in natural language ("sleepy, just woken up, slightly hoarse") rather than selecting from presets.

Key capabilities include natively synthesizing both speech and singing in one model, generating paralinguistic sounds (coughs, sighs, laughter) as part of output, and interpreting typographic cues (capitals, repeated characters) as emphasis signals.
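To make the two typographic cues concrete, the rule-based tagger below flags all-caps words and words containing a character repeated three or more times. It only illustrates the input patterns. The article says MiMo-V2-TTS interprets these cues natively, so this is not Xiaomi's method, and the regex patterns and the `emphasis_cues` helper are hypothetical.

```python
import re

# Illustrative patterns for the two cue types named in the article (hypothetical).
ALL_CAPS = re.compile(r"\b[A-Z]{2,}\b")         # e.g. "NEVER"
REPEATED = re.compile(r"\b\w*(\w)\1{2,}\w*\b")  # e.g. "sooo", "nooooo"


def emphasis_cues(text: str) -> list[str]:
    """Return the words a reader (or a TTS model) would likely stress, in order found."""
    cues = [m.group(0) for m in ALL_CAPS.finditer(text)]
    cues += [m.group(0) for m in REPEATED.finditer(text) if m.group(0) not in cues]
    return cues


print(emphasis_cues("I said NO, that is sooo bad"))  # ['NO', 'sooo']
```

A fixed rule like this also flags acronyms such as "TTS", which is one reason a model that reads the surrounding context handles these cues better than pattern matching.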

Strategic Positioning

Xiaomi has partnered with five agent frameworks (OpenClaw, OpenCode, KiloCode, Blackbox, Cline) and is offering free API access for one week. The company explicitly targets AI agent and robotics workloads. The MiMo team says its next priorities are long-horizon planning over hours or days, real-time streaming, multi-agent coordination, and robotics.

What this means

Xiaomi is pricing aggressively to challenge Anthropic's dominance in the agent and coding segment, and MiMo-V2-Pro comes close to parity on benchmarks at a fraction of the cost. Meaningful gaps remain elsewhere, however. MiMo-V2-Omni trails Claude Opus 4.6 on ClawEval agent tasks (54.8 vs. 66.3), and its video understanding falls short of Gemini 3 Pro. The three-model release signals Xiaomi's commitment to an integrated agent platform built from specialized models rather than a single flagship. This differs from Anthropic's approach, which concentrates its capabilities in one model family.
