Xiaomi Launches MiMo-V2.5 With 1M Context Window at $0.40 per Million Input Tokens
Xiaomi released MiMo-V2.5 on April 22, 2026, a native omnimodal model with a 1,048,576 token context window. The model is priced at $0.40 per million input tokens and $2 per million output tokens, positioning it as a cost-efficient alternative for agentic applications requiring multimodal perception across image and video understanding.
Xiaomi Launches MiMo-V2.5 With 1M Context Window at $0.40 per Million Input Tokens
Xiaomi released MiMo-V2.5 on April 22, 2026, a native omnimodal model featuring a 1,048,576 token context window priced at $0.40 per million input tokens and $2 per million output tokens.
Specifications and Pricing
MiMo-V2.5 offers:
- Context window: 1,048,576 tokens (1M)
- Input pricing: $0.40 per million tokens
- Output pricing: $2 per million tokens
- Release date: April 22, 2026
According to Xiaomi, the model delivers "Pro-level agentic performance at roughly half the inference cost" compared to unspecified alternatives, though the company has not provided independent benchmark scores to verify these claims.
Technical Capabilities
Xiaomi describes MiMo-V2.5 as a "native omnimodal model" designed for multimodal perception across image and video understanding tasks. The company claims the model surpasses its predecessor, MiMo-V2-Omni, in multimodal perception, though specific benchmark comparisons were not disclosed.
The 1M context window is designed to handle complete documents, extended conversations, and complex task contexts in a single inference pass. Xiaomi positions this capability as particularly suited for integration with agent frameworks.
Availability
The model is currently available through OpenRouter, which routes requests across multiple providers to optimize uptime and handle varying prompt sizes. OpenRouter supports the model's reasoning capabilities through a dedicated reasoning parameter that exposes step-by-step thinking processes via a reasoning_details array in API responses.
What This Means
MiMo-V2.5 enters an increasingly competitive omnimodal model market with a clear value proposition: extended context at lower input pricing than many enterprise-tier alternatives. At $0.40 per million input tokens, it undercuts several comparable models while offering a 1M context window—a specification typically reserved for premium tiers.
The focus on agentic workflows suggests Xiaomi is targeting developers building autonomous systems that require sustained reasoning across multimodal inputs. However, without published benchmark scores on standard evaluation sets like MMLU, VQAv2, or video understanding benchmarks, independent assessment of the model's claimed performance advantages remains difficult. The model's effectiveness will ultimately be determined by real-world deployment results in production agent systems.
Related Articles
Alibaba's Qwen Releases Qwen3.7 Plus: 1M Context Window at $0.40 Per Million Input Tokens
Alibaba's Qwen has released Qwen3.7 Plus, a multimodal model with a 1 million token context window. The model accepts text and image input with text output, priced at $0.40 per million input tokens and $1.60 per million output tokens through OpenRouter's API.
Ideogram 4: 9.3B parameter open-weight text-to-image model with native 2K resolution and structured JSON prompting
Ideogram has released Ideogram 4, its first open-weight text-to-image model with 9.3 billion parameters. The model supports native 2K resolution, structured JSON prompting with bounding-box layout controls, and is available in nf4 and fp8 quantizations under a non-commercial license.
Microsoft releases MAI-Thinking-1, its first reasoning AI model trained without third-party distillation
Microsoft announced MAI-Thinking-1, its first advanced reasoning AI model, at Build 2026. The company claims it's a medium-sized model matching leading models on key software engineering benchmarks, trained from scratch without distillation from third-party models.
NVIDIA Releases Nemotron-3-Ultra: 550B Parameter Model with 1M Token Context and Configurable Reasoning
NVIDIA released Nemotron-3-Ultra-550B-A55B-NVFP4, a 550B parameter model with 55B active parameters, featuring a 1M token context window and configurable reasoning mode. The model uses a hybrid LatentMoE architecture combining Mamba-2, Mixture-of-Experts, and Attention layers with Multi-Token Prediction, trained with NVIDIA's NVFP4 quantization-aware approach.
Comments
Loading...