model release

MiniMax Launches M3 Model With 1M Context Window at $0.30 Per Million Input Tokens

TL;DR

MiniMax has released M3, a multimodal foundation model supporting text, image, and video inputs with a 1-million-token context window. The model costs $0.30 per million input tokens and $1.20 per million output tokens, available through OpenRouter.

June 1, 2026 · 1:05 AM2 min read

MiniMax M3 — Quick Specs

Context window1000K tokens

Compare MiniMax M3 with other models →

MiniMax Launches M3 Model With 1M Context Window at $0.30 Per Million Input Tokens

MiniMax has released M3, a multimodal foundation model that processes text, image, and video inputs with a 1-million-token context window. The model costs $0.30 per million input tokens (50% launch discount) and $1.20 per million output tokens (50% launch discount), available through OpenRouter.

Technical Architecture

M3 is built on MiniMax Sparse Attention (MSA), which replaces full attention mechanisms with KV-block selection. According to MiniMax, this reduces per-token compute costs to approximately 1/20th of the previous generation at 1M tokens while maintaining quality across most tasks. The company claims substantially faster prefill and decode speeds compared to traditional full attention models.

The model was trained as a native multimodal system on interleaved data and fine-tuned using an interactive user-simulator framework designed for multi-turn, production-like collaboration.

Target Use Cases

MiniMax positions M3 for:

Long-horizon agentic workflows requiring sustained context
Coding tasks with large codebases
Tool use and function calling
Multi-step tasks requiring extended reasoning chains

The model outputs text only, despite accepting multimodal inputs.

Pricing Context

At current 50% discounted rates, M3's input pricing of $0.30 per million tokens undercuts most long-context competitors. For comparison:

Claude 3.5 Sonnet (200K context): $3.00 input / $15.00 output per million tokens
GPT-4 Turbo (128K context): $10.00 input / $30.00 output per million tokens
Gemini 1.5 Pro (2M context): $1.25 input / $5.00 output per million tokens (for prompts under 128K)

The model's full-price rates ($0.60 input / $2.40 output) would still position it as cost-competitive for extended context applications.

What This Means

M3 represents MiniMax's entry into the ultra-long-context market with a focus on cost efficiency through architectural innovation. The sparse attention approach directly addresses the computational bottleneck that has made million-token contexts prohibitively expensive for most applications. However, the model's quality at scale and real-world performance on complex multimodal tasks remains to be validated by independent benchmarks. The emphasis on multi-step agentic workflows suggests MiniMax is targeting enterprise automation and developer tooling markets rather than consumer chatbot applications.

Source: openrouter.ai ↗

MiniMax M3 multimodal sparse attention long context 1M tokens model release OpenRouter

model releaseJuly 14, 2026

Google releases Gemma 4 E2B, optimized to run natively on Pixel 10's Tensor G5 TPU

Google has released Gemma 4 E2B for TPU, a variant of its open-source Gemma 4 model optimized to run natively on the Tensor G5 chip in Pixel 10 devices. The multimodal model enables completely offline AI chat, image recognition, and audio transcription on Pixel 10, 10 Pro, 10 Pro XL, and 10 Pro Fold.

model releaseJuly 14, 2026

Kwaipilot Releases KAT-Coder-Air V2.5 with 256K Context Window at $0.15/$0.60 Per Million Tokens

Kwaipilot has released KAT-Coder-Air V2.5, a coding-specialized model with a 256K token context window. The model is priced at $0.15 per million input tokens and $0.60 per million output tokens, positioning it as a mid-tier coding assistant option.

model releaseJuly 14, 2026

Kwaipilot Releases KAT-Coder-Pro V2.5 with 256K Context Window at $0.74/$2.96 Per Million Tokens

Kwaipilot has released KAT-Coder-Pro V2.5, a coding-focused language model with a 256,000-token context window. The model is priced at $0.74 per million input tokens and $2.96 per million output tokens, available through OpenRouter.

model releaseJuly 13, 2026

OpenAI GPT-5.6 Sol, Terra, and Luna launch on Amazon Bedrock with 80-point Coding Agent Index score

OpenAI's GPT-5.6 model family is now generally available on Amazon Bedrock, introducing a three-tier system: Sol (flagship reasoning), Terra (balanced production), and Luna (fast inference). According to OpenAI, Sol scores 80 points on the Artificial Analysis Coding Agent Index and 73.5% on ExploitBench, establishing new benchmarks while using less than half the output tokens of competing models.

MiniMax Launches M3 Model With 1M Context Window at $0.30 Per Million Input Tokens

MiniMax M3 — Quick Specs

MiniMax Launches M3 Model With 1M Context Window at $0.30 Per Million Input Tokens

Technical Architecture

Target Use Cases

Pricing Context

What This Means

Related Articles

Google releases Gemma 4 E2B, optimized to run natively on Pixel 10's Tensor G5 TPU

Kwaipilot Releases KAT-Coder-Air V2.5 with 256K Context Window at $0.15/$0.60 Per Million Tokens

Kwaipilot Releases KAT-Coder-Pro V2.5 with 256K Context Window at $0.74/$2.96 Per Million Tokens

OpenAI GPT-5.6 Sol, Terra, and Luna launch on Amazon Bedrock with 80-point Coding Agent Index score

Comments