MiniMax releases M2.7, a 229B parameter model with self-evolving capabilities and agent teams
MiniMax has released MiniMax-M2.7, a 229-billion-parameter model that participated in its own development process. The model achieves a 66.6% medal rate on MLE Bench Lite and 56.22% on the SWE-Pro benchmark, with native support for multi-agent collaboration and complex tool orchestration.
MiniMax has released MiniMax-M2.7, a 229-billion-parameter open-source model that introduces a self-evolution cycle, allowing it to autonomously improve its own learning process during development.
Model Self-Evolution
M2.7's defining feature is its participation in its own development. During training, the model autonomously updated its memory, built dozens of complex skills for reinforcement learning experiments, and improved its learning process based on experimental results. In one internal benchmark, an M2.7 instance optimized a programming scaffold over 100+ iterations—analyzing failure trajectories, modifying code, running evaluations, and deciding whether to keep or revert changes—achieving a 30% performance improvement.
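MiniMax has not published the scaffold-optimization code, but the loop it describes (run an evaluation, inspect failure trajectories, propose a code change, then keep or revert it based on the score) resembles a simple greedy keep-or-revert search. The sketch below only illustrates that pattern; every function name is hypothetical and none of it is MiniMax's released tooling.

```python
# Hypothetical sketch of a keep-or-revert scaffold optimization loop.
# evaluate, propose_patch, and apply_patch stand in for "run the benchmark",
# "let the model analyze failures and edit code", and "apply the edit".

def optimize_scaffold(scaffold, evaluate, propose_patch, apply_patch, iterations=100):
    """Greedy loop: apply a model-proposed patch, re-evaluate,
    and keep the patch only if the benchmark score improves."""
    best_score, failures = evaluate(scaffold)          # score plus failure trajectories
    for _ in range(iterations):
        patch = propose_patch(scaffold, failures)      # model proposes a change from the failures
        candidate = apply_patch(scaffold, patch)
        score, new_failures = evaluate(candidate)
        if score > best_score:                         # keep the change
            scaffold, best_score, failures = candidate, score, new_failures
        # otherwise the patch is discarded (reverted)
    return scaffold, best_score
```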
Software Engineering and System-Level Reasoning
M2.7 demonstrates strong capabilities in production engineering tasks spanning log analysis, bug troubleshooting, refactoring, and security analysis. The model shows system-level reasoning across monitoring metrics, trace analysis, and root-cause verification. MiniMax reports that, on multiple occasions, using M2.7 reduced recovery time for live production incidents to under three minutes.
On software engineering benchmarks:
- SWE-Pro: 56.22% (matching GPT-5.3-Codex)
- SWE Multilingual: 76.5%
- Multi SWE Bench: 52.7%
- Terminal Bench 2: 57.0%
- NL2Repo: 39.8%
- VIBE-Pro: 55.6% (near parity with Opus 4.6)
On the MLE Bench Lite machine learning competition benchmark (22 competitions), M2.7 achieved a 66.6% medal rate, behind only Opus 4.6 and GPT-5.4, according to MiniMax.
Multi-Agent Capabilities and Professional Work
M2.7 natively supports Agent Teams for multi-agent collaboration with stable role identity and autonomous decision-making. The model demonstrates capability in document editing (Word, Excel, PowerPoint) with high-fidelity multi-round editing and produces editable deliverables.
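The announcement does not document the Agent Teams interface itself. The sketch below only illustrates the general pattern of agents with fixed role identities passing work to each other over an OpenAI-compatible chat endpoint; the endpoint URL, model identifier, and role prompts are placeholders, not MiniMax's API.

```python
# Illustrative multi-agent round-trip with stable role identities.
# Endpoint, model name, and prompts are placeholders for illustration only.
from openai import OpenAI

client = OpenAI(base_url="https://example-endpoint/v1", api_key="YOUR_KEY")
MODEL = "minimax-m2.7"  # placeholder model identifier

ROLES = {
    "planner": "You are the planner. Break the task into concrete steps.",
    "coder": "You are the coder. Implement the current step and return code.",
    "reviewer": "You are the reviewer. Point out bugs and decide pass/fail.",
}

def run_agent(role: str, task: str, context: str = "") -> str:
    """Each agent keeps a stable role via its system prompt and sees only
    the shared task plus the previous agent's output."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": ROLES[role]},
            {"role": "user", "content": f"Task: {task}\n\nContext:\n{context}"},
        ],
    )
    return resp.choices[0].message.content

task = "Add input validation to the upload handler."
plan = run_agent("planner", task)
code = run_agent("coder", task, context=plan)
review = run_agent("reviewer", task, context=code)
```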
On professional work benchmarks:
- GDPval-AA: 1495 Elo (highest among open-source models)
- Toolathon: 46.3% accuracy
- MM Claw end-to-end: 62.7% (close to Sonnet 4.6)
- MM Claw skill compliance: 97% across 40+ complex skills
Deployment and Access
MiniMax-M2.7 is available as open-source weights on Hugging Face (229B parameters, supporting F32, BF16, and F8_E4M3 tensor formats). The model can be deployed via SGLang, vLLM, Transformers, or NVIDIA NIM endpoints.
MiniMax also provides API access through MiniMax Agent (agent.minimax.io) and the MiniMax API platform (platform.minimax.io). The company offers an interactive demo called OpenRoom (openroom.ai) featuring real-time visual feedback and scene interactions.
Recommended inference parameters: temperature=1.0, top_p=0.95, top_k=40.
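Assuming the weights are served with vLLM, one of the listed backends, the recommended settings map directly onto its SamplingParams. A minimal sketch follows; the Hugging Face repository ID and tensor-parallel degree are assumptions, not confirmed by the announcement.

```python
# Sketch of offline inference with vLLM using the recommended sampling settings.
from vllm import LLM, SamplingParams

# Repository ID and parallelism are assumed; adjust to the published checkpoint
# and available GPUs.
llm = LLM(model="MiniMaxAI/MiniMax-M2.7", tensor_parallel_size=8)
params = SamplingParams(temperature=1.0, top_p=0.95, top_k=40, max_tokens=1024)

outputs = llm.generate(["Summarize the failure modes in this log: ..."], params)
print(outputs[0].outputs[0].text)
```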
What This Means
M2.7 represents a shift toward models that can improve themselves during development—a capability that could accelerate iteration cycles for specialized tasks. The strong SWE and system engineering benchmarks position it competitively against closed-source reasoning models on production engineering work. The open-source release makes these capabilities accessible for self-hosted deployments, though claims about self-evolution achieving 30% improvements and MLE Bench performance warrant independent verification. The multi-agent framework and tool compliance metrics suggest MiniMax is targeting enterprise automation workflows alongside development tooling.
Related Articles
Zhipu AI's GLM-5.1 outperforms GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro through iterative strategy refinement
Zhipu AI has released GLM-5.1, a freely available open-weight model designed for long-running programming tasks that achieves 58.4% on SWE-Bench Pro, edging out GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%). The model's core capability is iterative strategy refinement—it rethinks its approach across hundreds of iterations and thousands of tool calls, recognizing dead ends and shifting tactics without human intervention. However, GLM-5.1 trails on reasoning and knowledge benchmarks, scoring 31% on Humanity's Last Exam compared to Gemini 3.1 Pro's 45%.
Google releases Gemma 4, open-source on-device AI with agentic tool use for phones
Google released Gemma 4, an open-source multimodal model that runs entirely on smartphones without sending data to the cloud. The E2B and E4B variants require just 6GB and 8GB of RAM respectively and can autonomously use tools like Wikipedia, maps, and QR code generators through built-in agent skills. The model is available free via the Google AI Edge Gallery app for Android and iOS.
Liquid AI releases LFM2.5-VL-450M, improved 450M-parameter vision-language model with multilingual support
Liquid AI has released LFM2.5-VL-450M, a refreshed 450M-parameter vision-language model built on an updated LFM2.5-350M backbone. The model features a 32,768-token context window, supports 9 languages, handles native 512×512 pixel images, and adds bounding box prediction and function calling capabilities. Performance improvements span both vision and language benchmarks compared to its predecessor.
Tencent releases HY-Embodied-0.5, a 2B-parameter vision-language model for robot control
Tencent has released HY-Embodied-0.5, a family of foundation models designed specifically for embodied AI and robotic control. The suite includes a 2B-parameter MoT (Mixture-of-Transformers) variant with only 2.2B activated parameters during inference, and a 32B model that claims frontier-level performance comparable to Gemini 3.0 Pro, trained on over 200 billion tokens of embodied-specific data.