MiniMax M2.7 used autonomous loops to optimize its own training process
MiniMax released M2.7, a model that autonomously participated in its own development through self-optimization loops. The model ran over 100 optimization rounds on internal coding tasks, achieving a 30% performance boost, and scored 66.6% on OpenAI's MLE Bench Lite, roughly on par with Gemini 3.1 Pro though behind GPT-5.4 and Claude Opus 4.6.
Chinese AI company MiniMax released M2.7, claiming the model actively participated in its own development through autonomous optimization cycles. Rather than relying solely on human-directed training, M2.7 reportedly updated its own knowledge stores, built agent infrastructure capabilities, and refined its reward-based training independently.
MiniMax describes M2.7 as "our first model deeply participating in its own evolution" and outlines a vision where future AI development will "gradually transition towards full autonomy, coordinating data construction, model training, inference architecture, evaluation, and other stages without human involvement."
Self-Improvement Through 100+ Optimization Rounds
To demonstrate autonomous self-improvement, MiniMax deployed an internal M2.7 agent that integrated with company research teams. The agent handled literature research, experiment tracking, debugging, metric analysis, and code fixes—covering 30 to 50 percent of the development workflow. Human researchers intervened only for critical decisions.
In one experiment, M2.7 autonomously optimized coding performance over more than 100 iterative rounds in an internal environment. Each cycle involved analyzing failures, planning code changes, testing results, and determining whether to retain or discard modifications. According to MiniMax, this process yielded a 30% performance improvement on internal evaluation sets.
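MiniMax has not published the mechanics of this loop. As a rough illustration only, the cycle described above (propose a change, test it, retain it if it improves the score, otherwise discard it) can be sketched as a generic hill-climbing loop; every name and the toy evaluation below are hypothetical stand-ins, not MiniMax's actual method:

```python
import random

def self_optimization_loop(evaluate, propose_patch, apply_patch, state, rounds=100):
    """Retain-or-discard loop: each round proposes a modification, tests it,
    and keeps it only if the evaluation score improves."""
    best_score = evaluate(state)
    for _ in range(rounds):
        patch = propose_patch(state)            # plan a change
        candidate = apply_patch(state, patch)   # apply it
        score = evaluate(candidate)             # test the result
        if score > best_score:                  # retain only improvements
            state, best_score = candidate, score
        # otherwise the modification is discarded and the old state stands
    return state, best_score

# Toy stand-in: the "system" is a single number, a "patch" is a random nudge,
# and evaluation rewards proximity to a hidden target value.
random.seed(0)
TARGET = 42.0
final, score = self_optimization_loop(
    evaluate=lambda x: -abs(x - TARGET),
    propose_patch=lambda x: random.uniform(-1.0, 1.0),
    apply_patch=lambda x, p: x + p,
    state=0.0,
    rounds=1000,
)
```

Because failed candidates are never retained, the score is monotonically non-decreasing; the open question with such loops is whether the internal evaluation metric actually tracks real-world capability.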
Benchmark Performance: Competitive but Not Leading
MiniMax tested M2.7 against Claude Sonnet 4.6, Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 across eight benchmarks.
On OpenAI's MLE Bench Lite (22 machine learning competitions), M2.7 achieved an average medal rate of 66.6% across three 24-hour runs. This places it behind Opus 4.6 (75.7%) and GPT-5.4 (71.2%), but roughly on par with Gemini 3.1 Pro.
For software engineering, M2.7 scored 56.22% on SWE-Pro (comparable to GPT-5.3-Codex) and 55.6% on VIBE-Pro. MiniMax claims the model reduced production failure recovery time to under three minutes in multiple real-world scenarios.
On office productivity tasks, M2.7 achieved an Elo score of 1,495 on GDPval-AA, which the company describes as the highest among open-weight models. MiniMax also reports 97% rule fidelity across more than 40 complex instruction sets for multi-level document edits.
Broader Context: Industry Trend
MiniMax is not alone in this approach. OpenAI recently introduced GPT-5.3 Codex with similar claims about AI-assisted development, where early model versions identified bugs during training and managed deployment evaluation. The theoretical foundation traces back to Jürgen Schmidhuber's 2003 "Gödel Machine" concept, with recent implementations like Sakana AI's Darwin-Gödel Machine and the Huxley-Gödel Machine from KAUST taking pragmatic evolutionary approaches.
Model Availability and Limitations
M2.7 is available via MiniMax Agent and API platforms. Unlike previous versions, model weights are not publicly released. MiniMax also launched OpenRoom, an open-source project featuring AI characters in graphical web environments with improved character consistency and emotional intelligence.
It's important to contextualize benchmark results: standardized test performance often diverges significantly from real-world capability, and scores depend heavily on testing conditions and prompt formatting. These numbers serve as reference points rather than definitive capability measures.
What This Means
MiniMax's M2.7 demonstrates that AI models can autonomously participate in their own optimization within controlled development environments. The competitive benchmark results, particularly matching Gemini 3.1 Pro despite trailing Opus 4.6 and GPT-5.4, show that this self-improvement approach yields viable outputs. However, the distinction between internal optimization loops and genuine "self-evolution" remains significant: the model still operates within human-defined parameters and requires human oversight for critical decisions. This represents incremental progress on autonomous AI development rather than a fundamental shift in how models are built.