MiniMax M2.7 Used Autonomous Loops to Optimize Its Own Training Process

TL;DR

MiniMax released M2.7, a model that autonomously participated in its own development through self-optimization loops. The model ran over 100 optimization rounds on internal coding tasks, achieving a 30% performance boost, and scored 66.6% on OpenAI's MLE Bench Lite, on par with Gemini 3.1 Pro though behind Claude Opus 4.6 and GPT-5.4.

Chinese AI company MiniMax released M2.7, claiming the model actively participated in its own development through autonomous optimization cycles. Rather than relying solely on human-directed training, M2.7 reportedly updated its own knowledge stores, built agent infrastructure capabilities, and refined its reward-based training independently.

MiniMax describes M2.7 as "our first model deeply participating in its own evolution" and outlines a vision where future AI development will "gradually transition towards full autonomy, coordinating data construction, model training, inference architecture, evaluation, and other stages without human involvement."

Self-Improvement Through 100+ Optimization Rounds

To demonstrate autonomous self-improvement, MiniMax deployed an internal M2.7 agent that integrated with company research teams. The agent handled literature research, experiment tracking, debugging, metric analysis, and code fixes—covering 30 to 50 percent of the development workflow. Human researchers intervened only for critical decisions.
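
As a rough sketch of that division of labor (the task names and escalation rule here are illustrative assumptions, since MiniMax has not published its internal agent interface), the routing logic might look like this:

```python
from dataclasses import dataclass

# Task types the internal M2.7 agent reportedly handled on its own.
AGENT_TASKS = {"literature_research", "experiment_tracking",
               "debugging", "metric_analysis", "code_fix"}

@dataclass
class Task:
    kind: str
    critical: bool = False  # e.g., a change to the training objective

def route(task: Task) -> str:
    """Send critical or unfamiliar work to a human; everything else to the agent."""
    if task.critical or task.kind not in AGENT_TASKS:
        return "human"
    return "agent"

print(route(Task("debugging")))                       # agent
print(route(Task("metric_analysis", critical=True)))  # human
```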

In one experiment, M2.7 autonomously optimized coding performance over more than 100 iterative rounds in an internal environment. Each cycle involved analyzing failures, planning code changes, testing results, and determining whether to retain or discard modifications. According to MiniMax, this process yielded a 30% performance improvement on internal evaluation sets.
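
The described cycle amounts to a hill-climbing loop over the codebase: propose a change, evaluate it, and keep it only if the score improves. Below is a minimal runnable sketch with toy stand-ins for MiniMax's unpublished internal tooling; here the "solution" is a single float and "patches" are random perturbations, but the control flow mirrors the reported cycle:

```python
import random

def evaluate(candidate: float) -> float:
    # Stand-in for the internal evaluation set; higher is better,
    # with the optimum at candidate == 1.0.
    return -abs(candidate - 1.0)

def propose_patch(current: float) -> float:
    # Stand-in for the planning step: perturb the current solution.
    return current + random.uniform(-0.1, 0.1)

def optimization_loop(start: float, rounds: int = 100) -> tuple[float, float]:
    best, best_score = start, evaluate(start)
    for _ in range(rounds):
        candidate = propose_patch(best)   # plan a change
        score = evaluate(candidate)       # test the result
        if score > best_score:            # retain the modification...
            best, best_score = candidate, score
        # ...otherwise discard it and continue from the best known state
    return best, best_score

random.seed(0)
print(optimization_loop(start=0.0))
```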

Benchmark Performance: Competitive but Not Leading

MiniMax tested M2.7 against Claude Sonnet 4.6, Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 across eight benchmarks.

On OpenAI's MLE Bench Lite (22 machine learning competitions), M2.7 achieved an average medal rate of 66.6% across three 24-hour runs. That places it behind Claude Opus 4.6 (75.7%) and GPT-5.4 (71.2%), but roughly on par with Gemini 3.1 Pro.

For software engineering, M2.7 scored 56.22% on SWE-Pro (comparable to GPT-5.3-Codex) and 55.6% on VIBE-Pro. MiniMax claims the model reduced production failure recovery time to under three minutes in multiple real-world scenarios.

On office productivity tasks, M2.7 achieved an Elo score of 1,495 on GDPval-AA, described as the highest among open-weight models. The company reports 97% rule fidelity across more than 40 complex instruction sets for multi-level document edits.
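
An Elo score like this is typically accumulated from many head-to-head judgments between model outputs. As an illustration only, here is the classic Elo update formula, which is not necessarily GDPval-AA's exact scoring procedure:

```python
def elo_update(r_a: float, r_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """One standard Elo update after a single pairwise comparison."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b - k * (score_a - expected_a)
    return new_a, new_b

# Two models start at 1500; one win shifts 16 points at k=32.
print(elo_update(1500.0, 1500.0, a_won=True))  # (1516.0, 1484.0)
```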

Broader Context: Industry Trend

MiniMax is not alone in this approach. OpenAI recently introduced GPT-5.3 Codex with similar claims about AI-assisted development, where early model versions identified bugs during training and managed deployment evaluation. The theoretical foundation traces back to Jürgen Schmidhuber's 2003 "Gödel Machine" concept, with recent implementations like Sakana AI's Darwin-Gödel Machine and the Huxley-Gödel Machine from KAUST taking pragmatic evolutionary approaches.

Model Availability and Limitations

M2.7 is available via MiniMax Agent and API platforms. Unlike previous versions, model weights are not publicly released. MiniMax also launched OpenRoom, an open-source project featuring AI characters in graphical web environments with improved character consistency and emotional intelligence.

It's important to contextualize benchmark results: standardized test performance often diverges significantly from real-world capability, and scores depend heavily on testing conditions and prompt formatting. These numbers serve as reference points rather than definitive capability measures.

What This Means

MiniMax's M2.7 demonstrates that AI models can autonomously participate in their own optimization within controlled development environments. The competitive benchmark results, particularly matching Gemini 3.1 Pro while trailing Opus 4.6 and GPT-5.4, show this self-improvement approach yields viable outputs. However, the distinction between internal optimization loops and genuine "self-evolution" remains significant. The model still operates within human-defined parameters and requires human oversight for critical decisions. This represents incremental progress on autonomous AI development rather than a fundamental shift in how models are built.
