Mistral's Leanstral code verification agent outperforms Claude Sonnet at 15% of the cost
Mistral has released Leanstral, a 120B-parameter code verification agent built with the Lean programming language, claiming it outperforms larger open-source models and offers significant cost advantages over Anthropic's Claude suite. The model achieves a pass@2 score of 26.3—beating Claude Sonnet by 2.6 points—while costing $36 to run compared to Sonnet's $549.
Mistral has released Leanstral, a 120-billion-parameter coding agent designed for formal code verification using the open-source Lean programming language. The release includes open weights under the Apache 2.0 license, integration into Mistral Vibe, and a free API endpoint.
Performance Claims vs. Claude
According to FLTEval, Mistral's new and still unreleased internal evaluation framework for proof engineering, Leanstral-120B-A6B significantly undercuts Anthropic's models on cost while posting competitive scores.
At pass@2, Leanstral scores 26.3, exceeding Claude Sonnet's 23.7 by 2.6 points, at a cost of $36 versus Sonnet's $549. At pass@16, Leanstral scores 31.9, beating Sonnet by about 8 points, at a cost of $290; Sonnet's cost for this comparison is not stated separately.
Anthropic's Claude Opus 4.6 still scores higher at 39.6 on pass@16, though it costs $1,650 compared to Leanstral's $290—representing a 5.7x price premium for 7.7 additional points.
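The cost and score comparisons above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in this article; the dictionary keys and helper names are illustrative and do not come from Mistral or Anthropic.

```python
# Figures as quoted in this article (from Mistral's unreleased FLTEval benchmark).
# Keys and helper names are hypothetical labels chosen for this sketch.
RESULTS = {
    "leanstral_pass2": {"score": 26.3, "cost_usd": 36},
    "leanstral_pass16": {"score": 31.9, "cost_usd": 290},
    "sonnet": {"score": 23.7, "cost_usd": 549},
    "opus_4_6": {"score": 39.6, "cost_usd": 1650},
}


def cost_percent(cheaper: str, pricier: str) -> float:
    """Cost of `cheaper` as a percentage of the cost of `pricier`."""
    ratio = RESULTS[cheaper]["cost_usd"] / RESULTS[pricier]["cost_usd"]
    return round(100 * ratio, 1)


def price_multiple(pricier: str, cheaper: str) -> float:
    """How many times more `pricier` costs than `cheaper`."""
    return round(RESULTS[pricier]["cost_usd"] / RESULTS[cheaper]["cost_usd"], 1)


def score_gap(higher: str, lower: str) -> float:
    """Score difference in benchmark points."""
    return round(RESULTS[higher]["score"] - RESULTS[lower]["score"], 1)


if __name__ == "__main__":
    print("pass@2 cost vs Sonnet (%):", cost_percent("leanstral_pass2", "sonnet"))
    print("pass@16 cost vs Opus (%):", cost_percent("leanstral_pass16", "opus_4_6"))
    print("Opus price multiple:", price_multiple("opus_4_6", "leanstral_pass16"))
    print("Opus score gap:", score_gap("opus_4_6", "leanstral_pass16"))
```

With the quoted figures, the pass@2 run costs roughly 6.6% of Sonnet's $549, and the pass@16 run roughly 17.6% of Opus's $1,650, which matches the 5.7x premium for 7.7 points cited above.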
Mistral claims Leanstral outperforms several larger open-source competitors including GLM5-744B-A40B, Kimi-K2.5-1T-32B, and Qwen3.5-397B-A17B on FLTEval, despite having substantially fewer parameters than these models.
Formal Verification as a Solution
Leanstral's core appeal is that it addresses a fundamental limitation of AI code generation: correctness cannot be reliably verified without human review. Mistral argues that formal proof systems, together with specifications, proofs, tests, and linting, can ground AI agents in verifiable correctness and reduce the need for time-consuming human code review.
Mistral demonstrated this by deploying Leanstral against a real question from the Proof Assistant Stack Exchange involving a bug in Lean 4 code. According to the company, Leanstral successfully generated test code to reproduce the failure and correctly identified and fixed the underlying flaw.
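To make the idea of grounding code in machine-checked proofs concrete, here is a small Lean 4 sketch. It is a hypothetical illustration, not the Stack Exchange example or anything from Mistral's announcement: the function `double` and the theorem `double_spec` are invented names, and the proof assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- Hypothetical example; not taken from Mistral's announcement or FLTEval.

/-- Implementation under test: doubles a natural number. -/
def double (n : Nat) : Nat := n + n

/-- Specification: `double n` equals `2 * n` for every natural number `n`. -/
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega

-- A conventional test checks a single input...
example : double 21 = 42 := rfl
-- ...whereas `double_spec` is checked by Lean for all inputs at once.
```

If the implementation were changed in a way that violated the specification, the proof would no longer compile, which is the kind of automatic signal an agent can act on without a human reviewer.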
Broader Product Releases
Mistral simultaneously released Mistral Small 4, positioned as a unified model handling reasoning, coding, and instruction-following tasks without requiring users to switch between specialized models.
Critical Considerations
FLTEval has not been publicly released, so these benchmark results cannot be independently verified at present, and the comparisons rest entirely on Mistral's claims. The framework specifically targets formal proof engineering, a specialized domain that is not representative of the general code generation tasks where Claude excels. Pricing comparisons use Mistral's stated cost per pass attempt; actual deployment costs depend on context window usage, which is not disclosed.
Cost-per-token pricing for Leanstral is not specified in Mistral's announcement, preventing direct technical cost comparison.
What This Means
Mistral is positioning Leanstral as a specialized alternative for formal code verification and proof engineering, a narrower but increasingly important use case as organizations prioritize code correctness. The cost structure targets teams that run multiple inference passes for verification, where paying 15-20% of Claude's price adds up to meaningful savings. However, the claimed advantage is limited to formal verification workflows and cannot be confirmed until independent FLTEval results emerge; general-purpose coding likely remains Claude's domain.