Mistral releases Leanstral, open-source 6B-parameter proof assistant for Lean 4 under Apache 2.0

TL;DR

Mistral AI has released Leanstral, a sparse 120B model with 6B active parameters designed specifically for the Lean 4 proof assistant. The model is available under Apache 2.0 license with free API access and achieves a 26.3 FLTEval score at pass@2, outperforming Claude Sonnet 4.6 while costing $36 versus $549.

June 18, 2026 · 9:06 AM3 min read

Leanstral 120B-A6B — Quick Specs

Compare Leanstral 120B-A6B with other models →

Mistral releases Leanstral, open-source 6B-parameter proof assistant for Lean 4 under Apache 2.0

Mistral AI has released Leanstral, a sparse architecture model with 6B active parameters trained specifically for the Lean 4 proof assistant. The company positioned it as the "first open-source code agent designed for Lean 4," targeting formal verification of mathematics and software rather than general code generation.

Model specifications

Architecture: 120B total parameters with 6B active (sparse)
License: Apache 2.0
API: Free endpoint (labs-leanstral-2603) during initial period
Integration: Built into Mistral Vibe with /leanstall command
Training focus: Proof engineering in Lean 4 repositories

Benchmark performance

Mistral introduced FLTEval, a new benchmark based on completing formal proofs in pull requests to the Fermat's Last Theorem (FLT) project, rather than isolated math problems.

According to Mistral, Leanstral achieved:

21.9 at pass@1 ($18 cost)
26.3 at pass@2 ($36 cost)
31.9 at pass@16 ($290 cost)

Compared results:

Claude Sonnet 4.6: 23.7 ($549)
Claude Haiku 4.5: 23.0 ($184)
Claude Opus 4.6: 39.6 ($1,650)
Qwen3.5-397B-A17B: 25.4 at pass@4
GLM5-744B-A40B: 16.6
Kimi-K2.5-1T-A32B: 20.1

Mistral claims Leanstral at pass@2 beats Sonnet by 2.6 points while costing 93% less, and at pass@16 beats Sonnet by 8 points. Claude Opus 4.6 remains ahead but costs 92x more than Leanstral at similar pass rates.

Technical capabilities

Leanstral is trained to:

Complete formal proofs in realistic repository contexts
Define new mathematical concepts with correct syntax
Translate code between proof assistants (demonstrated with Rocq to Lean 4 conversion)
Debug proof failures in new Lean versions (tested with Lean 4.29.0-rc6, which Mistral states was not in training data)

The model supports Model Context Protocol (MCP) integration and was specifically optimized for lean-lsp-mcp. Mistral used parallel inference with Lean's verifier to validate outputs.

Case study details

Mistral provided two demonstrations:

Version migration debugging: Given a Stack Exchange question about code breaking in Lean 4.29.0-rc6, Leanstral diagnosed that def creates rigid definitions blocking the rw tactic, and correctly recommended switching to abbrev for transparent aliasing.
Cross-assistant translation: Successfully converted program semantics definitions from Rocq (from Princeton CS441 course materials) to Lean 4, including custom notation, and proved properties about the translated code.

What this means

Leanstral targets a narrow but technically demanding niche: formal verification in Lean 4. The sparse architecture approach (6B active from 120B total parameters) appears designed to reduce inference costs while maintaining specialized performance.

The Apache 2.0 license and free API access lower barriers for formal methods researchers and projects using Lean 4. However, the model's utility depends entirely on adoption within the Lean ecosystem—a small community compared to general programming.

Mistral's cost comparisons assume pass@N sampling strategies, which require multiple API calls. The $36 at pass@2 versus Sonnet's $549 calculation reflects running the model twice versus once, making direct cost efficiency claims dependent on whether the sampling strategy is necessary for a given task.

The FLTEval benchmark based on real repository PRs is a more realistic evaluation than competition math problems, but as a new benchmark created by Mistral, independent validation of results is not yet available.

Source: mistral.ai ↗

mistral-ai leanstral lean-4 formal-verification proof-assistant open-source benchmark sparse-architecture

model releaseJuly 29, 2026

OpenAI's GPT Transcribe Cuts Word Error Rate to 3.31% but Trails ElevenLabs, Google, and Mistral

OpenAI released GPT Transcribe and GPT Live Transcribe, improving word error rate to 3.31 percent and cutting prices 25 percent to $0.0045 per minute. Independent benchmarks still place OpenAI behind ElevenLabs, Google, and Mistral on transcription accuracy.

model releaseJuly 28, 2026

Microsoft Releases VibeVoice-ASR-BitNet: 1.58GB Speech Recognition Model Runs Real-Time on CPU, No GPU Needed

Microsoft Research released VibeVoice-ASR-BitNet, a quantized 1.58GB version of its VibeVoice-ASR speech recognition model that achieves real-time inference (RTF < 1) on as few as 3 CPU threads. The model runs 1.6-2.3x faster than Whisper.cpp on commodity x86 and ARM hardware, with a modest accuracy tradeoff.

model releaseAugust 2, 2026

Anthropic's Claude Opus 5 Generates Full 3D Games From a Single Text Prompt, No Assets Required

Anthropic's Claude Opus 5 can generate playable 3D games, including first-person shooters and Minecraft clones, from a single text prompt with zero external assets. Community tests claim it outperforms GPT-5.6 Sol and Kimi K3 in physics realism and mechanical complexity, though no standardized benchmark has confirmed the comparisons.

model releaseAugust 1, 2026

ByteDance's Seedance 2.5 Generates 30-Second AI Video Clips With Synced Audio

ByteDance released Seedance 2.5, an AI video model that generates synchronized video and audio in a single pass, producing clips up to 30 seconds long that can be extended further. That's roughly triple the length of Google's Gemini Omni Flash.

Mistral releases Leanstral, open-source 6B-parameter proof assistant for Lean 4 under Apache 2.0

Leanstral 120B-A6B — Quick Specs

Mistral releases Leanstral, open-source 6B-parameter proof assistant for Lean 4 under Apache 2.0

Model specifications

Benchmark performance

Technical capabilities

Case study details

What this means

Related Articles

OpenAI's GPT Transcribe Cuts Word Error Rate to 3.31% but Trails ElevenLabs, Google, and Mistral

Microsoft Releases VibeVoice-ASR-BitNet: 1.58GB Speech Recognition Model Runs Real-Time on CPU, No GPU Needed

Anthropic's Claude Opus 5 Generates Full 3D Games From a Single Text Prompt, No Assets Required

ByteDance's Seedance 2.5 Generates 30-Second AI Video Clips With Synced Audio

Comments