model releaseMistral AI

Mistral releases Leanstral, open-source 6B-parameter proof assistant for Lean 4 under Apache 2.0

TL;DR

Mistral AI has released Leanstral, a sparse 120B model with 6B active parameters designed specifically for the Lean 4 proof assistant. The model is available under Apache 2.0 license with free API access and achieves a 26.3 FLTEval score at pass@2, outperforming Claude Sonnet 4.6 while costing $36 versus $549.

3 min read
0

Mistral releases Leanstral, open-source 6B-parameter proof assistant for Lean 4 under Apache 2.0

Mistral AI has released Leanstral, a sparse architecture model with 6B active parameters trained specifically for the Lean 4 proof assistant. The company positioned it as the "first open-source code agent designed for Lean 4," targeting formal verification of mathematics and software rather than general code generation.

Model specifications

  • Architecture: 120B total parameters with 6B active (sparse)
  • License: Apache 2.0
  • API: Free endpoint (labs-leanstral-2603) during initial period
  • Integration: Built into Mistral Vibe with /leanstall command
  • Training focus: Proof engineering in Lean 4 repositories

Benchmark performance

Mistral introduced FLTEval, a new benchmark based on completing formal proofs in pull requests to the Fermat's Last Theorem (FLT) project, rather than isolated math problems.

According to Mistral, Leanstral achieved:

  • 21.9 at pass@1 ($18 cost)
  • 26.3 at pass@2 ($36 cost)
  • 31.9 at pass@16 ($290 cost)

Compared results:

  • Claude Sonnet 4.6: 23.7 ($549)
  • Claude Haiku 4.5: 23.0 ($184)
  • Claude Opus 4.6: 39.6 ($1,650)
  • Qwen3.5-397B-A17B: 25.4 at pass@4
  • GLM5-744B-A40B: 16.6
  • Kimi-K2.5-1T-A32B: 20.1

Mistral claims Leanstral at pass@2 beats Sonnet by 2.6 points while costing 93% less, and at pass@16 beats Sonnet by 8 points. Claude Opus 4.6 remains ahead but costs 92x more than Leanstral at similar pass rates.

Technical capabilities

Leanstral is trained to:

  • Complete formal proofs in realistic repository contexts
  • Define new mathematical concepts with correct syntax
  • Translate code between proof assistants (demonstrated with Rocq to Lean 4 conversion)
  • Debug proof failures in new Lean versions (tested with Lean 4.29.0-rc6, which Mistral states was not in training data)

The model supports Model Context Protocol (MCP) integration and was specifically optimized for lean-lsp-mcp. Mistral used parallel inference with Lean's verifier to validate outputs.

Case study details

Mistral provided two demonstrations:

  1. Version migration debugging: Given a Stack Exchange question about code breaking in Lean 4.29.0-rc6, Leanstral diagnosed that def creates rigid definitions blocking the rw tactic, and correctly recommended switching to abbrev for transparent aliasing.

  2. Cross-assistant translation: Successfully converted program semantics definitions from Rocq (from Princeton CS441 course materials) to Lean 4, including custom notation, and proved properties about the translated code.

What this means

Leanstral targets a narrow but technically demanding niche: formal verification in Lean 4. The sparse architecture approach (6B active from 120B total parameters) appears designed to reduce inference costs while maintaining specialized performance.

The Apache 2.0 license and free API access lower barriers for formal methods researchers and projects using Lean 4. However, the model's utility depends entirely on adoption within the Lean ecosystem—a small community compared to general programming.

Mistral's cost comparisons assume pass@N sampling strategies, which require multiple API calls. The $36 at pass@2 versus Sonnet's $549 calculation reflects running the model twice versus once, making direct cost efficiency claims dependent on whether the sampling strategy is necessary for a given task.

The FLTEval benchmark based on real repository PRs is a more realistic evaluation than competition math problems, but as a new benchmark created by Mistral, independent validation of results is not yet available.

Related Articles

model release

Mistral Releases Mistral 3 Family: 675B-Parameter Large 3 MoE and Three Edge Models Under Apache 2.0

Mistral has released Mistral 3, including Mistral Large 3—a sparse mixture-of-experts model with 41B active and 675B total parameters—and three Ministral 3 edge models (3B, 8B, 14B). All models are released under Apache 2.0 license with multimodal capabilities and are available today on multiple platforms.

product update

Mistral AI launches Connectors in Studio with MCP protocol integration and direct tool calling

Mistral AI has released Connectors in Studio, allowing developers to integrate custom MCP (Model Context Protocol) servers and built-in connectors via API/SDK. The release includes direct tool calling for deterministic workflows and human-in-the-loop approval flows for sensitive operations.

model release

Mistral Releases Voxtral TTS: 4B Parameter Text-to-Speech Model at $0.016 per 1k Characters

Mistral AI has released Voxtral TTS, a 4B parameter text-to-speech model supporting 9 languages including English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. The model achieves 70ms latency for typical inputs and can clone voices from as little as 3 seconds of audio, priced at $0.016 per 1,000 characters.

product update

Mistral AI Launches Forge for Enterprise Model Training on Proprietary Data

Mistral AI has launched Forge, a platform that allows enterprises to train custom AI models on their proprietary data including codebases, compliance policies, and operational documentation. The system supports both dense and mixture-of-experts architectures with pre-training, post-training, and reinforcement learning capabilities.

Comments

Loading...