Mistral releases Leanstral, 6B-parameter open-source model for Lean 4 formal proof verification

TL;DR

Mistral AI released Leanstral, which it describes as the first open-source code agent designed specifically for Lean 4 formal proof verification. The model has 6B active parameters in a sparse 120B-parameter architecture and is available under the Apache 2.0 license with free API access.

May 28, 2026 · 10:06 AM · 3 min read



Lean 4 is a proof assistant capable of expressing complex mathematical objects and software specifications. According to Mistral, Leanstral is trained to operate in realistic formal repositories rather than on isolated mathematical problems.

Architecture and availability

Leanstral uses a highly sparse architecture with 120B total parameters, of which 6B (5%) are active. The model is available through three channels:

Open-source weights under the Apache 2.0 license
Integration in Mistral Vibe (activated with the /leanstall command)
A free API endpoint, labs-leanstral-2603

Mistral states the free API endpoint will remain "highly accessible for a limited period" so the company can gather feedback data. It also plans to release a technical report detailing its training approach and FLTEval, a new evaluation suite.

Benchmark performance

Mistral evaluated Leanstral on FLTEval, which tests completing formal proofs and defining mathematical concepts in pull requests to the Fermat's Last Theorem (FLT) project. Mistral compares Leanstral against Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5, and against open-source models including Qwen3.5 397B-A17B, Kimi-K2.5 1T-A32B, and GLM5 744B-A40B.

According to Mistral:

Leanstral single pass: 21.9 score, $18 cost
Leanstral pass@2: 26.3 score, $36 cost (beats Sonnet 4.6's 23.7 at $549)
Leanstral pass@16: 31.9 score, $290 cost
Claude Opus 4.6: 39.6 score, $1,650 cost
Qwen3.5 397B-A17B: 25.4 score at pass@4
GLM5 744B-A40B: score caps at 16.6
Kimi-K2.5 1T-A32B: score caps at 20.1

Mistral used Mistral Vibe as the evaluation scaffold with no benchmark-specific modifications.
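
The cost comparisons above are easier to read as ratios. The following is a minimal sketch, not Mistral's methodology: it takes only the entries for which the article reports both a score and a cost, and computes dollars per FLTEval point plus pairwise gaps. The names `Result`, `cost_per_point`, and `compare` are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    score: float      # FLTEval score as reported by Mistral
    cost_usd: float   # evaluation cost as reported by Mistral


# Only entries with both a score and a cost in the article are listed.
RESULTS = [
    Result("Leanstral single pass", 21.9, 18),
    Result("Leanstral pass@2", 26.3, 36),
    Result("Leanstral pass@16", 31.9, 290),
    Result("Claude Sonnet 4.6", 23.7, 549),
    Result("Claude Opus 4.6", 39.6, 1650),
]
BY_NAME = {r.name: r for r in RESULTS}


def cost_per_point(r: Result) -> float:
    """Dollars spent per benchmark point."""
    return r.cost_usd / r.score


def compare(a: Result, b: Result) -> tuple[float, float]:
    """Return (score gap a - b, cost ratio a / b)."""
    return a.score - b.score, a.cost_usd / b.cost_usd


if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.name:24s} ${cost_per_point(r):6.2f} per point")
    gap, ratio = compare(BY_NAME["Leanstral pass@16"], BY_NAME["Claude Opus 4.6"])
    print(f"pass@16 vs Opus 4.6: {gap:+.1f} points at {ratio:.0%} of the cost")
```

Dollars per point is a crude metric, since the score is not linear in difficulty, but it makes the trade-off explicit: each additional pass buys points at a rising marginal cost.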

Demonstrated capabilities

Mistral provided two case studies. In the first, Leanstral diagnosed a breaking change in Lean 4.29.0-rc6 in which the rw tactic failed on type aliases. The model identified that def creates a rigid definition that must be unfolded explicitly, which blocked the tactic, and proposed switching to abbrev, which creates a transparent alias.
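
The source does not reproduce the failing proof, so the sketch below does not recreate the rw failure. It illustrates the same def versus abbrev distinction through a different mechanism, typeclass instance resolution, where the behavior is well established. The names `OpaqueNat` and `ClearNat` are hypothetical.

```lean
-- Hypothetical aliases, not taken from Mistral's case study.
def OpaqueNat : Type := Nat     -- regular definition: not unfolded automatically
abbrev ClearNat : Type := Nat   -- reducible abbreviation: unfolded automatically

-- Instance search sees through `abbrev`, so `+` resolves to Nat's addition.
example (a b : ClearNat) : ClearNat := a + b

-- The same term on `OpaqueNat` fails to elaborate, because instance search
-- does not unfold a plain `def`. Uncomment to see the failure:
-- example (a b : OpaqueNat) : OpaqueNat := a + b
```

Automation that works at reduced transparency treats a def-based alias as a distinct type until it is unfolded, which is why switching to abbrev is a common fix for this class of problem.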

In the second case study, Leanstral converted program definitions from Rocq (from a Princeton CS441 course) to Lean 4, implementing custom notation and proving properties about programs in the language.

Integration features

Leanstral supports the Model Context Protocol (MCP) through Mistral Vibe and, according to Mistral, was trained for optimal performance with lean-lsp-mcp. Users can access the model in Vibe by pressing Shift+Tab to cycle to Leanstral or by running the vibe --agent lean command.

What this means

Leanstral points to a real efficiency gain in formal verification tooling. On Mistral's FLTEval numbers, a model with 6B active parameters matches or exceeds open-source models with 17B to 40B active parameters, which suggests that specialization for proof work can matter more than raw parameter count. The Apache 2.0 license and free API access lower the barrier to adopting formal verification. The model still trails Claude Opus 4.6 by 7.7 points (31.9 at pass@16 versus 39.6), though at $290 versus $1,650 it does so at less than a fifth of the cost. FLTEval's shift from isolated competition math problems to completing pull requests in a full repository gives a more realistic benchmark for production formal verification workflows.

Source: mistral.ai

