researchMistral AI

Mistral AI fine-tunes Pixtral-12B on satellite imagery, boosting classification accuracy from 56% to 91%

TL;DR

Mistral AI reports that fine-tuning its Pixtral-12B vision model on satellite imagery increased classification accuracy from 56% to 91% on the Aerial Image Dataset. The company used LoRA (Low-Rank Adaptation) to train on 8,000 samples for under $10, reducing hallucinations from 5% to 0.1%.

2 min read
0

Mistral AI fine-tunes Pixtral-12B on satellite imagery, boosting classification accuracy from 56% to 91%

Mistral AI reports that fine-tuning its Pixtral-12B vision model on satellite imagery increased classification accuracy from 56% to 91% on the Aerial Image Dataset, demonstrating how domain-specific adaptation can dramatically improve model performance on specialized tasks.

The experiment

Mistral used the publicly available Aerial Image Dataset (AID), splitting it into 8,000 training samples and 2,000 test samples across 30 scene categories. The categories included challenging distinctions like dense residential versus medium residential areas, and ambiguous terms like "center."

The base Pixtral-12B model, using only prompt engineering with a system prompt listing all target classes, achieved 56% accuracy. The model also hallucinated invalid class names 5% of the time, producing labels not in the original set.

Fine-tuning approach

Mistral applied Low-Rank Adaptation (LoRA), which injects small trainable matrices into the model's weights rather than retraining the entire model. According to Mistral, the fine-tuning required no extensive hyperparameter tuning and cost under $10 to complete.

The company used its fine-tuning API, training the model by providing correct labels for system prompts and input images. Mistral recommends starting with a single epoch and monitoring for overfitting risk.

Results

After fine-tuning on the 8,000 training samples:

  • Overall accuracy: 91% (up from 56%)
  • Hallucination rate: 0.1% (down from 5%)
  • Performance became more consistent across all 30 classes

Mistral highlighted a specific example where the base model confused "Playground" and "Stadium" categories, both classified incorrectly as "Stadium." The fine-tuned model learned to distinguish them based on features like the presence of seating surrounding sports fields.

Technical details

Mistral's LoRA implementation allows developers to adapt models without modifying full model weights. The technique proves particularly useful when prompt engineering produces inconsistent results or when dealing with complex, domain-specific visual patterns.

The company offers two fine-tuning options: direct API calls for granular hyperparameter control, or the LaPlateforme UI, which automatically computes optimal batch size based on dataset size.

What this means

This research demonstrates that vision-language models can achieve significant performance gains on specialized visual domains through relatively inexpensive fine-tuning. The 1.6x accuracy improvement on satellite imagery with just 8,000 samples suggests similar approaches could work for other underrepresented visual domains in VLM training sets, such as medical imaging, surveillance analysis, or manuscript transcription.

The under-$10 cost and single-epoch training make this approach accessible for organizations with limited budgets working on specialized computer vision tasks. Mistral has published a cookbook with full implementation details at github.com/mistralai/cookbook.

Related Articles

product update

Mistral AI Expands Into Industrial Engineering With Airbus, BMW Partnerships and Acquires Physics AI Firm Emmi

Mistral AI announced a new industrial engineering AI stack combining physics models with partnerships across aerospace, automotive, and semiconductor sectors. The company acquired scientific AI firm Emmi on May 22, 2026, and is opening a 10 MW inference data center in Les Ulis, France in Q3 2026.

model release

Mistral Releases Medium 3.5: 128B Model with Cloud Coding Agents and 77.6% SWE-Bench Verified

Mistral AI released Medium 3.5, a 128B dense model with a 256k context window that scores 77.6% on SWE-Bench Verified. The model powers new remote coding agents in Mistral Vibe that run asynchronously in the cloud, plus a new Work mode in Le Chat for multi-step agentic tasks.

product update

Mistral AI Releases MCP Connectors in Studio with Direct Tool Calling and Human-in-the-Loop Workflows

Mistral AI has released Connectors in Studio, allowing developers to integrate custom MCP (Model Context Protocol) servers alongside built-in connectors for enterprise AI applications. The release includes direct tool calling, human-in-the-loop approval flows, and programmatic connector management via API and SDK.

product update

Mistral AI launches Forge, enterprise platform for training custom models on proprietary data

Mistral AI has launched Forge, a platform for enterprises to train custom AI models on proprietary data including codebases, compliance policies, and operational records. Early partners include ASML, DSO National Laboratories Singapore, Ericsson, European Space Agency, and HTX Singapore.

Comments

Loading...