researchMistral AI

Mistral AI fine-tunes Pixtral-12B on satellite imagery, boosting classification accuracy from 56% to 91%

TL;DR

Mistral AI has published research showing that fine-tuning its Pixtral-12B vision language model on satellite imagery increases classification accuracy from 56% to 91% on the Aerial Image Dataset. Using Low-Rank Adaptation (LoRA) with 8,000 training samples across 30 scene categories, the company reduced hallucinations from 5% to 0.1% for under $10 in compute costs.

2 min read
0

Mistral AI fine-tunes Pixtral-12B on satellite imagery, boosting classification accuracy from 56% to 91%

Mistral AI has published research demonstrating that fine-tuning its Pixtral-12B vision language model on satellite imagery produces a 1.6x improvement in classification performance. The base model achieved 56% accuracy on the Aerial Image Dataset (AID), while the fine-tuned version reached 91% accuracy.

Technical approach: LoRA fine-tuning

The company used Low-Rank Adaptation (LoRA), a technique that injects small trainable matrices into model weights rather than retraining the entire model. According to Mistral AI, this approach required 8,000 training samples distributed across 30 scene categories from the Aerial Image Dataset, introduced by Xia et al under a Public Domain license.

The fine-tuning job cost under $10 to run, making it accessible for specialized domain adaptation. Mistral AI reports that hallucinations—cases where the model generated invalid class names not in the target set—dropped from 5% to 0.1% after fine-tuning.

Dataset and classification challenges

The Aerial Image Dataset contains satellite imagery classified into detailed scene categories including Desert, BareLand, RailwayStation, Mountain, and 26 other classes. Many categories proved difficult for the base model to distinguish, particularly visually similar classes like "Dense Residential" vs. "Medium Residential" or ambiguous scenes labeled "Center."

Mistral AI's example highlights the model's improved ability to differentiate between "Playground" and "Stadium"—the base model classified both as "Stadium," while the fine-tuned version correctly identified the distinction based on the presence of surrounding seats.

Implementation details

The research used a train/test split of 8,000 and 2,000 samples respectively. According to Mistral AI, minimal hyperparameter tuning was required. The company recommends:

  • Starting with small learning rates to avoid overshooting optimal weights
  • Beginning with a single training epoch and monitoring for overfitting
  • Using batch sizes that fit computational resources while maintaining stable gradients

Fine-tuning can be executed via Mistral's API or through the La Plateforme UI. The API provides direct control over hyperparameters, while La Plateforme automatically computes optimal batch size based on dataset size.

What this means

This research validates that domain-specific fine-tuning of general-purpose vision language models can achieve significant performance gains on specialized imagery tasks. The sub-$10 cost and 8,000-sample requirement makes this approach viable for organizations with proprietary satellite data.

The technique extends beyond satellite imagery to other underrepresented visual domains in standard VLM training sets, including medical image captioning, surveillance footage analysis, and ancient manuscript transcription. Mistral AI has published the implementation in a Jupyter notebook at github.com/mistralai/cookbook.

The results suggest that for tasks requiring nuanced visual distinctions in specialized domains, fine-tuning substantially outperforms prompt engineering approaches, which Mistral AI notes can produce inconsistent results on complex classification tasks.

Related Articles

product update

Mistral AI launches Connectors in Studio with MCP protocol integration and direct tool calling

Mistral AI has released Connectors in Studio, allowing developers to integrate custom MCP (Model Context Protocol) servers and built-in connectors via API/SDK. The release includes direct tool calling for deterministic workflows and human-in-the-loop approval flows for sensitive operations.

model release

Mistral Releases Voxtral TTS: 4B Parameter Text-to-Speech Model at $0.016 per 1k Characters

Mistral AI has released Voxtral TTS, a 4B parameter text-to-speech model supporting 9 languages including English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. The model achieves 70ms latency for typical inputs and can clone voices from as little as 3 seconds of audio, priced at $0.016 per 1,000 characters.

product update

Mistral AI Launches Forge for Enterprise Model Training on Proprietary Data

Mistral AI has launched Forge, a platform that allows enterprises to train custom AI models on their proprietary data including codebases, compliance policies, and operational documentation. The system supports both dense and mixture-of-experts architectures with pre-training, post-training, and reinforcement learning capabilities.

model release

Mistral releases Leanstral, open-source 6B-parameter proof assistant for Lean 4 under Apache 2.0

Mistral AI has released Leanstral, a sparse 120B model with 6B active parameters designed specifically for the Lean 4 proof assistant. The model is available under Apache 2.0 license with free API access and achieves a 26.3 FLTEval score at pass@2, outperforming Claude Sonnet 4.6 while costing $36 versus $549.

Comments

Loading...