
Mistral Releases OCR API at $1 per 1,000 Pages, Claims 94.89% Accuracy on Document Benchmarks

TL;DR

Mistral AI has released an OCR API priced at $1 per 1,000 pages with batch inference costs approximately half that rate. The company claims 94.89% overall accuracy on internal benchmarks, ahead of GPT-4o (89.77%), Gemini 2.0 Flash (88.69%), and Azure OCR (89.52%). The model processes up to 2,000 pages per minute on a single node.

Mistral AI has released an OCR (Optical Character Recognition) API priced at $1 per 1,000 pages, with approximately double the pages per dollar available through batch inference. The model is now live on Mistral's la Plateforme developer suite and deployed as the default document understanding model for Le Chat.
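The pricing arithmetic can be sketched in a few lines. This is a rough estimate only: the $1 per 1,000 pages figure comes from the announcement, while the exact 0.5 batch multiplier is an assumption standing in for "approximately double the pages per dollar". The function and constant names are illustrative.

```python
# List price from the article; the exact batch multiplier is an assumption.
PRICE_PER_1000_PAGES_USD = 1.00
BATCH_MULTIPLIER = 0.5  # assumed: "approximately double the pages per dollar"


def estimate_cost_usd(pages: int, batch: bool = False) -> float:
    """Return the estimated API cost in USD for a given page count."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    cost = pages / 1000 * PRICE_PER_1000_PAGES_USD
    return cost * BATCH_MULTIPLIER if batch else cost


print(estimate_cost_usd(250_000))              # 250.0
print(estimate_cost_usd(250_000, batch=True))  # 125.0
```

Under these assumptions, a 250,000-page archive would cost about $250 at the standard rate and about $125 through batch inference.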

Performance Claims

According to Mistral, its OCR model achieves 94.89% overall accuracy on the company's internal benchmarks, ahead of every competing model it tested:

  • Mistral OCR 2503: 94.89% overall
  • GPT-4o (2024-11-20): 89.77%
  • Azure OCR: 89.52%
  • Gemini 1.5 Pro 002: 89.92%
  • Gemini 2.0 Flash 001: 88.69%
  • Google Document AI: 83.42%

The company reports particularly strong performance on mathematical content (94.29%), tables (96.12%), and scanned documents (98.96%). All of these figures come from Mistral's own internal "text-only" test set of publication papers and web PDFs, not from an independent benchmark.
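To put the overall scores in perspective, the gaps can be computed directly from the figures listed above. This is a minimal sketch that only restates Mistral's self-reported numbers as percentage-point margins; it says nothing about how the scores were measured, and the function name is illustrative.

```python
# Overall accuracy figures (percent) as reported by Mistral in the article.
scores = {
    "Mistral OCR 2503": 94.89,
    "GPT-4o (2024-11-20)": 89.77,
    "Azure OCR": 89.52,
    "Gemini 1.5 Pro 002": 89.92,
    "Gemini 2.0 Flash 001": 88.69,
    "Google Document AI": 83.42,
}


def margins_over(baseline: str, table: dict) -> dict:
    """Return the baseline's lead over each other entry, in percentage points."""
    base = table[baseline]
    return {
        name: round(base - value, 2)
        for name, value in table.items()
        if name != baseline
    }


for name, margin in margins_over("Mistral OCR 2503", scores).items():
    print(f"{name}: +{margin:.2f} points")
```

On these numbers the claimed lead ranges from 4.97 points (Gemini 1.5 Pro 002) to 11.47 points (Google Document AI).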

Technical Capabilities

Mistral OCR accepts images and PDFs as input and extracts content as ordered, interleaved text and images in markdown format. The model handles complex document elements including mathematical expressions in LaTeX, tables, multilingual text, and embedded imagery.
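For a downstream consumer, interleaved markdown output of this kind can be split back into ordered text and image segments. The sketch below assumes images appear as standard markdown image references such as `![fig-1](fig-1.jpeg)`; the sample string and the `split_segments` helper are hypothetical and are not taken from Mistral's documentation.

```python
import re

# Matches standard markdown image syntax: ![alt text](target)
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def split_segments(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into ordered ("text", ...) and ("image", ...) segments."""
    segments = []
    position = 0
    for match in IMAGE_PATTERN.finditer(markdown):
        text = markdown[position:match.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("image", match.group(2)))
        position = match.end()
    tail = markdown[position:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


# Hypothetical page content, for illustration only.
sample = "# Results\n\nSee the plot.\n\n![fig-1](fig-1.jpeg)\n\nEnergy is $E = mc^2$."
for kind, value in split_segments(sample):
    print(kind, repr(value))
```

Keeping the segments in document order preserves the position of each figure relative to the surrounding text, which matters when the output is chunked for retrieval.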

The system processes up to 2,000 pages per minute on a single node, which Mistral claims makes it the fastest model in its category. The model supports thousands of scripts, fonts, and languages, with particularly high reported accuracy on European languages: German (99.51%), Spanish (99.54%), and Italian (99.42%).
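The throughput figure translates into a simple best-case time estimate. The sketch below assumes the peak rate of 2,000 pages per minute is sustained and that throughput scales linearly with node count; neither assumption is confirmed by the source, so treat the result as a lower bound.

```python
def estimate_minutes(pages: int, pages_per_minute: float = 2000.0, nodes: int = 1) -> float:
    """Best-case processing time in minutes, assuming linear scaling across nodes."""
    if pages < 0 or pages_per_minute <= 0 or nodes < 1:
        raise ValueError("invalid workload or capacity")
    return pages / (pages_per_minute * nodes)


print(estimate_minutes(1_000_000))           # 500.0 minutes on one node
print(estimate_minutes(1_000_000, nodes=4))  # 125.0 minutes if scaling were linear
```

At the quoted rate, one million pages would take a little over eight hours on a single node.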

Multilingual Performance

Mistral reports 99.02% fuzzy match accuracy in multilingual generation, compared to Azure OCR (97.31%) and Gemini 2.0 Flash (96.53%). Language-specific scores include Russian (99.09%), French (99.20%), Hindi (97.55%), and Chinese (97.11%).
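The article does not say how the fuzzy match score is computed. As an illustration of the general idea only, and not of Mistral's metric, the sketch below scores OCR output against a reference with a normalized similarity ratio from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def fuzzy_match(reference: str, hypothesis: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, reference, hypothesis).ratio()


# A single substituted character lowers the score without zeroing it.
print(fuzzy_match("Straße", "Straße"))
print(fuzzy_match("Straße", "Strasse"))
```

Unlike exact match, a fuzzy score gives partial credit for near-correct output, so small transcription slips reduce the score proportionally rather than counting as a complete miss.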

Features and Deployment

The API supports document-as-prompt functionality, allowing users to extract specific information and format outputs as structured JSON for downstream function calls and agent workflows. Unlike some competing models, Mistral OCR extracts embedded images alongside text.
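When structured JSON feeds a function call or an agent, it is usually worth validating it first. The sketch below is a minimal check under assumed conditions: the field names `invoice_number` and `total` form a hypothetical schema, and the raw string stands in for whatever the model returns. None of this reflects Mistral's actual response format.

```python
import json

# Hypothetical schema: field name -> accepted Python types.
REQUIRED_FIELDS = {
    "invoice_number": (str,),
    "total": (int, float),
}


def parse_extraction(raw: str) -> dict:
    """Parse model-emitted JSON and verify the hypothetical required fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, types in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], types):
            raise ValueError(f"wrong type for field: {field}")
    return data


# Illustrative payload, not real model output.
raw = '{"invoice_number": "INV-001", "total": 1249.5}'
print(parse_extraction(raw))
```

Rejecting malformed output at this boundary keeps a bad extraction from propagating into later steps of the workflow.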

Mistral will offer selective self-hosting options for organizations with data privacy requirements, keeping sensitive documents within customer infrastructure. The model will be available through cloud and inference partners, with on-premises deployment coming soon.

What This Means

At $1 per 1,000 pages, Mistral's entry into OCR is priced aggressively and undercuts many existing document processing services. The claimed accuracy gains, in particular the roughly 5 to 6 percentage point lead over GPT-4o and the Gemini models on Mistral's internal benchmarks, would be substantial if validated on independent test sets. Extracting embedded images alongside text sets it apart from pure text OCR systems and makes it better suited to RAG pipelines that process complex documents such as scientific papers and technical manuals. The multilingual capabilities and claimed speed (2,000 pages per minute) position it for high-volume enterprise document processing, though real-world performance will depend on document complexity and infrastructure.
