Mistral Launches OCR API at $1 Per 1,000 Pages, Claims 94.89% Accuracy on Document Benchmarks
Mistral AI has released Mistral OCR, an API for extracting text and images from documents at $1 per 1,000 pages (approximately $0.50 with batch inference). The company claims 94.89% overall accuracy on its internal test set, comparing favorably to GPT-4o (89.77%), Gemini 2.0 Flash (88.69%), and Azure OCR (89.52%).
The API accepts images and PDFs as input and outputs interleaved text and images in markdown format. Mistral has deployed the model as the default document understanding system on Le Chat, its chatbot platform.
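To put the quoted pricing in concrete terms, the following is a minimal sketch of a cost estimate. It assumes strictly linear pricing with no minimum charge, volume tiers, or rounding per request, none of which the announcement confirms. The function name `estimate_cost` is hypothetical and is not part of any Mistral SDK.

```python
from decimal import Decimal

# Prices quoted in the article, in US dollars per 1,000 pages.
STANDARD_PRICE_PER_1000 = Decimal("1.00")
BATCH_PRICE_PER_1000 = Decimal("0.50")  # the article says "approximately"


def estimate_cost(pages: int, batch: bool = False) -> Decimal:
    """Estimate the USD cost of processing `pages` pages.

    Assumption: cost scales linearly with page count.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = BATCH_PRICE_PER_1000 if batch else STANDARD_PRICE_PER_1000
    return (rate * pages / 1000).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for n in (1_000, 250_000, 1_000_000):
        print(n, estimate_cost(n), estimate_cost(n, batch=True))
```

Using `Decimal` rather than floats keeps the cent-level rounding predictable, which matters more as page counts grow.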
Performance Claims
According to Mistral, the model achieved the following scores on its internal "text-only" test set:
- Overall accuracy: 94.89% (vs GPT-4o's 89.77%)
- Math extraction: 94.29% (vs GPT-4o's 87.55%)
- Scanned documents: 98.96% (vs GPT-4o's 94.58%)
- Tables: 96.12% (vs GPT-4o's 91.70%)
- Multilingual: 89.55% (vs GPT-4o's 86.00%)
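Accuracy gaps of a few points can look small, so it can help to restate them as error rates. The sketch below uses only the figures reported above and computes, for each category, the percentage-point gap and the relative reduction in error rate (where error is taken to be 100 minus accuracy). That definition of error is an assumption, since Mistral has not published how its accuracy metric is calculated.

```python
# Accuracy figures (percent) as reported by Mistral on its internal test set:
# category -> (Mistral OCR, GPT-4o)
SCORES = {
    "Overall": (94.89, 89.77),
    "Math extraction": (94.29, 87.55),
    "Scanned documents": (98.96, 94.58),
    "Tables": (96.12, 91.70),
    "Multilingual": (89.55, 86.00),
}


def compare(mistral: float, gpt4o: float) -> tuple[float, float]:
    """Return (percentage-point gap, relative error reduction in percent).

    Assumption: error rate = 100 - accuracy.
    """
    gap = mistral - gpt4o
    mistral_err = 100.0 - mistral
    gpt4o_err = 100.0 - gpt4o
    reduction = (gpt4o_err - mistral_err) / gpt4o_err * 100.0
    return round(gap, 2), round(reduction, 1)


if __name__ == "__main__":
    for category, (m, g) in SCORES.items():
        gap, reduction = compare(m, g)
        print(f"{category}: +{gap} points, {reduction}% fewer errors")
```

On the overall figure, for example, the reported scores imply error rates of 5.11% and 10.23%, so the claimed gap corresponds to roughly half as many errors.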
The company claims processing speeds of up to 2,000 pages per minute on a single node. Mistral also states that the model extracts embedded images from documents alongside text, a capability not present in the compared models.
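The throughput claim translates into a best-case processing time for a given backlog. The sketch below treats the "up to 2,000 pages per minute" figure as a sustained rate and assumes throughput scales linearly with node count. Both are optimistic assumptions that the announcement does not confirm, so the result is a lower bound rather than a forecast. The function name is hypothetical.

```python
PAGES_PER_MINUTE = 2_000  # "up to" figure claimed by Mistral for a single node


def minutes_to_process(
    pages: int, nodes: int = 1, pages_per_minute: int = PAGES_PER_MINUTE
) -> float:
    """Best-case minutes needed to process `pages` pages.

    Assumptions: the peak rate is sustained, and throughput scales
    linearly with the number of nodes.
    """
    if pages < 0 or nodes < 1 or pages_per_minute <= 0:
        raise ValueError("invalid pages, nodes, or pages_per_minute")
    return pages / (nodes * pages_per_minute)


if __name__ == "__main__":
    print(minutes_to_process(1_000_000))           # single node
    print(minutes_to_process(1_000_000, nodes=4))  # hypothetical 4-node setup
```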
Multilingual Support
Mistral claims 99.02% fuzzy match accuracy across multiple languages on its benchmarks, compared to 96.53% for Gemini 2.0 Flash and 97.31% for Azure OCR. The company reports accuracy above 97% for 11 tested languages, including Russian (99.09%), German (99.51%), Spanish (99.54%), Chinese (97.11%), and Hindi (97.55%).
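Mistral does not say how its "fuzzy match" score is computed. As an illustration of the general idea only, the sketch below scores OCR output against a reference transcription with Python's standard `difflib.SequenceMatcher`, after collapsing whitespace. This is a stand-in metric chosen for the example, not Mistral's method, and its scores are not comparable to the reported figures.

```python
from difflib import SequenceMatcher


def fuzzy_match_score(reference: str, extracted: str) -> float:
    """Similarity in [0, 1] between ground-truth text and OCR output.

    Illustrative metric only; not the one used in Mistral's benchmarks.
    """
    # Collapse runs of whitespace so line-wrapping differences are not penalised.
    ref = " ".join(reference.split())
    out = " ".join(extracted.split())
    if not ref and not out:
        return 1.0
    return SequenceMatcher(None, ref, out, autojunk=False).ratio()


if __name__ == "__main__":
    truth = "Die Straße ist 12 km lang."
    ocr_output = "Die Strasse ist 12 km lang."
    print(f"{fuzzy_match_score(truth, ocr_output):.4f}")
```

A fuzzy metric like this gives partial credit for near misses such as a substituted character, whereas an exact-match metric would score the whole string as wrong.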
Technical Capabilities
The model handles:
- Mathematical expressions and LaTeX formatting
- Complex tables and interleaved imagery
- Using documents as prompts, with structured output such as JSON
- Multiple scripts and fonts across languages
Mistral positions the API for use in RAG (Retrieval-Augmented Generation) systems processing multimodal documents like slides and complex PDFs. Users can chain extracted outputs into downstream function calls for agent-based workflows.
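To show how markdown output with interleaved images might feed a RAG pipeline, here is a minimal sketch that splits per-page markdown into paragraph-aligned chunks and records which image references each chunk contains. It assumes the OCR result has already been fetched and reduced to one markdown string per page. The `Chunk` class, the `chunk_pages` function, and the `img-0.jpeg` file name are hypothetical, and no Mistral API call is shown.

```python
import re
from dataclasses import dataclass, field

# Matches markdown image references such as ![caption](img-0.jpeg).
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


@dataclass
class Chunk:
    page: int                       # 1-based page number the chunk came from
    text: str                       # markdown text of the chunk
    images: list[str] = field(default_factory=list)  # image refs in the chunk


def chunk_pages(pages: list[str], max_chars: int = 800) -> list[Chunk]:
    """Split each page's markdown into paragraph-aligned chunks.

    A single paragraph longer than `max_chars` is kept whole rather than cut.
    """
    chunks: list[Chunk] = []
    for page_no, markdown in enumerate(pages, start=1):
        buffer = ""
        for para in (p.strip() for p in markdown.split("\n\n")):
            if not para:
                continue
            # Flush the current chunk if adding this paragraph would overflow.
            if buffer and len(buffer) + len(para) + 2 > max_chars:
                chunks.append(Chunk(page_no, buffer, IMAGE_REF.findall(buffer)))
                buffer = ""
            buffer = f"{buffer}\n\n{para}" if buffer else para
        if buffer:
            chunks.append(Chunk(page_no, buffer, IMAGE_REF.findall(buffer)))
    return chunks


if __name__ == "__main__":
    sample = ["# Results\n\nSee the chart.\n\n![chart](img-0.jpeg)", "Appendix text."]
    for chunk in chunk_pages(sample):
        print(chunk.page, chunk.images, repr(chunk.text[:30]))
```

Keeping the page number and image references on each chunk lets a retriever return the associated figure alongside the text, which is the main reason to preserve the interleaved structure instead of flattening the output to plain text.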
Availability and Deployment
The API is available today on la Plateforme, Mistral's developer platform. The company plans to extend availability to cloud and inference partners, plus on-premises deployment on a selective basis for organizations handling classified information.
Mistral has not disclosed the model's parameter count, architecture details, or training data composition.
What This Means
Mistral OCR enters a competitive market dominated by Google Document AI, Azure OCR, and general-purpose multimodal models like GPT-4o and Gemini. At $1 per 1,000 pages, it undercuts typical enterprise OCR pricing, though direct cost comparisons depend on the use case and on whether batch processing is an option. The claimed accuracy advantages, particularly on mathematical content (94.29% vs GPT-4o's 87.55%), could make it viable for scientific and technical document processing if the results, which come from Mistral's internal test set, prove reproducible on external ones. The key differentiator appears to be extracting text and images together in a single pass, which existing general-purpose LLMs don't natively support.