Mistral OCR 4 Launches With Bounding Boxes, 170 Language Support at $2-4 Per 1,000 Pages
Mistral AI released OCR 4, a compact document extraction model that returns bounding boxes, block classification, and inline confidence scores alongside text. The model supports 170 languages, scores 85.20 on OlmOCRBench, and is priced at $4 per 1,000 pages via API ($2 with batch discount) or $5 per 1,000 pages through Document AI.
Mistral AI released OCR 4, a document extraction model that adds bounding boxes, block classification, and inline confidence scores to extracted text. The model runs in a single container for self-hosted deployment and supports 170 languages across 10 language groups.
Pricing and Deployment
OCR 4 is priced at $4 per 1,000 pages via API, with a 50% batch discount reducing the cost to $2 per 1,000 pages. The Document AI interface in Mistral Studio costs $5 per 1,000 pages. The model is compact enough to deploy on a single container for organizations with data sovereignty requirements.
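The rates above reduce to simple arithmetic. The following is a minimal sketch that uses only the per-1,000-page prices quoted in this article. The function and tier names are illustrative and not part of any Mistral SDK, and the code assumes billing is linear per page with no minimums or rounding to whole thousands.

```python
# Rates in USD per 1,000 pages, as quoted in the article.
# Tier names are illustrative labels, not Mistral API identifiers.
RATES_PER_1000_PAGES = {
    "api": 4.00,
    "api_batch": 2.00,    # $4 with the 50% batch discount applied
    "document_ai": 5.00,
}


def estimate_cost(pages: int, tier: str = "api") -> float:
    """Estimate cost in USD, assuming linear per-page billing (an assumption)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * RATES_PER_1000_PAGES[tier] / 1000


for tier in RATES_PER_1000_PAGES:
    print(f"{tier}: ${estimate_cost(250_000, tier):,.2f}")
```

For a 250,000-page archive, this puts the batch API at $500, the standard API at $1,000, and Document AI at $1,250.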
Performance Claims
Mistral claims OCR 4 achieves an 85.20 score on OlmOCRBench, the top result among models tested by the company. The model also scores 93.07 on OmniDocBench and 0.98 on Mistral's internal Crawl Multilingual evaluation.
In blind human preference evaluations, independent annotators preferred OCR 4 over competing systems with an average win rate of 72%, according to Mistral. The company says the comparisons covered 600+ documents in 12+ languages.
Mistral notes significant limitations in automated benchmark scoring, including ground-truth errors in reference data, equivalent LaTeX notation scored as mismatches, and multi-column reading order artifacts. The company recommends evaluating the model on your own documents rather than relying solely on benchmark scores.
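The LaTeX point is easy to reproduce. The sketch below is not how OlmOCRBench or OmniDocBench score output. It uses Python's standard `difflib` as a stand-in string-similarity metric to show how two notations for the same fraction can register as a partial mismatch.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in metric, not a benchmark's scorer."""
    return SequenceMatcher(None, a, b).ratio()


# Two ways of writing the same fraction in LaTeX.
reference = r"\frac{a}{b}"
prediction = r"{a \over b}"

score = string_similarity(reference, prediction)
# The score falls below 1.0 even though both strings render the same fraction.
print(f"similarity: {score:.2f}")
```

Any metric that compares raw strings will penalize the second form unless the expressions are normalized or rendered before comparison.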
Technical Capabilities
OCR 4 returns structured document representations with:
- Bounding boxes for text localization and in-context highlighting
- Block classification identifying titles, tables, equations, signatures, and other document elements
- Inline confidence scores at the page and word level for verification workflows
- Format support for PDF, DOC, PPT, and OpenDocument files
- 170 languages spanning English, Western European, Eastern European, Middle Eastern, Chinese, East Asian, Southeast Asian, and rare-language groups
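To make that output shape concrete, here is a minimal sketch of how a verification workflow might consume it. The `Word` and `Block` classes, their field names, the coordinate convention, and the 0.80 threshold are all hypothetical. The article does not describe the actual response schema, so check Mistral's API reference before relying on any of these names.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    confidence: float  # hypothetical per-word score in [0, 1]


@dataclass
class Block:
    kind: str  # e.g. "title", "table", "equation", "signature"
    bbox: tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1)
    words: list[Word] = field(default_factory=list)


def words_needing_review(blocks: list[Block], threshold: float = 0.80):
    """Return (block kind, bbox, word text) for every word scored below threshold."""
    return [
        (block.kind, block.bbox, word.text)
        for block in blocks
        for word in block.words
        if word.confidence < threshold
    ]


# Made-up sample data for illustration only.
page = [
    Block("title", (50, 40, 560, 80), [Word("Invoice", 0.99), Word("#1042", 0.97)]),
    Block("table", (50, 120, 560, 400), [Word("Total", 0.98), Word("1,2S0.00", 0.41)]),
]

for kind, bbox, text in words_needing_review(page):
    print(f"review {text!r} in {kind} block at {bbox}")
```

Because each flagged word carries its block's bounding box, a reviewer interface could highlight the exact region of the page rather than asking someone to reread the whole document.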
The model integrates with Mistral Search Toolkit, an open-source search framework announced at the AI Now Summit, providing structured inputs for RAG and enterprise search pipelines.
Use Cases and Performance Data
Aidan Donohue, AI engineer at Rogo, reported that OCR 4 matched the accuracy of leading agentic document parsers on financial QA datasets at roughly 8x lower cost and 17x lower latency. Ivan Mihailov, AI engineer at Anaqua, stated that the model is roughly 4x faster per page than the company's previous provider for high-volume docketing workflows.
What This Means
OCR 4 pairs text extraction with the spatial and structural metadata that plain-text OCR output lacks. Bounding boxes enable citation-grounded outputs and data pipeline validation, while block classification supports semantic chunking for retrieval systems. At $2-4 per 1,000 pages, the pricing undercuts many enterprise document services, though organizations should verify performance on their own document types given the benchmark limitations Mistral acknowledges. The single-container deployment option makes the model accessible to organizations that cannot send documents to external APIs for compliance or sovereignty reasons.
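As a rough illustration of semantic chunking with citations, the sketch below starts a new chunk at each title block and keeps the page and bounding box of every contributing block. The `Block` fields and the `"title"` label are hypothetical stand-ins rather than the model's documented output, and splitting on titles is only one possible chunking rule.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # hypothetical block label, e.g. "title" or "paragraph"
    text: str
    page: int
    bbox: tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1)


def chunk_by_title(blocks: list[Block]) -> list[dict]:
    """Start a new chunk at every title block; keep source locations for citations."""
    chunks: list[dict] = []
    for block in blocks:
        if block.kind == "title":
            chunks.append({"heading": block.text, "text": [], "sources": []})
            continue
        if not chunks:  # body text that appears before any title
            chunks.append({"heading": "", "text": [], "sources": []})
        chunks[-1]["text"].append(block.text)
        chunks[-1]["sources"].append((block.page, block.bbox))
    return chunks


# Made-up sample data for illustration only.
doc = [
    Block("title", "Pricing", 1, (50, 40, 560, 80)),
    Block("paragraph", "Rates are listed per 1,000 pages.", 1, (50, 100, 560, 160)),
    Block("title", "Limits", 2, (50, 40, 560, 80)),
    Block("paragraph", "Benchmarks have known scoring gaps.", 2, (50, 100, 560, 160)),
]

for chunk in chunk_by_title(doc):
    print(chunk["heading"], "->", chunk["sources"])
```

A retrieval system that indexes these chunks can return the stored page and box with each answer, which is what makes a citation checkable against the source document.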