OCR

6 articles tagged with OCR

June 23, 2026

Mistral OCR 4 Launches With Bounding Boxes, 170 Language Support at $2-4 Per 1,000 Pages

Mistral AI released OCR 4, a compact document extraction model that returns bounding boxes, block classification, and inline confidence scores alongside text. The model supports 170 languages, scores 85.20 on OlmOCRBench, and is priced at $4 per 1,000 pages via API ($2 with batch discount) or $5 per 1,000 pages through Document AI.

June 23, 2026 · 2:06 PM

May 27, 2026

product update · Amazon Web Services

AWS launches Amazon Bedrock Data Automation for financial document processing with custom blueprint system

Amazon Web Services released Amazon Bedrock Data Automation (BDA), a foundation model-powered service designed to extract and validate structured data from financial documents. The service uses custom blueprints to process bank statements, W-2 tax forms, 1099-B forms, and vendor contracts, offering what AWS claims is industry-leading accuracy at lower cost than using foundation models directly.

May 27, 2026 · 9:35 PM

May 14, 2026

model release

Baidu Releases Qianfan-OCR-Fast Model with 66K Context at $0.68 Per 1M Input Tokens

Baidu has released Qianfan-OCR-Fast, a multimodal model specialized for optical character recognition tasks. The model offers a 66,000-token context window and is priced at $0.68 per 1M input tokens and $2.81 per 1M output tokens.

May 14, 2026 · 2:20 PM

May 12, 2026

model release

Perceptron Launches Mk1 Vision-Language Model with Video Reasoning at $0.15/$1.50 per 1M Tokens

Perceptron has released Perceptron Mk1, a vision-language model designed for video understanding and embodied reasoning tasks. The model accepts image and video inputs with a 33K context window, is priced at $0.15 per 1M input tokens and $1.50 per 1M output tokens, and supports structured spatial annotations on demand.

May 12, 2026 · 3:35 PM

May 2, 2026

model release · NVIDIA

NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with a context of up to 256K tokens. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports a chain-of-thought reasoning mode.

May 2, 2026 · 9:06 PM

April 23, 2026

model release

Baidu Releases Free Qianfan-OCR-Fast Model with 65K Context Window

Baidu has released Qianfan-OCR-Fast, a specialized OCR model with a 65,536-token context window, available at zero cost through OpenRouter. The model launched on April 20, 2026, and is positioned as a performance upgrade over the original Qianfan-OCR.

April 23, 2026 · 2:20 AM
