Baidu Releases Unlimited-OCR, a 3B Parameter Document Parsing Model Based on Deepseek-OCR
Baidu has released Unlimited-OCR, a 3 billion parameter model for optical character recognition and document parsing. The model supports single-page and multi-page document processing with a 32,768 token context window and runs on NVIDIA GPUs using bfloat16 precision.
Baidu Releases Unlimited-OCR, a 3B Parameter Document Parsing Model Based on Deepseek-OCR
Baidu has released Unlimited-OCR, a 3 billion parameter optical character recognition model designed for document parsing. Released on June 22, 2025, the model builds on Deepseek-OCR and is available on Hugging Face.
Technical Specifications
Unlimited-OCR operates with a 32,768 token context window and uses bfloat16 precision on NVIDIA GPUs. The model requires PyTorch 2.10.0, transformers 4.57.1, and CUDA 12.9. According to Baidu, the model is positioned as "pushing Deepseek-OCR one step further" with support for "one-shot long-horizon parsing."
The model offers two processing modes:
- Gundam mode: 1024 base size, 640 image size with crop mode enabled for single images
- Base mode: 1024 base and image size without cropping for single images and all multi-page documents
Deployment Options
Unlimited-OCR can be deployed via Hugging Face transformers or SGLang server infrastructure. The SGLang deployment requires FlashAttention 3 backend and supports an OpenAI-compatible API with streaming responses.
For multi-page documents and PDFs, the model converts pages to images at 300 DPI before processing. The implementation includes custom logit processors with a 35-token n-gram constraint and configurable window sizes (128 tokens for single images, 1,024 tokens for multi-page documents).
Technical Implementation
The model uses PyMuPDF for PDF-to-image conversion and supports both single-image and multi-page inference. Base64-encoded images are sent to the model with text prompts like "document parsing" or "Multi page parsing." The SGLang server configuration allocates 80% of GPU memory statically and disables overlap scheduling.
Baidu acknowledges Deepseek-OCR, Deepseek-OCR-2, and PaddleOCR in the model documentation. Pricing information has not been disclosed.
What This Means
Unlimited-OCR adds another option to the OCR model landscape, though it remains unclear how performance compares to existing solutions like GPT-4V or Claude 3.5 Sonnet on document understanding tasks. The 3B parameter size suggests efficient inference, but no benchmark scores have been published. The model's value proposition depends on comparative accuracy data that Baidu has not yet provided.
Related Articles
Poolside releases Laguna M.1: 225B parameter MoE model scores 74.6% on SWE-bench Verified
Poolside has released Laguna M.1, a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token, designed for agentic coding tasks. The model scores 74.6% on SWE-bench Verified and 63.1% on SWE-bench Multilingual, released under Apache 2.0 license.
Mistral releases Leanstral, open-source 6B-parameter proof assistant for Lean 4 under Apache 2.0
Mistral AI has released Leanstral, a sparse 120B model with 6B active parameters designed specifically for the Lean 4 proof assistant. The model is available under Apache 2.0 license with free API access and achieves a 26.3 FLTEval score at pass@2, outperforming Claude Sonnet 4.6 while costing $36 versus $549.
Mistral OCR 3 launches at $2 per 1,000 pages with 74% win rate over previous version
Mistral AI released Mistral OCR 3, a document extraction model priced at $2 per 1,000 pages ($1 with Batch API discount). The model achieves a 74% overall win rate over its predecessor on forms, scanned documents, complex tables, and handwriting according to internal benchmarks.
Mistral Releases Mistral 3 Family: 675B-Parameter Large 3 MoE and Three Edge Models Under Apache 2.0
Mistral has released Mistral 3, including Mistral Large 3—a sparse mixture-of-experts model with 41B active and 675B total parameters—and three Ministral 3 edge models (3B, 8B, 14B). All models are released under Apache 2.0 license with multimodal capabilities and are available today on multiple platforms.
Comments
Loading...