Baidu Releases Qianfan-OCR-Fast Model with 66K Context at $0.68 Per 1M Input Tokens
Baidu has released Qianfan-OCR-Fast, a multimodal model specialized for optical character recognition tasks. The model offers a 66,000 token context window and is priced at $0.68 per 1M input tokens and $2.81 per 1M output tokens.
Specifications
The model is available through OpenRouter with the following specifications:
- Context window: 66,000 tokens
- Input pricing: $0.68 per 1M tokens
- Output pricing: $2.81 per 1M tokens
- Model type: Multimodal (specialized for OCR)
- Release date: Listed as April 20, 2026 (likely an error; actual release date unclear)
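Requests to OpenRouter-hosted multimodal models use the OpenAI-compatible chat-completions format, with text and image parts in a single user message. The sketch below builds such a payload for an OCR prompt; the model slug `baidu/qianfan-ocr-fast` is an assumption for illustration, not a confirmed identifier from the listing.

```python
# Sketch of an OCR request payload for OpenRouter's OpenAI-compatible
# chat-completions endpoint. The model slug is a hypothetical guess;
# check the OpenRouter listing for the actual identifier.
MODEL_SLUG = "baidu/qianfan-ocr-fast"  # assumed, not confirmed

def build_ocr_request(image_url: str,
                      prompt: str = "Extract all text from this document.") -> dict:
    """Build a chat-completions payload with one text part and one image part."""
    return {
        "model": MODEL_SLUG,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_ocr_request("https://example.com/scanned-invoice.png")
```

The same payload would be POSTed to OpenRouter's `/api/v1/chat/completions` endpoint with an API key; only the payload construction is shown here.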
Technical Details
According to Baidu, Qianfan-OCR-Fast was trained on specialized OCR data while maintaining broader multimodal capabilities. The company claims it provides improved performance over its predecessor, Qianfan-OCR, though specific benchmark comparisons were not provided.
The model is designed to handle document understanding, text extraction, and related OCR tasks while retaining general multimodal intelligence for image understanding beyond pure text recognition.
Availability
Qianfan-OCR-Fast is currently available through OpenRouter's API routing service, which automatically selects providers based on prompt requirements and maintains fallback options for uptime. Weekly token usage on the platform stands at 273,000 tokens as of the listing date.
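OpenRouter's fallback behavior can also be expressed per request as an ordered list of models, tried in sequence if the first is unavailable. A minimal sketch, assuming illustrative model slugs that are not confirmed by the announcement:

```python
# OpenRouter accepts an ordered "models" list in place of a single
# "model"; if the first entry is unavailable, the request falls back
# to the next. Both slugs below are illustrative assumptions.
def with_fallbacks(payload: dict, fallbacks: list[str]) -> dict:
    """Return a copy of the payload routed through an ordered fallback list."""
    routed = dict(payload)
    routed["models"] = [routed.pop("model")] + list(fallbacks)
    return routed

base = {"model": "baidu/qianfan-ocr-fast", "messages": []}
routed = with_fallbacks(base, ["some-provider/general-vision-model"])
```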
No information about direct API access through Baidu's own infrastructure was disclosed in the announcement.
What This Means
Baidu's OCR-specialized model enters a growing market for document-understanding AI, competing with general-purpose vision models from OpenAI (GPT-4 Vision), Anthropic (Claude 3), and Google (Gemini). The 66K context window is enough to process lengthy documents in a single request, though it falls short of competitors offering 200K+ contexts. At $0.68 per 1M input tokens, pricing is competitive for specialized OCR work, particularly in high-volume document-processing pipelines where domain-specific optimization may justify choosing it over a general-purpose vision model.
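At the listed rates, per-request cost is straightforward arithmetic; the sketch below estimates the dollar cost of a single OCR request from the input and output token counts.

```python
# Cost estimate from the listed rates: $0.68 per 1M input tokens,
# $2.81 per 1M output tokens.
INPUT_RATE = 0.68 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.81 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A 60K-token document scan producing a 4K-token extraction:
cost = estimate_cost(60_000, 4_000)  # ≈ $0.052
```

At that rate, even a near-full-context request with a sizable extraction stays around a nickel, which is where the high-volume document-processing argument comes from.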