Baidu Launches CoBuddy Code Generation Model with 131K Context Window, Free on OpenRouter
Baidu has released CoBuddy, a model optimized for code generation and AI agent workflows. It features a 131K token context window, up to 65K output tokens, and is served with fp8 quantization, with native support for tool calling and reasoning.
Baidu has released CoBuddy, a code generation model from its Qianfan platform that is now available for free through OpenRouter. The model was released on May 6, 2026.
Technical Specifications
CoBuddy is served with fp8 quantization and supports a 131,072-token context window with up to 65,536 output tokens. According to Baidu, the model is optimized for high inference throughput and low end-to-end latency.
The model includes native support for tool calling and reasoning capabilities, positioning it for AI agent workflows. OpenRouter's implementation supports reasoning-enabled features that can show step-by-step thinking processes through a reasoning_details array in API responses.
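When reasoning is enabled, the step-by-step thinking arrives alongside the answer in that `reasoning_details` array. A minimal sketch of pulling it out of a chat-completion response might look like this; the exact field names inside each entry (here, `text`) are assumed from OpenRouter's general response shape, not confirmed for this model:

```python
def extract_reasoning(response: dict) -> list[str]:
    """Collect reasoning text from a chat-completion response's
    reasoning_details array, if the provider returned one."""
    texts = []
    for choice in response.get("choices", []):
        details = choice.get("message", {}).get("reasoning_details") or []
        for detail in details:
            # Each entry is assumed to carry its content under "text".
            if "text" in detail:
                texts.append(detail["text"])
    return texts
```

Responses without reasoning simply yield an empty list, so the same parsing path works whether or not the feature was requested.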
Pricing and Availability
Baidu is offering CoBuddy at zero cost through OpenRouter, with $0 per million input tokens and $0 per million output tokens. The model is accessible through OpenRouter's unified API, which normalizes requests across multiple providers.
OpenRouter routes requests to available providers with automatic fallbacks for uptime optimization. The service is compatible with the OpenAI SDK and also offers its own native SDK.
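Because the API is OpenAI-compatible, calling the model reduces to POSTing a standard chat-completion payload to OpenRouter's endpoint. The sketch below only builds that payload; the model slug `baidu/cobuddy` is a placeholder guess, so check OpenRouter's model list for the real identifier before sending anything:

```python
import json

# OpenRouter's OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, model: str = "baidu/cobuddy") -> dict:
    """Assemble an OpenAI-style chat-completion request body.

    "baidu/cobuddy" is a hypothetical slug used for illustration;
    substitute the identifier shown on the model's OpenRouter page.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# The body would be sent as JSON with an Authorization: Bearer <key> header.
payload = json.dumps(build_request("Write a binary search in Python."))
```

The same body works unchanged through the OpenAI SDK by pointing its `base_url` at `https://openrouter.ai/api/v1`.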
Target Use Cases
Baidu positions CoBuddy specifically for:
- Code generation tasks
- AI agent workflows requiring tool calling
- Applications needing reasoning transparency
- Development scenarios requiring large context windows
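For the tool-calling use case, OpenRouter's unified API accepts the OpenAI-style `tools` schema. A minimal sketch of declaring one tool for an agent loop follows; the tool name `run_tests` and its parameters are invented for illustration, not part of anything Baidu or OpenRouter ships:

```python
def make_tools() -> list[dict]:
    """Declare one function tool in the OpenAI-compatible schema
    that would accompany a chat-completion request."""
    return [
        {
            "type": "function",
            "function": {
                "name": "run_tests",  # hypothetical tool for a coding agent
                "description": "Run the project's test suite and return results.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {
                            "type": "string",
                            "description": "Test file or directory to run.",
                        }
                    },
                    "required": ["path"],
                },
            },
        }
    ]
```

The model would respond with a `tool_calls` entry naming the function and its JSON arguments, which the agent executes before sending the result back as a `tool` message.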
The fp8 quantization suggests Baidu is prioritizing inference efficiency, though the company has not disclosed benchmark scores or a parameter count for the model.
What This Means
Baidu's decision to offer CoBuddy for free on OpenRouter represents a direct entry into the competitive code generation market dominated by models like Anthropic's Claude and OpenAI's GPT-4. The 131K context window and 65K output capacity are substantial, though not unprecedented—several recent models support similar or larger windows. The lack of disclosed benchmark scores makes it difficult to assess CoBuddy's actual capabilities relative to established code models. The free pricing suggests Baidu is prioritizing adoption and data collection over immediate monetization, a strategy common for new entrants seeking to establish market presence.
Related Articles
OpenRouter Launches Owl Alpha: Free Foundation Model for Agentic Workflows with 1M Context
OpenRouter has released Owl Alpha, a foundation model specifically designed for agentic workloads with native tool use support and a 1,048,576 token context window. The model is currently free for both input and output tokens and is compatible with Claude Code, OpenClaw, and other productivity tools.
NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode
NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with up to 256K token context. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports chain-of-thought reasoning mode.
Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning
Mistral AI released Mistral Medium 3.5, a 128B parameter dense model with a 256k context window that unifies instruction-following, reasoning, and coding capabilities. The model features configurable reasoning effort per request and a vision encoder trained from scratch for variable image sizes.