DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context at $1.74/M Input Tokens
DeepSeek has released V4 Pro, a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters. The model supports a 1-million-token context window and costs $1.74 per million input tokens and $3.48 per million output tokens.
DeepSeek V4 Pro — Quick Specs
- Architecture: Mixture-of-Experts, 1.6 trillion total parameters, 49 billion activated per forward pass
- Context window: 1 million tokens
- Pricing: $1.74 per million input tokens, $3.48 per million output tokens
Architecture and Capabilities
According to DeepSeek, V4 Pro is built on the same architecture as DeepSeek V4 Flash and introduces a hybrid attention system designed for efficient long-context processing. The model supports multiple reasoning modes that allow users to balance speed and depth depending on task requirements.
The company claims the model delivers strong performance across knowledge, mathematics, and software engineering benchmarks, though specific benchmark scores have not been disclosed.
Target Use Cases
DeepSeek positions V4 Pro for complex workloads including:
- Full-codebase analysis
- Multi-step automation
- Large-scale information synthesis
- Advanced reasoning tasks
- Long-horizon agent workflows
The 1-million-token context window enables processing of entire codebases or lengthy documents in a single inference call.
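As a rough illustration (not part of DeepSeek's tooling), the sketch below estimates whether a codebase fits inside the 1M-token window using the common approximation of about four characters per token; actual counts depend on the model's tokenizer, and the file extensions are arbitrary examples.

```python
import os

CONTEXT_WINDOW = 1_000_000   # V4 Pro's advertised context length, in tokens
CHARS_PER_TOKEN = 4          # rough heuristic; real counts depend on the tokenizer

def estimate_tokens(root: str, extensions=(".py", ".ts", ".go", ".md")) -> int:
    """Walk a source tree and crudely estimate its total token count."""
    total_chars = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                try:
                    with open(os.path.join(dirpath, name), encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue
    return total_chars // CHARS_PER_TOKEN

if __name__ == "__main__":
    tokens = estimate_tokens(".")
    print(f"~{tokens:,} tokens; fits in 1M window: {tokens < CONTEXT_WINDOW}")
```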
Pricing and Availability
V4 Pro is available through OpenRouter as of April 24, 2026. At $1.74 per million input tokens, it sits in the mid-range pricing tier for frontier models. The 2:1 output-to-input pricing ratio ($3.48 vs. $1.74) follows the convention common among frontier model providers, where generated tokens are billed at a premium over prompt tokens.
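To make the pricing concrete, here is a back-of-envelope calculation using only the rates quoted above; the token counts are hypothetical, and real bills depend on provider-side accounting.

```python
INPUT_PRICE_PER_M = 1.74    # USD per million input tokens (quoted rate)
OUTPUT_PRICE_PER_M = 3.48   # USD per million output tokens (quoted rate)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request in USD at the quoted per-million-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Hypothetical near-full-context call: 900K tokens of codebase in, 8K tokens of analysis out.
print(f"${request_cost(900_000, 8_000):.2f}")   # ~$1.59
```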
Technical Details
The Mixture-of-Experts architecture routes each forward pass through 49 billion of the model's 1.6 trillion total parameters. This approach aims to capture the capability benefits of the much larger total parameter count while keeping per-token compute closer to that of a 49-billion-parameter dense model.
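DeepSeek has not published V4 Pro's routing details; the sketch below only illustrates the generic top-k expert gating pattern behind the "activate a subset of parameters per token" idea, with toy sizes and hypothetical hyperparameters rather than anything specific to V4 Pro.

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Toy Mixture-of-Experts layer: route each token to its top-k experts.

    x       : (tokens, d_model) activations
    gate_w  : (d_model, n_experts) router weights
    experts : list of (d_model, d_model) expert weight matrices
    Only k experts run per token, so most parameters stay inactive each forward pass.
    """
    logits = x @ gate_w                                   # (tokens, n_experts) router scores
    top_k = np.argsort(logits, axis=-1)[:, -k:]           # indices of the k best experts per token
    weights = np.take_along_axis(logits, top_k, axis=-1)
    weights = np.exp(weights) / np.exp(weights).sum(-1, keepdims=True)  # softmax over the k picks

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                           # per-token dispatch (clarity over speed)
        for j in range(k):
            e = top_k[t, j]
            out[t] += weights[t, j] * (x[t] @ experts[e])
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 16, 8, 4                           # toy sizes, not V4 Pro's
x = rng.normal(size=(tokens, d))
gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
print(moe_layer(x, gate, experts).shape)                  # (4, 16)
```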
OpenRouter integration includes support for DeepSeek's reasoning modes, with developers able to access step-by-step thinking processes through the reasoning_details array in API responses.
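A minimal sketch of such a call is shown below; the model slug, the `reasoning` request parameter, and the exact shape of the `reasoning_details` payload are assumptions based on the description above and OpenRouter's OpenAI-compatible chat endpoint, so check the provider documentation before relying on them.

```python
import os
import requests

# Assumed model slug; confirm the exact identifier on OpenRouter's model page.
MODEL = "deepseek/deepseek-v4-pro"

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Summarize this repository's architecture."}],
        "reasoning": {"effort": "high"},   # assumed knob for selecting a reasoning mode
    },
    timeout=120,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]

print(message["content"])
# Per the article, step-by-step thinking arrives in a reasoning_details array when provided.
for step in message.get("reasoning_details", []):
    print(step)
```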
What This Means
V4 Pro represents DeepSeek's entry into the ultra-long-context market dominated by models like Anthropic's Claude and Google's Gemini. The 1M token context window and competitive pricing make it viable for enterprise use cases requiring analysis of large documents or codebases. However, without published benchmark scores, direct performance comparisons to established models remain unclear. The MoE architecture suggests DeepSeek is prioritizing inference efficiency alongside capability, a trend across Chinese AI labs competing with Western frontier model providers.
Related Articles
DeepSeek V4 Flash Released: 284B Parameter MoE Model with 1M Context Window at $0.14 per Million Tokens
DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per request. The model supports a 1,048,576-token context window and is priced at $0.14 per million input tokens and $0.28 per million output tokens.
Tencent Releases Hy3 Preview MoE Model with 262K Context and Three Reasoning Modes
Tencent has released Hy3 Preview, a Mixture-of-Experts model offering a 262,144-token context window and three configurable reasoning modes (disabled, low, high) for production agentic workflows. The model is available for free through OpenRouter.
DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost
DeepSeek released two Mixture-of-Experts models: V4-Flash with 284B total parameters (13B activated) and V4-Pro with 1.6T parameters (49B activated). Both models support one million token context windows and use a hybrid attention architecture that requires only 27% of the inference FLOPs compared to DeepSeek-V3.2 at 1M token context.
DeepSeek Releases V4-Pro: 1.6T Parameter MoE Model with 1M Token Context
DeepSeek released two new Mixture-of-Experts models: DeepSeek-V4-Pro with 1.6 trillion parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting one million tokens of context. Through a hybrid attention architecture combining Compressed Sparse Attention and Heavily Compressed Attention, the models require only 27% of the inference FLOPs and 10% of the KV cache of DeepSeek-V3.2 at 1M context.