DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context at $1.74/M Input Tokens
DeepSeek has released V4 Pro, a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters. The model supports a 1-million-token context window and costs $1.74 per million input tokens and $3.48 per million output tokens.
DeepSeek V4 Pro — Quick Specs
DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context
DeepSeek has released V4 Pro, a large-scale Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters, supporting a 1-million-token context window. The model is priced at $1.74 per million input tokens and $3.48 per million output tokens.
Architecture and Capabilities
According to DeepSeek, V4 Pro is built on the same architecture as DeepSeek V4 Flash and introduces a hybrid attention system designed for efficient long-context processing. The model supports multiple reasoning modes that allow users to balance speed and depth depending on task requirements.
The company claims the model delivers strong performance across knowledge, mathematics, and software engineering benchmarks, though specific benchmark scores have not been disclosed.
Target Use Cases
DeepSeek positions V4 Pro for complex workloads including:
- Full-codebase analysis
- Multi-step automation
- Large-scale information synthesis
- Advanced reasoning tasks
- Long-horizon agent workflows
The 1-million-token context window enables processing of entire codebases or lengthy documents in a single inference call.
Pricing and Availability
V4 Pro is available through OpenRouter as of April 24, 2026. At $1.74 per million input tokens, it sits in the mid-range pricing tier for frontier models. The 2:1 output-to-input pricing ratio ($3.48 vs $1.74) is standard for models with generation-heavy workloads.
Technical Details
The Mixture-of-Experts architecture activates 49 billion parameters per forward pass while maintaining 1.6 trillion total parameters. This approach aims to provide capabilities comparable to dense models of similar active parameter count while reducing computational costs.
OpenRouter integration includes support for DeepSeek's reasoning modes, with developers able to access step-by-step thinking processes through the reasoning_details array in API responses.
What This Means
V4 Pro represents DeepSeek's entry into the ultra-long-context market dominated by models like Anthropic's Claude and Google's Gemini. The 1M token context window and competitive pricing make it viable for enterprise use cases requiring analysis of large documents or codebases. However, without published benchmark scores, direct performance comparisons to established models remain unclear. The MoE architecture suggests DeepSeek is prioritizing inference efficiency alongside capability, a trend across Chinese AI labs competing with Western frontier model providers.
Related Articles
Nvidia releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows
Nvidia has released Nemotron 3 Ultra, a 550-billion parameter mixture-of-experts model with 55 billion active parameters and support for up to 1 million token context windows. The model uses a hybrid Transformer-Mamba architecture and is designed specifically for long-running agentic workflows including agent orchestration, coding agents, and complex enterprise tasks.
Nvidia Releases Nemotron 3 Ultra: 550B Parameter MoE Model with 1M Token Context Window
Nvidia has released Nemotron 3 Ultra, a 550B parameter mixture-of-experts model with 55B active parameters and a 1M token context window. The model uses a hybrid Transformer-Mamba architecture and is available for free through OpenRouter, targeting agentic workflows and multi-step reasoning tasks.
NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning
NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and up to 1M token context window. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, trained with NVFP4 quantization-aware methods from December 2025 to April 2026.
NVIDIA Nemotron 3 Ultra launches on AWS SageMaker with 550B parameters, 1M token context window
NVIDIA Nemotron 3 Ultra is now available on Amazon SageMaker JumpStart with 550 billion total parameters and 55 billion active parameters. The model features a hybrid Transformer-Mamba Mixture-of-Experts architecture and supports context windows up to 1 million tokens, targeting agentic AI workloads.
Comments
Loading...