DeepSeek V4 Pro launches with 1.6T parameters at $1.74/M tokens, undercutting Claude Sonnet 4.6 by 42%
DeepSeek released two preview models: V4 Pro (1.6T total parameters, 49B active) and V4 Flash (284B total, 13B active), both with 1 million token context windows. V4 Pro is priced at $1.74/M input tokens and $3.48/M output—42% cheaper than Claude Sonnet 4.6—while V4 Flash at $0.14/$0.28 per million tokens undercuts all small frontier models.
DeepSeek released two preview models in its V4 series: DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both are Mixture of Experts models with 1 million token context windows, released under the MIT license.
Model specifications
DeepSeek-V4-Pro has 1.6 trillion total parameters with 49 billion active parameters. DeepSeek-V4-Flash has 284 billion total parameters with 13 billion active. According to DeepSeek, this makes V4 Pro the largest open weights model available, exceeding Kimi K2.6 (1.1T) and GLM-5.1 (754B), and more than double the size of DeepSeek V3.2 (685B).
The Pro model weighs 865 GB on Hugging Face; Flash weighs 160 GB.
Pricing comparison
DeepSeek's pricing significantly undercuts existing frontier models:
- DeepSeek V4 Flash: $0.14/M input tokens, $0.28/M output tokens
- DeepSeek V4 Pro: $1.74/M input tokens, $3.48/M output tokens
For comparison:
- GPT-5.4: $2.50/$15 per million tokens
- Claude Sonnet 4.6: $3/$15 per million tokens
- Gemini 3.1 Pro: $2/$12 per million tokens
- Claude Haiku 4.5: $1/$5 per million tokens
- GPT-5.4 Nano: $0.20/$1.25 per million tokens
V4 Flash is the cheapest of the small models listed above. V4 Pro costs 42% less than Claude Sonnet 4.6 for input tokens and 77% less for output tokens.
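The per-token differences compound quickly over a real workload. A minimal sketch using the prices quoted above; the workload size (1M input, 200K output tokens) is an illustrative assumption:

```python
# Dollar cost of one workload under each model's published per-million-token
# prices. Prices are from the article; the workload mix is an assumption.

PRICES = {  # (input $/M tokens, output $/M tokens)
    "DeepSeek V4 Flash": (0.14, 0.28),
    "DeepSeek V4 Pro": (1.74, 3.48),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}

def workload_cost(model, input_tokens, output_tokens):
    """Total cost in dollars for a single workload at the listed prices."""
    inp, out = PRICES[model]
    return (input_tokens / 1e6) * inp + (output_tokens / 1e6) * out

for model in PRICES:
    cost = workload_cost(model, 1_000_000, 200_000)
    print(f"{model}: ${cost:.2f}")
```

At this input-heavy mix, V4 Pro comes out around $2.44 versus $6.00 for Claude Sonnet 4.6; output-heavy workloads favor V4 even more, since the output-price gap is wider.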
Efficiency improvements
DeepSeek attributes the low pricing to substantial efficiency gains. According to their technical paper, in 1M-token context scenarios, V4 Pro achieves only 27% of the single-token FLOPs and 10% of the KV cache size compared to DeepSeek V3.2. V4 Flash achieves 10% of the FLOPs and 7% of the KV cache size.
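The KV-cache ratio is easiest to appreciate against the standard size formula for dense attention. A minimal sketch: the layer count, head count, and head dimension below are illustrative assumptions (roughly GQA-shaped), not DeepSeek's published architecture, and only the 10% ratio comes from the paper.

```python
# KV cache size for a transformer at a given context length, using the
# standard formula for dense attention:
#   bytes = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_elem * tokens
# All architecture numbers below are illustrative assumptions, not
# DeepSeek's published dimensions; the point is the scale at 1M tokens.

def kv_cache_gb(layers, kv_heads, head_dim, tokens, bytes_per_elem=2):
    """KV cache size in GB for one sequence of `tokens` tokens."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

baseline = kv_cache_gb(layers=61, kv_heads=8, head_dim=128, tokens=1_000_000)
# Roughly 250 GB at the baseline vs 25 GB at 10%, under these assumptions.
print(f"baseline: {baseline:.0f} GB, at 10%: {baseline * 0.10:.0f} GB")
```

At 1M-token context, a 90% cut in KV cache is the difference between a cache that spills across accelerators and one that fits comfortably on a single device, which is where much of the serving-cost saving would come from.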
Performance benchmarks
DeepSeek claims V4 Pro is competitive with frontier models, with one caveat. According to the company's paper: "Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months."
No independent benchmarks are yet available.
Availability
Both models are available via OpenRouter and downloadable from Hugging Face. They can also be accessed through the llm-openrouter plugin for the LLM command-line tool.
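Access through the llm-openrouter plugin might look like the following sketch. The V4 model slug shown is a guess, since OpenRouter's exact identifier for the new models isn't given here; list the models first to find the real one.

```shell
# Install the OpenRouter plugin for the LLM CLI and store an API key
llm install llm-openrouter
llm keys set openrouter

# List available models to find the exact slug for V4
llm models list | grep -i deepseek

# Query the model -- "deepseek/deepseek-v4-pro" is a hypothetical slug
llm -m openrouter/deepseek/deepseek-v4-pro "Summarize the V4 architecture"
```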
What this means
DeepSeek's aggressive pricing creates immediate pressure on OpenAI, Anthropic, and Google to justify their premium pricing—or match it. At $1.74 per million input tokens, V4 Pro costs less than half of Claude Sonnet 4.6 while claiming near-frontier performance. If the performance claims hold under independent testing, this represents a fundamental shift in the economics of frontier model deployment. The 90% reduction in KV cache size at 1M tokens also suggests meaningful architectural innovations beyond simple scaling, particularly for long-context applications where memory constraints have been a limiting factor.
Related Articles
DeepSeek releases V4 model preview with agent optimization, pricing undisclosed
DeepSeek released a preview of its V4 large language model on April 24, 2026, available in 'pro' and 'flash' versions. The Hangzhou-based company claims the open-source model achieves strong performance on agent-based tasks and has been optimized for tools like Anthropic's Claude Code and OpenClaw.
DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost
DeepSeek released two Mixture-of-Experts models: V4-Flash with 284B total parameters (13B activated) and V4-Pro with 1.6T parameters (49B activated). Both models support one million token context windows and use a hybrid attention architecture that requires only 27% of the inference FLOPs compared to DeepSeek-V3.2 at 1M token context.
DeepSeek Releases V4-Flash-Base: 292B Parameter Base Model
DeepSeek has released V4-Flash-Base, a 292 billion parameter base model now available on Hugging Face. The model uses BF16, I64, F32, and F8_E4M3 tensor types and is distributed in Safetensors format.
DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context at $1.74/M Input Tokens
DeepSeek has released V4 Pro, a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters. The model supports a 1-million-token context window and costs $1.74 per million input tokens and $3.48 per million output tokens.