model releaseDeepSeek

DeepSeek V4 Pro launches with 1.6 trillion parameters, 1M token context at $0.145 per million input tokens

TL;DR

Chinese AI lab DeepSeek has released preview versions of DeepSeek V4 Flash and V4 Pro, mixture-of-experts models with 1 million token context windows. The V4 Pro has 1.6 trillion total parameters (49 billion active), making it the largest open-weight model available, while both models significantly undercut frontier model pricing.

2 min read
0

DeepSeek V4 Pro — Quick Specs

Context window1000K tokens
Input$0.0036/1M tokens
Output$0.87/1M tokens

DeepSeek V4 Pro launches with 1.6 trillion parameters, 1M token context at $0.145 per million input tokens

Chinese AI lab DeepSeek has released preview versions of DeepSeek V4 Flash and V4 Pro, both featuring 1 million token context windows and mixture-of-experts architectures that activate only a subset of parameters per task to reduce inference costs.

Model specifications

DeepSeek V4 Pro contains 1.6 trillion total parameters with 49 billion active parameters, making it the largest open-weight model available. This exceeds Moonshot AI's Kimi K 2.6 (1.1 trillion parameters), MiniMax's M1 (456 billion), and more than doubles DeepSeek's previous V3.2 model (671 billion parameters).

The smaller V4 Flash has 284 billion total parameters with 13 billion active parameters.

Both models support text only, lacking the multimodal capabilities (audio, video, image) found in many closed-source frontier models.

Performance claims

According to DeepSeek, the V4-Pro-Max model outperforms open-source peers across reasoning benchmarks and surpasses OpenAI's GPT-5.2 and Gemini 3.0 Pro on some tasks. The company claims both V4 models perform comparably to GPT-5.4 on coding competition benchmarks.

However, DeepSeek acknowledges the models lag behind frontier models in knowledge tests, specifically OpenAI's GPT-5.4 and Google's Gemini 3.1 Pro. The lab states this represents "a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months."

Pricing

DeepSeek V4 Flash: $0.14 per million input tokens, $0.28 per million output tokens — undercutting GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini, and Claude Haiku 4.5.

DeepSeek V4 Pro: $0.145 per million input tokens, $3.48 per million output tokens — undercutting Gemini 3.1 Pro, GPT-5.5, Claude Opus 4.7, and GPT-5.4.

Timing and controversy

The launch follows U.S. accusations that China has stolen American AI labs' intellectual property using thousands of proxy accounts. DeepSeek has been specifically accused by Anthropic and OpenAI of "distilling" (copying) their AI models.

What this means

DeepSeek V4 Pro represents the largest open-weight model by parameter count, though its mixture-of-experts architecture means only 3% of parameters are active during inference. The 1 million token context window and aggressive pricing position these models as cost-effective alternatives for developers working with large codebases or documents, despite trailing frontier models by several months in knowledge capabilities and lacking multimodal support. The text-only limitation and acknowledged performance gap suggest DeepSeek is prioritizing efficiency and cost over feature parity with leading closed-source models.

Related Articles

Comments

Loading...