DeepSeek V4 Pro

DeepSeek · 🇨🇳 China
Status: active
Context window: 1,049K tokens
Input / 1M tokens: $1.74
Output / 1M tokens: $3.48
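As a rough guide to what these per-token rates mean in practice, here is a minimal cost sketch in Python. The token counts in the example are hypothetical, not from this listing; only the two rates above are real.

```python
# Cost estimate from the listed rates (USD per 1M tokens).
INPUT_RATE = 1.74   # $ per 1M input tokens
OUTPUT_RATE = 3.48  # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# Example: a 200K-token prompt with a 4K-token reply (hypothetical sizes).
print(f"${request_cost(200_000, 4_000):.4f}")  # ≈ $0.3619
```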

Version History

v4-pro · major

DeepSeek V4 Pro is a new large-scale MoE model with 1.6T total parameters and 49B activated parameters, featuring a 1M-token context window and a hybrid attention system. Built on the same architecture as V4 Flash, it adds multiple reasoning modes for complex workloads.
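To see how a 1.6T-parameter MoE model can activate only 49B parameters per token, here is a minimal routing sketch: a router picks a small top-k subset of experts for each token, so most expert weights stay idle on any given token. Every dimension and the top-k value below are illustrative assumptions, not published V4 Pro hyperparameters.

```python
import torch
import torch.nn.functional as F

# Illustrative MoE routing; all sizes here are assumptions for the sketch,
# not DeepSeek V4 Pro's actual configuration.
NUM_EXPERTS = 256   # total experts (drives total parameter count)
TOP_K = 8           # experts activated per token (drives activated count)
D_MODEL = 1024

router = torch.nn.Linear(D_MODEL, NUM_EXPERTS)
experts = torch.nn.ModuleList(
    torch.nn.Linear(D_MODEL, D_MODEL) for _ in range(NUM_EXPERTS)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route each token to its top-k experts; only TOP_K / NUM_EXPERTS of
    the expert parameters do any work for a given token."""
    scores = router(x)                         # (tokens, NUM_EXPERTS)
    weights, idx = scores.topk(TOP_K, dim=-1)  # pick k experts per token
    weights = F.softmax(weights, dim=-1)
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):                # naive per-token loop for clarity
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])
    return out

tokens = torch.randn(4, D_MODEL)
print(moe_forward(tokens).shape)  # torch.Size([4, 1024])
```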

Benchmark Scores

HumanEval: 76.8%
LiveCodeBench: 93.5%
MATH: 64.5%
MMLU: 90.1%
MMLU-Pro: 87.5%

Coverage

model release · DeepSeek

DeepSeek Releases V4-Pro: 1.6T Parameter MoE Model with 1M Token Context

DeepSeek released two new Mixture-of-Experts models: DeepSeek-V4-Pro with 1.6 trillion parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting one million token context length. The models require only 27% of the inference FLOPs and 10% of the KV cache of DeepSeek-V3.2 at 1M-token context, thanks to a hybrid attention architecture combining Compressed Sparse Attention and Heavily Compressed Attention.
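To make those ratios concrete, here is a back-of-the-envelope sketch. The baseline per-token KV footprint below is a hypothetical number chosen for illustration; only the 0.27 and 0.10 ratios come from the announcement.

```python
# Back-of-envelope check of the claimed savings at 1M-token context.
# BASELINE_KV_BYTES_PER_TOKEN is an assumption for illustration only;
# the 0.27 and 0.10 ratios are the figures from the announcement.
CONTEXT = 1_000_000
BASELINE_KV_BYTES_PER_TOKEN = 70_000  # hypothetical V3.2 KV footprint
FLOPS_RATIO, KV_RATIO = 0.27, 0.10

baseline_kv_gb = CONTEXT * BASELINE_KV_BYTES_PER_TOKEN / 1e9
v4_kv_gb = baseline_kv_gb * KV_RATIO
print(f"KV cache: {baseline_kv_gb:.0f} GB -> {v4_kv_gb:.0f} GB")
# At these assumed numbers: 70 GB -> 7 GB, a 10x reduction in KV memory;
# inference FLOPs shrink to 27% of the V3.2 cost in the same way.
```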
