DeepSeek V4 Flash

DeepSeek · 🇨🇳 China
Status: active
Context window: 1,049K tokens
Input: $0.14 / 1M tokens
Output: $0.28 / 1M tokens
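
As a rough illustration of how the listed per-token rates translate into per-request cost, here is a minimal sketch; the `request_cost` helper and the example token counts are hypothetical, and the only figures taken from this card are the $0.14 / $0.28 per 1M-token prices above.

```python
# Illustrative cost estimate at the listed rates (hypothetical helper, not an official SDK).
INPUT_PRICE_PER_M = 0.14   # USD per 1M input tokens (from this card)
OUTPUT_PRICE_PER_M = 0.28  # USD per 1M output tokens (from this card)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a full 1M-token context prompt with a 4K-token completion.
print(f"${request_cost(1_048_576, 4_096):.4f}")  # ~$0.1479
```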

Version History

v4 (major)

DeepSeek-V4-Flash is a new 284B-parameter MoE model with 13B activated parameters, featuring a hybrid attention architecture that reduces inference costs by 73% at million-token context lengths. Introduces three reasoning effort modes and achieves competitive performance with frontier models on coding and mathematical reasoning.

v4-flash (major)

Initial release of DeepSeek V4 Flash, an efficiency-optimized Mixture-of-Experts model with 284B total parameters, 13B activated parameters, and a 1M-token context window.

Benchmark Scores

HumanEval: 69.5%
LiveCodeBench: 91.6%
MATH: 57.4%
MMLU: 88.7%
MMLU-Pro: 68.3%

Coverage

model release · DeepSeek

DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost

DeepSeek released two Mixture-of-Experts models: V4-Flash with 284B total parameters (13B activated) and V4-Pro with 1.6T total parameters (49B activated). Both models support one-million-token context windows and use a hybrid attention architecture that requires only 27% of the inference FLOPs of DeepSeek-V3.2 at 1M-token context.
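
As a quick sanity check on the figures quoted above, the sketch below shows how the 27% FLOPs figure corresponds to the 73% cost-reduction claim in the release notes, and what fraction of the 284B total parameters the 13B activated parameters represent; the variable names are purely illustrative.

```python
# Consistency check of the efficiency figures quoted above (illustrative only).
flops_fraction_vs_v32 = 0.27                  # 27% of DeepSeek-V3.2's FLOPs at 1M-token context
print(f"Implied reduction: {1 - flops_fraction_vs_v32:.0%}")  # 73%, matching the release notes

total_params_b, activated_params_b = 284, 13  # total vs. activated MoE parameters (billions)
print(f"Activated per token: {activated_params_b / total_params_b:.1%}")  # ~4.6%
```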
