DeepSeek V4 Pro

Name: DeepSeek V4 Pro
Price: 1.74 USD per 1M input tokens
Author: DeepSeek

DeepSeek · 🇨🇳 China

Status: active


Context window: 1049K tokens

Input / 1M tokens: $1.74

Output / 1M tokens: $3.48
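
As a rough illustration of how the per-token prices above translate into a per-request cost, the following minimal sketch multiplies token counts by the listed rates. The function name `estimate_cost_usd` is hypothetical, and the calculation assumes flat pricing with no cache discounts, tiers, or separate billing for reasoning tokens, none of which this page specifies.

```python
# Prices as listed on this page, in USD per one million tokens.
INPUT_USD_PER_M = 1.74
OUTPUT_USD_PER_M = 3.48


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumes flat per-token pricing; any discounts or surcharges are ignored.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 4,000-token completion.
    print(f"${estimate_cost_usd(200_000, 4_000):.4f}")
```

Under these assumptions, output tokens cost exactly twice as much as input tokens, so long prompts with short completions are dominated by the input rate.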

Version History

v4-pro · major · April 24, 2026

DeepSeek V4 Pro is a large-scale Mixture-of-Experts (MoE) model with 1.6T total parameters and 49B activated parameters, a 1M-token context window, and a hybrid attention system. It is built on the same architecture as V4 Flash and adds multiple reasoning modes for complex workloads.

Benchmark Scores


HumanEval: 76.8%
LiveCodeBench: 93.5%
MATH: 64.5%
MMLU: 90.1%
MMLU-Pro: 87.5%

Coverage

model release · DeepSeek

DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context at $1.74/M Input Tokens

DeepSeek has released V4 Pro, a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters. The model supports a 1-million-token context window and costs $1.74 per million input tokens and $3.48 per million output tokens.

April 24, 2026 · 4:21 AM · 2 min read

DeepSeek V4 Pro MoE

model release · DeepSeek

DeepSeek Releases V4-Pro: 1.6T Parameter MoE Model with 1M Token Context

DeepSeek released two new Mixture-of-Experts models: DeepSeek-V4-Pro with 1.6 trillion parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both support a context length of one million tokens. At 1M context, the models require 27% of the inference FLOPs and 10% of the KV cache of DeepSeek-V3.2, using a hybrid attention architecture that combines Compressed Sparse Attention and Heavily Compressed Attention.

April 24, 2026 · 3:21 AM · 2 min read

deepseek moe long-context
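
To put the parameter counts and efficiency ratios quoted in the coverage above into perspective, here is a small sketch that derives the share of parameters activated per token and the reduction factors implied by the 27% and 10% figures. The dictionary layout and the names `active_fraction` and `reduction_factor` are illustrative assumptions; only the numeric inputs come from the text.

```python
# (total parameters, activated parameters) as quoted in the coverage.
MODELS = {
    "DeepSeek-V4-Pro": (1.6e12, 49e9),
    "DeepSeek-V4-Flash": (284e9, 13e9),
}

# Cost at 1M context relative to DeepSeek-V3.2, as quoted in the coverage.
RELATIVE_FLOPS = 0.27
RELATIVE_KV_CACHE = 0.10


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of parameters activated per token (hypothetical helper)."""
    return active_params / total_params


def reduction_factor(relative_cost: float) -> float:
    """Turn a 'fraction of baseline' figure into an 'N times less' factor."""
    return 1.0 / relative_cost


if __name__ == "__main__":
    for name, (total, active) in MODELS.items():
        print(f"{name}: {active_fraction(total, active):.2%} of parameters active")
    print(f"Inference FLOPs: about {reduction_factor(RELATIVE_FLOPS):.1f}x lower")
    print(f"KV cache: about {reduction_factor(RELATIVE_KV_CACHE):.0f}x smaller")
```

By this arithmetic, V4-Pro activates roughly 3% of its parameters per token and V4-Flash roughly 4.6%, which is the usual reason MoE models of this size remain practical to serve.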