
DeepSeek Releases V4-Pro with 1.6T Parameters, 1M Token Context at 27% of V3.2's Inference FLOPs

TL;DR

DeepSeek has released two Mixture-of-Experts models: V4-Pro with 1.6 trillion parameters (49B activated) and V4-Flash with 284B parameters (13B activated), both supporting 1 million token context windows. At 1M-token context, V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of V3.2. Both models were pre-trained on more than 32 trillion tokens.



DeepSeek has released two new Mixture-of-Experts language models: DeepSeek-V4-Pro with 1.6 trillion parameters (49 billion activated) and DeepSeek-V4-Flash with 284 billion parameters (13 billion activated). Both models support a context length of 1 million tokens.
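To make the "activated" figures concrete, the short sketch below computes what share of each model's parameters is used per token. It uses only the numbers quoted above; the dictionary layout and the function name `active_fraction` are illustrative choices, and because 1.6T is a rounded figure the results are approximate.

```python
# Parameter counts quoted in the article, in billions.
# The dictionary layout and function name are illustrative, not DeepSeek's.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a Mixture-of-Experts model's parameters activated per token."""
    return active_b / total_b


for name, params in MODELS.items():
    share = active_fraction(**params)
    print(f"{name}: about {share:.1%} of parameters active per token")
```

By this arithmetic, roughly 3% of V4-Pro's parameters and under 5% of V4-Flash's are active for any single token, which is why per-token compute is far lower than the headline parameter counts suggest.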

Efficiency Gains Through Hybrid Attention

The V4 series introduces a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). In 1M-token context settings, V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2.
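The two ratios above can be restated as reduction factors. The following is a minimal arithmetic sketch, assuming the 27% and 10% figures apply as stated (1M-token context, V4-Pro relative to DeepSeek-V3.2); the constant and function names are hypothetical.

```python
# Ratios quoted in the article: V4-Pro relative to DeepSeek-V3.2 at 1M-token context.
FLOPS_RATIO = 0.27     # single-token inference FLOPs
KV_CACHE_RATIO = 0.10  # KV cache size


def reduction_factor(ratio: float) -> float:
    """How many times smaller the new cost is than the baseline."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


def saved_share(ratio: float) -> float:
    """Fraction of the baseline cost that is no longer needed."""
    return 1.0 - ratio


print(f"FLOPs: {reduction_factor(FLOPS_RATIO):.1f}x fewer, {saved_share(FLOPS_RATIO):.0%} saved")
print(f"KV cache: {reduction_factor(KV_CACHE_RATIO):.1f}x smaller, {saved_share(KV_CACHE_RATIO):.0%} saved")
```

These are relative figures only. The article does not give absolute FLOP counts or cache sizes, so the sketch does not attempt to estimate them.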

Both models were pre-trained on more than 32 trillion diverse tokens, followed by a two-stage post-training pipeline. In the first stage, domain-specific experts are trained independently through supervised fine-tuning and reinforcement learning with GRPO (Group Relative Policy Optimization). In the second stage, the experts are consolidated into a single model via on-policy distillation.
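For readers unfamiliar with GRPO, its core idea is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows that group-relative normalization in its generic form. It is not DeepSeek's V4 training code: the function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is an assumed small constant that avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1.0 (correct) or 0.0 (incorrect).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses that beat the group average receive positive advantages and the rest receive negative ones, so the policy update favors the better samples without needing a separate critic model.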

Benchmark Performance

DeepSeek-V4-Pro-Base achieves 90.1% on MMLU (5-shot), 73.5% on MMLU-Pro, 76.8% on HumanEval (0-shot), and 51.5% on LongBench-V2. On knowledge-focused benchmarks, the model scores 55.2% on SimpleQA Verified and 62.6% on FACTS Parametric, which indicates a significant improvement in factual knowledge.

DeepSeek-V4-Pro-Max, the maximum reasoning effort mode, achieves a Codeforces rating of 3206, 93.5% on LiveCodeBench, 89.8% on IMOAnswerBench, and 90.2% on Apex Shortlist. On agentic tasks, it scores 80.6% on SWE-bench Verified, 67.9% on Terminal Bench 2.0, and 83.4% on BrowseComp.

Three Reasoning Modes

Both models support three reasoning effort modes:

  • Non-think: Fast, intuitive responses for routine tasks
  • Think: Conscious logical analysis with visible reasoning chains
  • Think Max: Maximum reasoning capability with extended thinking budget

With a larger thinking budget, V4-Flash in Think Max mode (Flash-Max) achieves reasoning performance comparable to V4-Pro, though it trails on pure knowledge tasks and complex agentic workflows because of its smaller parameter count.

Technical Architecture

The V4 series incorporates Manifold-Constrained Hyper-Connections (mHC) to strengthen residual connections, enhancing signal propagation stability across layers. The models use the Muon optimizer for faster convergence and greater training stability.

Models are available in FP8 mixed precision (base versions) and FP4 + FP8 mixed precision (post-trained versions), where MoE expert parameters use FP4 and other parameters use FP8.
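A rough way to see what the FP4 + FP8 split means for weight storage is to count bytes per parameter: 0.5 bytes for FP4 and 1 byte for FP8. The sketch below does that arithmetic. The article does not say what share of parameters sits in the MoE experts, so `expert_fraction` is a hypothetical input and the 0.95 used in the example is a placeholder, not a published figure. The estimate also ignores quantization scaling factors, activations, and the KV cache.

```python
# Storage cost per parameter for each format, in bytes.
BYTES_PER_PARAM = {"fp4": 0.5, "fp8": 1.0}


def weight_bytes(total_params: float, expert_fraction: float) -> float:
    """Estimate weight storage when expert parameters use FP4 and the rest use FP8.

    expert_fraction is the assumed share of parameters held in MoE experts;
    the article does not publish this number.
    """
    if not 0.0 <= expert_fraction <= 1.0:
        raise ValueError("expert_fraction must be between 0 and 1")
    expert_params = total_params * expert_fraction
    other_params = total_params - expert_params
    return expert_params * BYTES_PER_PARAM["fp4"] + other_params * BYTES_PER_PARAM["fp8"]


# 1.6T parameters with a placeholder expert share of 95% (assumption, not from the article).
estimate = weight_bytes(1.6e12, expert_fraction=0.95)
print(f"Estimated weight storage: {estimate / 1e12:.2f} TB")
```

Whatever the true expert share is, the estimate for a 1.6T-parameter model falls between 0.8 TB (all FP4) and 1.6 TB (all FP8) under these simplifications.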

Availability

All four model variants (V4-Pro-Base, V4-Pro, V4-Flash-Base, V4-Flash) are available on HuggingFace and ModelScope. Pricing has not been disclosed.

Note: The DeepSeek-V4-Pro-DSpark checkpoint is not a new model but the same V4-Pro checkpoint with an additional speculative decoding module for inference optimization.

What This Means

The 73% reduction in single-token inference FLOPs at 1M context length, relative to V3.2, addresses a critical bottleneck in long-context applications. The V4-Pro-Max performance on coding benchmarks (3206 Codeforces rating) and math reasoning (89.8% on IMOAnswerBench) positions it competitively with frontier closed-source models like Claude Opus 4.6 and GPT-5.4. The three-tier reasoning mode system provides practical flexibility for balancing speed and accuracy based on task complexity. The open-source release of models this large (1.6T parameters) with competitive performance represents a significant shift in accessibility to frontier-level capabilities.
