model release · DeepSeek

DeepSeek V4 Pro launches with 1.6T parameters at $1.74/M tokens, undercutting Claude Sonnet 4.6 by 42%

TL;DR

DeepSeek released two preview models: V4 Pro (1.6T total parameters, 49B active) and V4 Flash (284B total, 13B active), both with 1 million token context windows. V4 Pro is priced at $1.74/M input tokens and $3.48/M output—42% cheaper than Claude Sonnet 4.6—while V4 Flash at $0.14/$0.28 per million tokens undercuts all small frontier models.


DeepSeek V4 Pro — Quick Specs

Context window: 1M tokens
Input: $1.74/1M tokens
Output: $3.48/1M tokens


DeepSeek released two preview models in its V4 series: DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both are Mixture of Experts models with 1 million token context windows, released under the MIT license.

Model specifications

DeepSeek-V4-Pro has 1.6 trillion total parameters with 49 billion active parameters. DeepSeek-V4-Flash has 284 billion total parameters with 13 billion active. According to DeepSeek, this makes V4 Pro the largest open weights model available, exceeding Kimi K2.6 (1.1T) and GLM-5.1 (754B), and more than double the size of DeepSeek V3.2 (685B).
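Because these are Mixture of Experts models, only a small fraction of each model's parameters is active per token. A quick calculation from the figures above makes the sparsity concrete:

```python
# Activation ratios for the V4 MoE models, using the parameter counts above
# (in billions of parameters).
specs = {
    "V4 Pro":   {"total_b": 1600, "active_b": 49},
    "V4 Flash": {"total_b": 284,  "active_b": 13},
}

for name, s in specs.items():
    ratio = s["active_b"] / s["total_b"]
    print(f"{name}: {s['active_b']}B of {s['total_b']}B parameters "
          f"active per token ({ratio:.1%})")
```

That works out to roughly 3.1% of V4 Pro's parameters and 4.6% of V4 Flash's firing per token, which is how a 1.6T-parameter model can be served at per-token compute closer to a ~49B dense model.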

The Pro model weighs 865GB on Hugging Face. Flash weighs 160GB.

Pricing comparison

DeepSeek's pricing significantly undercuts existing frontier models:

  • DeepSeek V4 Flash: $0.14/M input tokens, $0.28/M output tokens
  • DeepSeek V4 Pro: $1.74/M input tokens, $3.48/M output tokens

For comparison:

  • GPT-5.4: $2.50/$15 per million tokens
  • Claude Sonnet 4.6: $3/$15 per million tokens
  • Gemini 3.1 Pro: $2/$12 per million tokens
  • Claude Haiku 4.5: $1/$5 per million tokens
  • GPT-5.4 Nano: $0.20/$1.25 per million tokens

V4 Flash is the cheapest small model available. V4 Pro costs 42% less than Claude Sonnet 4.6 for input tokens and 77% less for output tokens.
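To make the gap concrete, here is a rough cost sketch for a hypothetical workload. The token volumes are illustrative; the per-million rates are the published prices listed above:

```python
# Estimated cost for a hypothetical workload of 100M input + 20M output tokens,
# using the published per-million-token rates from the comparison above.
PRICES = {  # model: (input $/M, output $/M)
    "DeepSeek V4 Pro":   (1.74, 3.48),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.4":           (2.50, 15.00),
    "Gemini 3.1 Pro":    (2.00, 12.00),
}

def workload_cost(input_m: float, output_m: float, prices: dict) -> dict:
    """Total USD cost per model for the given token volumes (in millions)."""
    return {m: input_m * p_in + output_m * p_out
            for m, (p_in, p_out) in prices.items()}

costs = workload_cost(100, 20, PRICES)
for model, usd in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${usd:,.2f}")

# The headline input-token discount vs Claude Sonnet 4.6:
print(f"Input discount: {1 - 1.74 / 3.00:.0%}")
```

For this mix, V4 Pro comes out to $243.60 against $600.00 for Claude Sonnet 4.6; the 42% figure in the headline is the input-rate discount alone, so output-heavy workloads save even more.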

Efficiency improvements

DeepSeek attributes the low pricing to substantial efficiency gains. According to their technical paper, in 1M-token context scenarios, V4 Pro achieves only 27% of the single-token FLOPs and 10% of the KV cache size compared to DeepSeek V3.2. V4 Flash achieves 10% of the FLOPs and 7% of the KV cache size.
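The paper states these figures as ratios relative to V3.2 rather than absolute numbers, so the cleanest way to read them is as multiplicative reductions. A small sketch, using only the quoted percentages:

```python
# Claimed per-token efficiency at 1M-token context, expressed as a fraction
# of DeepSeek V3.2's cost (= 1.0). Figures are from DeepSeek's technical paper.
ratios = {
    "V4 Pro":   {"flops": 0.27, "kv_cache": 0.10},
    "V4 Flash": {"flops": 0.10, "kv_cache": 0.07},
}

for model, r in ratios.items():
    print(f"{model}: {1 / r['flops']:.1f}x fewer single-token FLOPs, "
          f"{1 / r['kv_cache']:.1f}x smaller KV cache vs V3.2")
```

That is roughly a 3.7x FLOPs and 10x KV-cache reduction for V4 Pro, and 10x / 14x for V4 Flash; the KV-cache shrinkage is what matters most for serving 1M-token contexts, since KV memory, not weights, typically dominates at that length.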

Performance benchmarks

DeepSeek claims V4 Pro is competitive with frontier models, with one caveat. According to the company's paper: "Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months."

No independent benchmarks are yet available.

Availability

Both models are available via OpenRouter and downloadable from Hugging Face. The models can be accessed through the standard llm-openrouter plugin.
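OpenRouter exposes an OpenAI-compatible chat completions endpoint, so calling the model looks like any OpenAI-style request. This is a minimal sketch that only builds the request payload; the model slug `deepseek/deepseek-v4-pro` is an assumption, so check OpenRouter's model list for the actual identifier:

```python
import json

# OpenRouter's OpenAI-compatible chat completions endpoint.
API_URL = "https://openrouter.ai/api/v1/chat/completions"

payload = {
    "model": "deepseek/deepseek-v4-pro",  # hypothetical slug; verify on OpenRouter
    "messages": [
        {"role": "user", "content": "Summarize the V4 Pro pricing in one line."}
    ],
    "max_tokens": 256,
}

# Send with any HTTP client, e.g.:
#   requests.post(API_URL, json=payload,
#                 headers={"Authorization": f"Bearer {OPENROUTER_API_KEY}"})
print(json.dumps(payload, indent=2))
```

With the llm-openrouter plugin mentioned above, the equivalent one-liner would be `llm -m openrouter/deepseek/deepseek-v4-pro "..."` (again assuming that slug).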

What this means

DeepSeek's aggressive pricing creates immediate pressure on OpenAI, Anthropic, and Google to justify their premium pricing—or match it. At $1.74 per million input tokens, V4 Pro costs less than half of Claude Sonnet 4.6 while claiming near-frontier performance. If the performance claims hold under independent testing, this represents a fundamental shift in the economics of frontier model deployment. The 90% reduction in KV cache size at 1M tokens also suggests meaningful architectural innovations beyond simple scaling, particularly for long-context applications where memory constraints have been a limiting factor.

Related Articles

model release

DeepSeek releases V4 model preview with agent optimization, pricing undisclosed

DeepSeek released a preview of its V4 large language model on April 24, 2026, available in 'pro' and 'flash' versions. The Hangzhou-based company claims the open-source model achieves strong performance on agent-based tasks and has been optimized for tools like Anthropic's Claude Code and OpenClaw.

model release

DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost

DeepSeek released two Mixture-of-Experts models: V4-Flash with 284B total parameters (13B activated) and V4-Pro with 1.6T parameters (49B activated). Both models support one million token context windows and use a hybrid attention architecture that requires only 27% of the inference FLOPs compared to DeepSeek-V3.2 at 1M token context.

model release

DeepSeek Releases V4-Flash-Base: 292B Parameter Base Model

DeepSeek has released V4-Flash-Base, a 292 billion parameter base model now available on Hugging Face. The model uses BF16, I64, F32, and F8_E4M3 tensor types and is distributed in Safetensors format.

model release

DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context at $1.74/M Input Tokens

DeepSeek has released V4 Pro, a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters. The model supports a 1-million-token context window and costs $1.74 per million input tokens and $3.48 per million output tokens.
