model releaseDeepSeek

DeepSeek V4 Flash Released: 284B Parameter MoE Model with 1M Context Window at $0.14 per Million Tokens

TL;DR

DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per request. The model supports a 1,048,576-token context window and is priced at $0.14 per million input tokens and $0.28 per million output tokens.

April 24, 2026 · 4:21 AM2 min read

DeepSeek V4 Flash — Quick Specs

Context window1049K tokens

Input$0.14/1M tokens

Output$0.28/1M tokens

Compare DeepSeek V4 Flash with other models →

DeepSeek V4 Flash Released: 284B Parameter MoE Model with 1M Context Window at $0.14 per Million Tokens

Model Architecture and Capabilities

DeepSeek V4 Flash uses a sparse Mixture-of-Experts architecture that activates only 13B of its 284B total parameters for each inference request. According to DeepSeek, the model includes hybrid attention mechanisms designed for efficient long-context processing.

The model supports configurable reasoning modes, allowing it to show step-by-step thinking processes. DeepSeek claims the model maintains strong performance on reasoning and coding tasks despite its efficiency optimizations.

Pricing and Availability

The model is available through OpenRouter at:

Input: $0.14 per million tokens
Output: $0.28 per million tokens

These prices position V4 Flash as a cost-effective option for high-throughput workloads compared to larger models with similar context windows.

Target Use Cases

DeepSeek designed V4 Flash for applications requiring fast inference and high throughput, including:

Coding assistants
Chat systems
Agent workflows

The model's sparse activation pattern (activating only 4.6% of total parameters) enables faster inference speeds while attempting to preserve model quality.

Technical Details

Release date: April 24, 2026 (as listed on OpenRouter) Context window: 1,048,576 tokens Architecture: Sparse Mixture-of-Experts Reasoning support: Configurable reasoning modes with exposed thinking processes

What This Means

DeepSeek V4 Flash continues the trend of using sparse MoE architectures to deliver capable models at lower inference costs. The 13B activated parameter count per request allows for faster processing than dense models of similar capability, while the 1M token context window matches the extended context offerings from competitors like Anthropic and Google. The $0.14/$0.28 per million token pricing undercuts many competing models with similar context lengths, potentially making it attractive for high-volume production deployments where cost per token matters more than absolute peak performance.

Source: openrouter.ai ↗

DeepSeek V4 Flash Mixture-of-Experts MoE long context reasoning model release coding

model releaseApril 24, 2026

DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context at $1.74/M Input Tokens

DeepSeek has released V4 Pro, a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters. The model supports a 1-million-token context window and costs $1.74 per million input tokens and $3.48 per million output tokens.

model releaseApril 24, 2026

DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost

DeepSeek released two Mixture-of-Experts models: V4-Flash with 284B total parameters (13B activated) and V4-Pro with 1.6T parameters (49B activated). Both models support one million token context windows and use a hybrid attention architecture that requires only 27% of the inference FLOPs compared to DeepSeek-V3.2 at 1M token context.

model releaseApril 24, 2026

DeepSeek Releases V4-Pro: 1.6T Parameter MoE Model with 1M Token Context

DeepSeek released two new Mixture-of-Experts models: DeepSeek-V4-Pro with 1.6 trillion parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting one million token context length. The models achieve 27% of inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2 at 1M context through a hybrid attention architecture combining Compressed Sparse Attention and Heavily Compressed Attention.

model releaseApril 23, 2026

Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters

Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts model with 21 billion active parameters and a 256K context window. The model scores 76.28% on MATH and 34.86% on LiveCodeBench-v6, with particularly strong performance on coding agent tasks.

DeepSeek V4 Flash Released: 284B Parameter MoE Model with 1M Context Window at $0.14 per Million Tokens

DeepSeek V4 Flash — Quick Specs

DeepSeek V4 Flash Released: 284B Parameter MoE Model with 1M Context Window at $0.14 per Million Tokens

Model Architecture and Capabilities

Pricing and Availability

Target Use Cases

Technical Details

What This Means

Related Articles

DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context at $1.74/M Input Tokens

DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost

DeepSeek Releases V4-Pro: 1.6T Parameter MoE Model with 1M Token Context

Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters

Comments