DeepSeek V4 Flash Released: 284B Parameter MoE Model with 1M Context Window at $0.14 per Million Tokens
DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per token. The model supports a 1,048,576-token context window and is priced at $0.14 per million input tokens and $0.28 per million output tokens.
Model Architecture and Capabilities
DeepSeek V4 Flash uses a sparse Mixture-of-Experts architecture that activates only 13B of its 284B total parameters for each token it processes. According to DeepSeek, the model includes hybrid attention mechanisms designed for efficient long-context processing.
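DeepSeek has not published the router configuration for V4 Flash, but the general mechanism behind this kind of sparse activation is top-k expert gating. The following minimal sketch illustrates the idea; the expert count, k, and dimensions are invented for illustration and are not V4 Flash's actual configuration.

```python
# Minimal top-k MoE routing sketch for a single token. Illustrative
# only: the expert count, k, and dimensions are invented and are not
# V4 Flash's actual (unpublished) configuration.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 16, 2                    # hypothetical sizes

x = rng.standard_normal(d_model)                     # one token's hidden state
w_gate = rng.standard_normal((n_experts, d_model))   # router weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

scores = w_gate @ x                                  # router scores every expert
top = np.argsort(scores)[-k:]                        # ...but keeps only the top-k
weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over top-k

# Only k of n_experts expert matrices run for this token, which is why
# the activated parameter count (13B) sits far below the total (284B).
y = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
print(f"experts used: {k}/{n_experts} -> {k / n_experts:.1%} of expert params")
```

Each token is routed to only k of the n experts, so only a small slice of the total parameter set participates in any single forward pass; that is the sense in which 13B of 284B parameters are "activated."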
The model supports configurable reasoning modes, allowing it to show step-by-step thinking processes. DeepSeek claims the model maintains strong performance on reasoning and coding tasks despite its efficiency optimizations.
Pricing and Availability
The model is available through OpenRouter at the following rates:
- Input: $0.14 per million tokens
- Output: $0.28 per million tokens
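Because OpenRouter exposes an OpenAI-compatible endpoint, a request would look roughly like the sketch below. The model slug is an assumption based on OpenRouter's usual naming, and the reasoning field uses OpenRouter's unified reasoning parameter to toggle the configurable thinking mode described above; consult the model's OpenRouter page for the exact identifier and supported options.

```python
# Hedged sketch of a chat completion request via OpenRouter.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        # Hypothetical slug; verify on the OpenRouter model page.
        "model": "deepseek/deepseek-v4-flash",
        "messages": [{"role": "user", "content": "Explain MoE routing briefly."}],
        # OpenRouter's unified reasoning parameter; assumed here to be how
        # V4 Flash's configurable reasoning mode is enabled.
        "reasoning": {"enabled": True},
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```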
These prices position V4 Flash as a cost-effective option for high-throughput workloads compared to larger models with similar context windows.
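As a back-of-the-envelope illustration of what those rates mean in practice, the sketch below prices a hypothetical workload; the request volume and token counts are invented.

```python
# Daily cost estimate at the listed V4 Flash rates.
INPUT_PER_M = 0.14    # USD per million input tokens
OUTPUT_PER_M = 0.28   # USD per million output tokens

requests_per_day = 100_000                 # hypothetical workload
input_tokens, output_tokens = 2_000, 500   # per request, hypothetical

daily_cost = requests_per_day * (
    input_tokens / 1e6 * INPUT_PER_M
    + output_tokens / 1e6 * OUTPUT_PER_M
)
print(f"~${daily_cost:,.2f}/day")          # $42.00/day at these numbers
```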
Target Use Cases
DeepSeek designed V4 Flash for applications requiring fast inference and high throughput, including:
- Coding assistants
- Chat systems
- Agent workflows
The model's sparse activation pattern (activating only 4.6% of total parameters per token) enables faster inference while aiming to preserve output quality.
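The 4.6% figure follows directly from the published parameter counts. Using the common rule of thumb of roughly 2 FLOPs per active parameter for a decoder forward pass (an approximation, not a DeepSeek-published number), the per-token compute saving versus a hypothetical dense 284B model is easy to estimate:

```python
# Activation ratio and rough per-token forward FLOPs, using the
# standard ~2 * active_params approximation (an estimate only).
total_params, active_params = 284e9, 13e9

print(f"activated fraction: {active_params / total_params:.1%}")  # 4.6%
print(f"MoE forward:   ~{2 * active_params:.2e} FLOPs/token")     # ~2.60e+10
print(f"dense forward: ~{2 * total_params:.2e} FLOPs/token")      # ~5.68e+11
```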
Technical Details
- Release date: April 24, 2026 (as listed on OpenRouter)
- Context window: 1,048,576 tokens
- Architecture: Sparse Mixture-of-Experts
- Reasoning support: Configurable reasoning modes with exposed thinking processes
What This Means
DeepSeek V4 Flash continues the trend of using sparse MoE architectures to deliver capable models at lower inference costs. The 13B activated parameter count per request allows for faster processing than dense models of similar capability, while the 1M token context window matches the extended context offerings from competitors like Anthropic and Google. The $0.14/$0.28 per million token pricing undercuts many competing models with similar context lengths, potentially making it attractive for high-volume production deployments where cost per token matters more than absolute peak performance.
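To put the pricing in perspective against DeepSeek's own lineup, the related V4 Pro release (see below) lists $1.74/$3.48 per million input/output tokens. A quick per-request comparison, with token counts invented for illustration:

```python
# Per-request cost: V4 Flash vs. V4 Pro at their listed rates.
in_tok, out_tok = 10_000, 1_000  # hypothetical request size

def cost(in_rate, out_rate):
    """USD for one request; rates are USD per million tokens."""
    return in_tok / 1e6 * in_rate + out_tok / 1e6 * out_rate

flash, pro = cost(0.14, 0.28), cost(1.74, 3.48)
print(f"Flash ${flash:.4f}  Pro ${pro:.4f}  ratio {pro / flash:.1f}x")
```

Because Pro's input and output rates are both about 12.4x Flash's, the ratio holds for any mix of input and output tokens.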
Related Articles
DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context at $1.74/M Input Tokens
DeepSeek has released V4 Pro, a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters. The model supports a 1-million-token context window and costs $1.74 per million input tokens and $3.48 per million output tokens.
DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost
DeepSeek released two Mixture-of-Experts models: V4-Flash with 284B total parameters (13B activated) and V4-Pro with 1.6T parameters (49B activated). Both models support one million token context windows and use a hybrid attention architecture that requires only 27% of the inference FLOPs compared to DeepSeek-V3.2 at 1M token context.
DeepSeek Releases V4-Pro: 1.6T Parameter MoE Model with 1M Token Context
DeepSeek released two new Mixture-of-Experts models: DeepSeek-V4-Pro with 1.6 trillion parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting one million token context length. The models achieve 27% of inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2 at 1M context through a hybrid attention architecture combining Compressed Sparse Attention and Heavily Compressed Attention.
Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters
Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts model with 21 billion active parameters and a 256K context window. The model scores 76.28% on MATH and 34.86% on LiveCodeBench-v6, with particularly strong performance on coding agent tasks.