DeepSeek Releases V4 Flash: 284B-Parameter MoE Model with 1M Context Window, Free via OpenRouter
DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per forward pass. The model supports a 1M-token context window and is available free through OpenRouter, targeting high-throughput coding and chat applications.
DeepSeek V4 Flash — Quick Specs
DeepSeek Releases V4 Flash: 284B-Parameter MoE Model with 1M Context Window, Free via OpenRouter
DeepSeek has released V4 Flash, a Mixture-of-Experts model featuring 284B total parameters with 13B activated per inference pass. The model supports a 1M-token context window and is available at no cost through OpenRouter's API platform.
Technical Specifications
DeepSeek V4 Flash employs a sparse MoE architecture that activates only 13B of its 284B total parameters during each forward pass, designed to reduce inference costs while maintaining performance. According to DeepSeek, the model uses hybrid attention mechanisms for efficient long-context processing.
The model supports reasoning modes with "high" and "xhigh" effort levels, where xhigh maps to maximum reasoning capability. OpenRouter's implementation allows access to the model's step-by-step reasoning process through a reasoning_details array in API responses.
Context and Availability
The 1M-token context window positions V4 Flash among large-context models from competitors like Anthropic's Claude 3.5 Sonnet (200K tokens) and Google's Gemini 1.5 Pro (2M tokens). DeepSeek lists the model's release date as April 24, 2026 on OpenRouter's platform—likely an error in documentation.
OpenRouter reports serving 1.27 trillion tokens weekly for the model across its provider network. The free tier has a 256K-token context limit, reduced from the full 1M capacity.
Target Applications
DeepSeek positions V4 Flash for:
- Coding assistants requiring fast response times
- High-throughput chat systems
- Agent workflows with multiple API calls
- Applications where cost efficiency outweighs maximum capability
The MoE architecture aims to deliver faster inference than dense models of similar capability by activating fewer parameters per request.
What This Means
DeepSeek V4 Flash represents China-based DeepSeek's continued push into efficiency-optimized large language models, following their earlier V2 and V3 releases. The free availability through OpenRouter lowers barriers for developers testing long-context applications, though the reduced 256K free tier context limit may push production workloads to paid alternatives. The 284B-13B MoE configuration suggests DeepSeek is prioritizing inference cost over raw capability, betting that most applications don't require full dense model computation for acceptable performance.
Related Articles
DeepSeek Releases V4-Pro with 1.6T Parameters, 1M Token Context at 27% Inference Cost of V3
DeepSeek has released two Mixture-of-Experts models: V4-Pro with 1.6 trillion parameters (49B activated) and V4-Flash with 284B parameters (13B activated), both supporting 1 million token context windows. V4-Pro requires only 27% of inference FLOPs and 10% of KV cache compared to V3.2 at 1M token context, trained on over 32 trillion tokens.
Sakana AI Releases Fugu Ultra: Multi-Agent Orchestration System with 1M Context Window at $5/$30 per Million Tokens
Sakana AI has released Fugu Ultra, a multi-agent orchestration system that routes tasks across pools of underlying models rather than operating as a single monolithic model. The system supports a 1M token context window and is priced at $5 per million input tokens and $30 per million output tokens.
DeepSeek-V4-Fable: Offensive Security Model Trained on 80,000 CTF Trajectories Achieves 58.7% Solve Rate
Chunjiang Intelligence has released DeepSeek-V4-Fable, an autonomous agent model designed for offensive security research and CTF challenges. The model, distilled from Claude-5-Fable and built on DeepSeek-V4-Flash, was trained on 80,000 verified CTF trajectories and achieves a 58.7% solve rate across held-out security challenges.
Alibaba Qwen Releases 35B Language World Model for Agent Environment Simulation Across 7 Domains
Alibaba's Qwen team released Qwen-AgentWorld-35B-A3B, a 35 billion parameter language world model designed for agentic environment simulation. The model covers seven domains—MCP tool calling, Search, Terminal, Software Engineering, Android, Web, and OS—in a single model with a 262,144 token context window.
Comments
Loading...