DeepSeek Releases V4 Flash: 284B-Parameter MoE Model with 1M Context Window, Free via OpenRouter
DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per forward pass. The model supports a 1M-token context window and is available free through OpenRouter, targeting high-throughput coding and chat applications.
Technical Specifications
DeepSeek V4 Flash employs a sparse MoE architecture that activates only 13B of its 284B total parameters during each forward pass, designed to reduce inference costs while maintaining performance. According to DeepSeek, the model uses hybrid attention mechanisms for efficient long-context processing.
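The sparse-activation idea can be illustrated with a toy top-k routing layer. The sketch below is not DeepSeek's architecture: the expert count, top-k value, and hidden size are placeholder assumptions, and only the 284B-total / 13B-active figures come from the announcement.

```python
# Illustrative top-k MoE routing sketch, not DeepSeek's actual design.
# Expert count, top-k, and hidden size are placeholder assumptions.
import numpy as np

NUM_EXPERTS = 64   # assumed expert count (not disclosed for V4 Flash)
TOP_K = 4          # assumed number of experts routed per token
HIDDEN = 1024      # toy hidden size for the sketch

rng = np.random.default_rng(0)
expert_weights = rng.standard_normal((NUM_EXPERTS, HIDDEN, HIDDEN)) * 0.02
router_weights = rng.standard_normal((HIDDEN, NUM_EXPERTS)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_weights                        # (tokens, experts)
    top_idx = np.argsort(logits, axis=-1)[:, -TOP_K:]  # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        token_logits = logits[t, top_idx[t]]
        gates = np.exp(token_logits - token_logits.max())
        gates /= gates.sum()                           # softmax over selected experts
        for gate, e in zip(gates, top_idx[t]):
            out[t] += gate * (x[t] @ expert_weights[e])
    return out

tokens = rng.standard_normal((8, HIDDEN))
y = moe_layer(tokens)
# Only TOP_K / NUM_EXPERTS of the expert parameters touch each token.
print(y.shape, f"{TOP_K / NUM_EXPERTS:.1%} of experts active per token")
```

In this toy layer, 4 of 64 experts fire per token (about 6% of expert parameters), the same order of sparsity as V4 Flash's roughly 4.6% activation ratio (13B of 284B parameters).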
The model supports reasoning modes with "high" and "xhigh" effort levels, where xhigh maps to maximum reasoning capability. OpenRouter's implementation allows access to the model's step-by-step reasoning process through a reasoning_details array in API responses.
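As a rough sketch of how this looks in practice, the request below uses OpenRouter's chat completions endpoint with a reasoning effort setting and reads back any reasoning_details returned. The model slug and the exact effort values accepted are assumptions to verify against OpenRouter's model page.

```python
# Minimal sketch of an OpenRouter request with a reasoning effort setting.
# The model slug and the "xhigh" effort level are assumptions based on the
# announcement; check OpenRouter's model page for the exact identifiers.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-v4-flash:free",  # assumed slug for the free tier
        "messages": [{"role": "user", "content": "Refactor this function for clarity: ..."}],
        "reasoning": {"effort": "high"},             # article cites "high" and "xhigh" levels
    },
    timeout=120,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]

print(message["content"])
# Step-by-step reasoning, when the provider returns it, arrives alongside the answer.
for detail in message.get("reasoning_details", []):
    print(detail)
```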
Context and Availability
The 1M-token context window places V4 Flash among large-context offerings such as Anthropic's Claude 3.5 Sonnet (200K tokens) and Google's Gemini 1.5 Pro (2M tokens). DeepSeek lists the model's release date on OpenRouter as April 24, 2026, which is likely a documentation error.
OpenRouter reports serving 1.27 trillion tokens weekly for the model across its provider network. The free tier has a 256K-token context limit, reduced from the full 1M capacity.
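For developers working near that limit, a crude pre-flight check like the one below can help decide whether a prompt fits the free tier. The ~4 characters-per-token heuristic is an assumption for illustration, not the model's actual tokenizer, and the file name is hypothetical.

```python
# Rough sketch for checking a prompt against the 256K-token free-tier limit.
# The ~4 characters-per-token heuristic is a crude approximation; the model's
# real tokenizer will count differently, so leave generous headroom.
FREE_TIER_CONTEXT = 256_000   # tokens available on the free tier
FULL_CONTEXT = 1_000_000      # full context window cited for the model

def fits_free_tier(prompt: str, reserved_for_output: int = 8_000) -> bool:
    estimated_tokens = len(prompt) // 4
    return estimated_tokens + reserved_for_output <= FREE_TIER_CONTEXT

with open("large_codebase_dump.txt") as f:  # hypothetical input file
    prompt = f.read()
print("fits free tier:", fits_free_tier(prompt))
```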
Target Applications
DeepSeek positions V4 Flash for:
- Coding assistants requiring fast response times
- High-throughput chat systems
- Agent workflows with multiple API calls
- Applications where cost efficiency outweighs maximum capability
The MoE architecture aims to deliver faster inference than dense models of similar capability by activating fewer parameters per request.
What This Means
DeepSeek V4 Flash represents China-based DeepSeek's continued push into efficiency-optimized large language models, following its earlier V2 and V3 releases. The free availability through OpenRouter lowers the barrier for developers testing long-context applications, though the 256K context limit on the free tier may push production workloads toward paid alternatives. The 284B-total / 13B-active MoE configuration suggests DeepSeek is prioritizing inference cost over raw capability, betting that most applications do not need full dense-model computation to reach acceptable performance.
Related Articles
Allen Institute releases EMO, 14B parameter MoE model with selective 12.5% expert use
Allen Institute for AI released EMO, a 1B-active, 14B-total-parameter mixture-of-experts model trained on 1 trillion tokens. The model uses 8 active experts per token from a pool of 128 total experts, and can maintain near full-model performance while using just 12.5% of its experts for specific tasks.
InclusionAI Releases Ring-2.6-1T: 1 Trillion Parameter Thinking Model with 63B Active Parameters
InclusionAI has released Ring-2.6-1T, a 1 trillion parameter-scale model with 63 billion active parameters and a 262,144-token context window. The model features adaptive reasoning modes and is designed for coding agents, tool use, and long-horizon task execution.
Zyphra Releases ZAYA1-8B: 8.4B Parameter MoE Model with 760M Active Parameters Matches 80B+ Models on Math Benchmarks
Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 760M active parameters and 8.4B total parameters. The model scores 89.1% on AIME 2026, competitive with models exceeding 100B parameters, while maintaining efficiency for on-device deployment.
Google DeepMind releases Gemma 4 with 31B dense model, 256K context window, and speculative decoding drafters
Google DeepMind has released Gemma 4, a family of open-weight multimodal models including a 31B dense model with 256K context window and four size variants ranging from 2.3B to 30.7B effective parameters. The release includes Multi-Token Prediction (MTP) draft models that achieve up to 2x decoding speedup through speculative decoding while maintaining identical output quality.