Mixture-of-Experts

7 articles tagged with Mixture-of-Experts

June 12, 2026

Moonshot AI releases Kimi K2.7 Code with 1T parameters, 256K context window, 30% lower thinking token usage

Moonshot AI has released Kimi K2.7 Code, a 1-trillion-parameter Mixture-of-Experts model designed for long-horizon coding tasks. The model features a 256K context window and reduces thinking token usage by approximately 30% compared to its predecessor, K2.6.

June 12, 2026 · 12:05 PM

June 1, 2026

model release · JetBrains

JetBrains Releases Mellum2: 12B MoE Model With 2.5B Active Parameters for Code and Text

JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts model that activates only 2.5 billion parameters per token. The open-source model is designed for code generation, RAG pipelines, and agent workflows, with 2x faster inference than similarly sized models.

June 1, 2026 · 4:05 PM

May 29, 2026

model release · StepFun

StepFun launches Step 3.7 Flash: 196B MoE model with 256K context and adjustable reasoning levels at $0.20/$1.15 per 1M tokens

StepFun has released Step 3.7 Flash, a 196B-parameter Mixture-of-Experts model that activates approximately 11B parameters per token. The multimodal model supports a 256K context window and introduces selectable reasoning levels (high/medium/low), priced at $0.20 per 1M input tokens and $1.15 per 1M output tokens.

May 29, 2026 · 12:20 AM

May 13, 2026

model release · DeepSeek

DeepSeek Releases V4 Flash: 284B-Parameter MoE Model with 1M Context Window, Free via OpenRouter

DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per forward pass. The model supports a 1M-token context window and is available free through OpenRouter, targeting high-throughput coding and chat applications.

May 13, 2026 · 11:50 PM

April 24, 2026

model release · DeepSeek

DeepSeek Releases V4 Pro: 1.6T Parameter MoE Model with 1M Token Context at $1.74/M Input Tokens

DeepSeek has released V4 Pro, a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters. The model supports a 1-million-token context window and costs $1.74 per million input tokens and $3.48 per million output tokens.

April 24, 2026 · 4:21 AM

model release · DeepSeek

DeepSeek V4 Flash Released: 284B Parameter MoE Model with 1M Context Window at $0.14 per Million Tokens

DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per token. The model supports a 1,048,576-token context window and is priced at $0.14 per million input tokens and $0.28 per million output tokens.

April 24, 2026 · 4:21 AM

April 23, 2026

model release · Tencent

Tencent Releases Hy3 Preview MoE Model with 262K Context and Three Reasoning Modes

Tencent has released Hy3 Preview, a Mixture-of-Experts model offering a 262,144-token context window and three configurable reasoning modes (disabled, low, high) for production agentic workflows. The model is available for free through OpenRouter.

April 23, 2026 · 5:20 AM
