JetBrains Releases Mellum2: 12B MoE Model With 2.5B Active Parameters for Code and Text

TL;DR

JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts model that activates only 2.5 billion parameters per token. The open-source model is designed for code generation, RAG pipelines, and agent workflows, and JetBrains claims more than 2x faster inference than similar-sized models.

June 1, 2026 · 4:05 PM

JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model trained from scratch on natural language and code. The model activates only 2.5 billion parameters per token, which JetBrains claims delivers more than 2x faster inference than similar-sized models.

Mellum2 is released under the Apache 2.0 license and is available on Hugging Face.

Model Architecture and Specifications

Mellum2 uses a Mixture-of-Experts architecture that keeps total model capacity at 12 billion parameters while activating only 2.5 billion parameters for each token. According to JetBrains, this design reduces serving costs for real-time workloads while maintaining model capability.
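To make the idea of selective activation concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind most MoE layers. It is not Mellum2's implementation: the expert count, the routing width (`TOP_K`), and the toy scalar "experts" are all assumptions chosen for illustration, since the announcement does not describe the router.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_layer(token, experts, router_logits, top_k):
    """Combine outputs of only the selected experts; the others are never evaluated."""
    out = 0.0
    for idx, weight in route_token(router_logits, top_k):
        out += weight * experts[idx](token)
    return out

# Hypothetical toy setup (NOT Mellum2's real configuration).
NUM_EXPERTS = 8  # assumption
TOP_K = 2        # assumption
experts = [lambda x, s=s: x * s for s in range(1, NUM_EXPERTS + 1)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits, TOP_K)
output = moe_layer(1.0, experts, logits, TOP_K)
print("selected experts and weights:", selected)
print("layer output:", output)
```

In this toy version only `TOP_K` of the `NUM_EXPERTS` functions run for each token, which is why the per-token compute tracks the active parameter count rather than the total.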

The model is intentionally limited to text and code; it does not handle images, audio, or video. JetBrains says this specialization keeps the model compact and efficient for software engineering tasks.

No hosted API pricing has been disclosed. The model is designed for self-hosted deployment rather than API-based access.

Target Use Cases

JetBrains positions Mellum2 as a "focal" model for high-frequency tasks inside larger AI systems:

Routing and orchestration: Prompt classification, tool selection, and control-flow operations in multi-model systems

RAG pipelines: Context compression, summarization, and retrieval post-processing for latency-sensitive applications

Sub-agents: Planning, validation, and context preparation tasks that don't require frontier models

Private deployment: Self-hosted environments with proprietary code or internal data

The company frames Mellum2 as a complement to larger models rather than a replacement, targeting workloads where inference speed and cost matter more than raw capability.
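The routing use case described above can be sketched as a small dispatcher: a cheap classifier labels each prompt, and only prompts it cannot handle are escalated to a larger model. Everything here is hypothetical. `classify_with_small_model` is a keyword-based stand-in for a call to a self-hosted small model, and the handler names and labels are invented for illustration.

```python
from typing import Callable, Dict, Tuple

def classify_with_small_model(prompt: str) -> str:
    """Hypothetical stand-in for a small-model classification call.

    Keyword rules are used here only so the sketch runs without a model.
    """
    text = prompt.lower()
    if "summarize" in text or "compress" in text:
        return "summarize"
    if "search" in text or "look up" in text:
        return "retrieve"
    return "escalate"

def summarize(prompt: str) -> str:
    return f"[small model] summary for: {prompt}"

def retrieve(prompt: str) -> str:
    return f"[retrieval tool] results for: {prompt}"

def escalate(prompt: str) -> str:
    return f"[larger model] answer for: {prompt}"

# Label-to-handler table; unknown labels fall back to escalation.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "summarize": summarize,
    "retrieve": retrieve,
    "escalate": escalate,
}

def dispatch(prompt: str,
             classify: Callable[[str], str] = classify_with_small_model) -> Tuple[str, str]:
    """Classify the prompt, then run the matching handler."""
    label = classify(prompt)
    handler = HANDLERS.get(label, escalate)
    return label, handler(prompt)

for p in ["Please summarize this stack trace",
          "Look up the docs for asyncio",
          "Design a new billing architecture"]:
    print(dispatch(p))
```

Passing the classifier in as a parameter keeps the dispatcher independent of any particular model, so the keyword stub could be swapped for a real inference call without changing the control flow.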

Benchmark Performance

JetBrains claims Mellum2 is "competitive with similarly sized open models" across code generation, reasoning, science, and math benchmarks. The company published a technical report with evaluation methodology but did not disclose specific benchmark scores in the announcement.

JetBrains attributes the claimed speedup of more than 2x to the MoE architecture's selective parameter activation.
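As a rough sanity check on the speed claim, a common rule of thumb puts a transformer's forward-pass cost at about 2 FLOPs per active parameter per token. The sketch below applies that approximation to the figures in the announcement. It is a back-of-envelope estimate, not a benchmark: the 7B dense baseline is an assumed comparison point, and real throughput also depends on memory bandwidth, routing overhead, batching, and kernel quality.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2.0 * active_params

MELLUM2_TOTAL = 12e9    # total parameters, from the announcement
MELLUM2_ACTIVE = 2.5e9  # active parameters per token, from the announcement
DENSE_7B = 7e9          # assumed dense baseline, for comparison only

active_fraction = MELLUM2_ACTIVE / MELLUM2_TOTAL
ratio_vs_dense_12b = (forward_flops_per_token(MELLUM2_TOTAL)
                      / forward_flops_per_token(MELLUM2_ACTIVE))
ratio_vs_dense_7b = (forward_flops_per_token(DENSE_7B)
                     / forward_flops_per_token(MELLUM2_ACTIVE))

print(f"active fraction: {active_fraction:.1%}")
print(f"compute ratio vs. dense 12B: {ratio_vs_dense_12b:.1f}x")
print(f"compute ratio vs. dense 7B: {ratio_vs_dense_7b:.1f}x")
```

Under this approximation the per-token compute is about 4.8x lower than a dense 12B model and about 2.8x lower than a dense 7B model, so a measured speedup of a little over 2x is plausible once non-compute overheads are included.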

What This Means

Mellum2 reflects a shift toward specialized, modular AI systems rather than monolithic models. JetBrains is betting that production AI systems need fast, focused models for intermediate tasks, not just large models for final outputs.

The Apache 2.0 license and focus on self-hosting suggest JetBrains is targeting enterprises with strict data privacy requirements and developers building agent systems that require multiple model calls. The 2.5B active parameter design could make Mellum2 viable for local deployment scenarios where 7B+ dense models are too slow.
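One caveat for local deployment is that an MoE model's memory footprint follows its total parameter count, not its active count, because every expert must be resident to be routable. The sketch below estimates weights-only memory at a few precisions. The 7B dense baseline is an assumed comparison point, and whether quantized builds of Mellum2 exist is not stated in the announcement, so the 8-bit and 4-bit rows are hypothetical.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Weights-only memory in decimal gigabytes.

    Excludes KV cache, activations, and runtime overhead.
    """
    return total_params * bits_per_param / 8 / 1e9

MELLUM2_TOTAL_PARAMS = 12e9  # from the announcement
DENSE_7B_PARAMS = 7e9        # assumed dense baseline, for comparison only

for label, params in [("Mellum2 (12B total)", MELLUM2_TOTAL_PARAMS),
                      ("dense 7B baseline", DENSE_7B_PARAMS)]:
    for bits in (16, 8, 4):
        print(f"{label:22s} {bits:2d}-bit: {weight_memory_gb(params, bits):5.1f} GB")
```

At 16-bit precision the 12B weights need about 24 GB, compared with about 14 GB for a dense 7B model, so the trade is lower per-token compute in exchange for a larger resident footprint.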

The model's competitive positioning will depend on benchmark comparisons with Qwen2.5-Coder-7B, DeepSeek-Coder-6.7B, and other code-focused models in the 6-12B parameter range. JetBrains has not yet published head-to-head comparisons with specific competitors.

Source: huggingface.co
