JetBrains Releases Mellum2-12B Reasoning Model with 131K Context and Mixture-of-Experts Architecture

TL;DR

JetBrains has released Mellum2-12B-A2.5B-Thinking, a reasoning-augmented assistant model with a 131,072-token context window and a Mixture-of-Experts architecture that activates 8 of its 64 experts per token. The model emits explicit chain-of-thought reasoning inside <think> blocks before providing final answers.

June 2, 2026 · 9:06 AM · 2 min read

Mellum2-12B-A2.5B-Thinking — Quick Specs

Context window: 131K tokens



JetBrains has released Mellum2-12B-A2.5B-Thinking, a reasoning-augmented assistant model with a 131,072-token context window. The model emits explicit chain-of-thought reasoning inside <think>...</think> blocks before providing final answers.

Architecture and Training

The model uses a Mixture-of-Experts (MoE) architecture with 64 experts, activating 8 experts per token. It features 28 layers with a hidden size of 2,304 and uses grouped-query attention with 32 query heads and 4 key-value heads. The architecture combines sliding-window attention (1,024 tokens) with full attention layers.
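To make the routing and attention figures concrete, here is a minimal sketch of top-8-of-64 expert selection and of how 32 query heads can share 4 key-value heads. Only the counts come from the article. The softmax over the selected logits, the contiguous head grouping, and the names `route_token` and `kv_head_for` are illustrative assumptions, not JetBrains' implementation.

```python
import math
import random

NUM_EXPERTS = 64       # from the article
TOP_K = 8              # experts activated per token, from the article
NUM_QUERY_HEADS = 32   # from the article
NUM_KV_HEADS = 4       # from the article
QUERIES_PER_KV = NUM_QUERY_HEADS // NUM_KV_HEADS  # 8 query heads share one KV head


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_index, weight) pairs.

    Assumption: mixing weights are a softmax over the selected logits only.
    The released model may normalize differently.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def kv_head_for(query_head):
    """Map a query head to its shared KV head.

    Assumption: heads are grouped contiguously (heads 0-7 use KV head 0, and so on).
    """
    return query_head // QUERIES_PER_KV


# Hypothetical router scores for a single token.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)
```

Only the 8 selected experts run for a given token, which is why the active parameter count is much smaller than the total.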

According to JetBrains, the model was produced from Mellum2-12B-A2.5B-Base through supervised fine-tuning (computing loss only on the final assistant turn), followed by reinforcement learning with verifiable rewards (RLVR) on a harder data mix that includes long-form math problems.
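Computing loss only on the final assistant turn usually comes down to a per-token mask. The sketch below shows one way that could look. The whitespace "tokens", the helper names, and the choice to include the <think> tokens in the loss are assumptions for illustration, since the article does not describe JetBrains' training code.

```python
def final_turn_loss_mask(turns):
    """Return a flat 0/1 mask with 1 only on tokens of the last assistant turn.

    turns is a list of (role, tokens) pairs in conversation order.
    """
    assistant_positions = [i for i, (role, _) in enumerate(turns) if role == "assistant"]
    last_assistant = assistant_positions[-1] if assistant_positions else None
    mask = []
    for i, (_, tokens) in enumerate(turns):
        mask.extend([1 if i == last_assistant else 0] * len(tokens))
    return mask


def masked_mean_loss(token_losses, mask):
    """Average per-token losses over the positions where mask is 1."""
    kept = [loss for loss, keep in zip(token_losses, mask) if keep]
    return sum(kept) / len(kept)


# Hypothetical two-round conversation with toy tokens.
conversation = [
    ("user", ["What", "is", "2+2", "?"]),
    ("assistant", ["4"]),
    ("user", ["And", "3+3", "?"]),
    ("assistant", ["<think>", "3+3=6", "</think>", "6"]),
]
mask = final_turn_loss_mask(conversation)

# Hypothetical per-token losses; only the last four positions contribute.
token_losses = [0.5] * 8 + [1.0, 2.0, 3.0, 2.0]
loss = masked_mean_loss(token_losses, mask)
```

Earlier turns still appear in the input as context, but they contribute nothing to the gradient.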

Benchmark Performance

On self-reported benchmarks, the Thinking variant scores 69.9% on LiveCodeBench v6, 58.4% on AIME (mean of 2025 and 2026, 30 questions each), and 87.0% on GSM-Plus. On MMLU-Redux, it achieves 86.2% accuracy.

The model scores 45.6% on Berkeley Function Calling Leaderboard (BFCL) v4, which measures tool-calling capability across five subtasks. On conversational tasks, it achieves 76.5% on IFEval and 66.9% on MixEval.

For comparison, JetBrains reports that Qwen3.5-9B scores 73.4% on AIME and 90.7% on GSM-Plus, while Ministral 3 (14B) scores 38.3% on AIME and 86.5% on GSM-Plus.

Technical Details

The model has a vocabulary size of 98,304 tokens and uses bfloat16 precision. It can be served with vLLM using the Qwen3 reasoning parser and supports tool calling with the Hermes parser.
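Conceptually, a reasoning parser separates the <think> block from the final answer. The sketch below illustrates that idea on a raw completion string. It is not vLLM's parser, and the function name `split_reasoning` and the sample text are hypothetical.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Split a completion into (reasoning, answer).

    If no complete <think> block is present, the reasoning is empty and the
    whole text is treated as the answer. Anything before the block is dropped.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Hypothetical model output.
sample = "<think>\n17 * 3 = 51, plus 4 is 55.\n</think>\nThe result is 55."
reasoning, answer = split_reasoning(sample)
```

Applications that only display the final answer can discard `reasoning`, while logging it separately can help when debugging model behavior.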

JetBrains has released the model under the Apache 2.0 license. The company also offers a standard "Instruct" variant for direct, low-latency answers without reasoning traces, though pricing has not been disclosed for either version.

Model Family

Mellum2 includes six checkpoints: Base Pretrain, Base (final base model), Instruct SFT, Thinking SFT, Instruct (RL-tuned), and Thinking (RL-tuned). The architecture uses an MoE intermediate size of 896 compared to a standard intermediate size of 7,168 for dense layers.
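A quick check of the reported sizes shows how they relate. The numbers come from the article. Reading 896 as the per-expert width is an assumption, and the article does not say whether the match below is a deliberate design goal.

```python
# Figures reported in the article.
ACTIVE_EXPERTS = 8
MOE_INTERMEDIATE_SIZE = 896      # assumed to be the width of each expert
DENSE_INTERMEDIATE_SIZE = 7168
CONTEXT_WINDOW = 131_072
SLIDING_WINDOW = 1_024

# Combined width of the experts that run for one token.
active_width = ACTIVE_EXPERTS * MOE_INTERMEDIATE_SIZE

# How many sliding windows fit into the full context window.
windows_per_context = CONTEXT_WINDOW // SLIDING_WINDOW

print(active_width == DENSE_INTERMEDIATE_SIZE)  # True: 8 * 896 = 7168
print(windows_per_context)                      # 128
```

Under that assumption, the 8 active experts together are as wide as one dense feed-forward layer.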

What This Means

JetBrains' entry into reasoning models puts a developer-tools company in direct competition with Anthropic, OpenAI, and DeepSeek in chain-of-thought reasoning. The 131K context window and Apache 2.0 license make it attractive for developers who work with large codebases and want self-hosted reasoning capabilities. However, on JetBrains' own numbers the model trails Qwen3.5-9B on math (58.4% vs. 73.4% on AIME, 87.0% vs. 90.7% on GSM-Plus), suggesting it may be better suited to coding and debugging workflows than to math-heavy reasoning tasks.

Source: huggingface.co


