JetBrains Releases Mellum2: 12B MoE Model With 2.5B Active Parameters for Code and Text
JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts model that activates only 2.5 billion parameters per token. The open-source model is designed for code generation, RAG pipelines, and agent workflows, and JetBrains says it delivers more than 2x faster inference than similar-sized models.
JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model trained from scratch on natural language and code. The model activates only 2.5 billion parameters per token, delivering what JetBrains claims is more than 2x faster inference compared to similar-sized models.
Mellum2 is released under the Apache 2.0 license and is available on Hugging Face.
Model Architecture and Specifications
Mellum2 uses a Mixture-of-Experts architecture that keeps total model capacity at 12 billion parameters while activating only 2.5 billion parameters for each token. According to JetBrains, this design reduces serving costs for real-time workloads while maintaining model capability.
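A rough calculation shows what the 12B-total / 2.5B-active split means per token. This sketch assumes the common rule of thumb of about 2 FLOPs per parameter per token for a forward pass and ignores attention cost, routing overhead, and memory bandwidth, so it gives an idealized upper bound on compute savings rather than a prediction of measured speed. The dense 12B comparison model is hypothetical.

```python
# Back-of-the-envelope arithmetic for the parameter counts JetBrains reports.
# Assumption: ~2 FLOPs per parameter used per token in a forward pass,
# a common rule of thumb that ignores attention over the context.

TOTAL_PARAMS = 12e9    # total parameters (reported)
ACTIVE_PARAMS = 2.5e9  # parameters activated per token (reported)


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that participate in one token's forward pass."""
    return active / total


def forward_flops_per_token(params_used: float) -> float:
    """Approximate forward-pass FLOPs per token (2 * parameters used)."""
    return 2 * params_used


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 12B model

print(f"active fraction: {frac:.1%}")                     # 20.8%
print(f"MoE GFLOPs/token: {moe_flops / 1e9:.0f}")          # 5
print(f"dense 12B GFLOPs/token: {dense_flops / 1e9:.0f}")  # 24
print(f"compute ratio: {dense_flops / moe_flops:.1f}x")    # 4.8x
```

The idealized 4.8x compute reduction is larger than the roughly 2x speedup JetBrains reports; measured serving speed also depends on memory bandwidth and on the fact that all 12B parameters still have to be stored.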
The model is intentionally limited to text and code — it does not handle images, audio, or video. JetBrains says this specialization keeps the model compact and efficient for software engineering tasks.
Pricing has not been disclosed. The model is designed for self-hosted deployment rather than API-based access.
Target Use Cases
JetBrains positions Mellum2 as a "focal" model for high-frequency tasks inside larger AI systems:
Routing and orchestration: Prompt classification, tool selection, and control-flow operations in multi-model systems
RAG pipelines: Context compression, summarization, and retrieval post-processing for latency-sensitive applications
Sub-agents: Planning, validation, and context preparation tasks that don't require frontier models
Private deployment: Self-hosted environments with proprietary code or internal data
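The routing role described above follows a general pattern: a small, fast model labels each request, and the label decides which model does the work. The sketch below is a minimal, hypothetical dispatcher illustrating that pattern. `Router`, `classify`, `small_model`, and `large_model` are illustrative names, not a Mellum2 or JetBrains API, and the length-based classifier in the usage lines is only a stub standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical labels a small classifier model might assign to incoming prompts.
ROUTE_LABELS = ("simple", "complex")


@dataclass
class Router:
    """Send each prompt to a cheap model or an expensive one based on a classifier.

    All three callables are placeholders for whatever inference calls a real
    system uses; none of them is a Mellum2 API.
    """
    classify: Callable[[str], str]     # e.g. a small model prompted to return a label
    small_model: Callable[[str], str]  # fast, inexpensive model
    large_model: Callable[[str], str]  # slower, more capable model

    def handle(self, prompt: str) -> tuple[str, str]:
        label = self.classify(prompt)
        if label not in ROUTE_LABELS:
            label = "complex"  # fall back to the stronger model on unexpected output
        model = self.small_model if label == "simple" else self.large_model
        return label, model(prompt)


# Stub callables so the sketch runs without any model behind it.
router = Router(
    classify=lambda p: "simple" if len(p) < 40 else "complex",  # stand-in for a model call
    small_model=lambda p: f"[small] {p}",
    large_model=lambda p: f"[large] {p}",
)
print(router.handle("Summarize this diff"))  # ('simple', '[small] Summarize this diff')
```

Falling back to the larger model when the classifier returns an unexpected label trades some cost for safety; a real system might instead retry the classification.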
The company frames Mellum2 as a complement to larger models rather than a replacement, targeting workloads where inference speed and cost matter more than raw capability.
Benchmark Performance
JetBrains claims Mellum2 is "competitive with similarly sized open models" across code generation, reasoning, science, and math benchmarks. The company published a technical report with evaluation methodology but did not disclose specific benchmark scores in the announcement.
JetBrains attributes the claimed 2x inference speedup to the MoE architecture's selective parameter activation.
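Selective activation in a typical MoE layer works through a gate that scores every expert for each token and then evaluates only the top-scoring few. The toy layer below, operating on a scalar input, illustrates generic top-k gating. The article does not say how many experts Mellum2 has or how many it selects per token, so the eight experts and `k=2` here are assumptions, and the gate logits are passed in directly instead of being computed by a learned layer.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, gate_logits: list[float], experts: list, k: int = 2) -> tuple[float, list[int]]:
    """Toy top-k MoE layer on a scalar input.

    Only the k experts with the highest gate logits are evaluated; the rest are
    skipped entirely, which is where the per-token compute savings come from.
    In a real model the logits would come from a learned gating layer applied to x.
    """
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalize over the chosen experts
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    return out, top


# Eight hypothetical experts; each one just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
y, used = moe_forward(1.0, gate_logits, experts, k=2)
print(used)  # [1, 4]
```

Here only 2 of 8 experts run for the token, so roughly a quarter of the expert weights do any work; the same principle, at larger scale, is what lets a 12B model activate 2.5B parameters per token.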
What This Means
Mellum2 reflects a shift toward specialized, modular AI systems rather than monolithic models. JetBrains is betting that production AI systems need fast, focused models for intermediate tasks — not just large models for final outputs.
The Apache 2.0 license and focus on self-hosting suggest JetBrains is targeting enterprises with strict data privacy requirements and developers building agent systems that require multiple model calls. The 2.5B active parameter design could make Mellum2 viable for local deployment scenarios where 7B+ dense models are too slow.
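For local deployment, memory is the other half of the picture: fewer active parameters reduce compute per token, but because any token can be routed to any expert, all 12B parameters generally need to be resident. The sketch below estimates weight storage only, assuming 2 bytes per parameter for bf16, 1 for int8, and 0.5 for 4-bit, and ignoring KV cache, activations, and quantization overhead.

```python
# Weight memory only: excludes KV cache, activations, and runtime overhead.
# Assumption: every expert must be resident in memory, so the footprint
# scales with TOTAL parameters rather than active parameters.

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    moe = weight_memory_gb(12e9, dtype)   # Mellum2 total parameters (reported)
    dense = weight_memory_gb(7e9, dtype)  # a 7B dense model, for comparison
    print(f"{dtype}: Mellum2 ~{moe:.0f} GB, 7B dense ~{dense:.1f} GB")
```

At bf16 this works out to roughly 24 GB of weights for Mellum2 versus about 14 GB for a 7B dense model, so the speed advantage of fewer active parameters comes with a larger memory footprint.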
The model's competitive positioning will depend on benchmark comparisons with Qwen2.5-Coder-7B, DeepSeek-Coder-6.7B, and other code-focused models in the 6-12B parameter range. JetBrains has not yet published head-to-head comparisons with specific competitors.