
Nvidia releases Nemotron 3 Super: 120B MoE model with 1M token context

TL;DR

Nvidia has released Nemotron 3 Super, a 120-billion-parameter hybrid Mamba-Transformer Mixture-of-Experts model that activates only 12 billion parameters during inference. The open-weight model features a 1-million-token context window, multi-token prediction, and pricing of $0.10 per million input tokens and $0.50 per million output tokens.


Nemotron 3 Super — Quick Specs

Context window: 1M tokens
Input: $0.10 per 1M tokens
Output: $0.50 per 1M tokens


Nvidia has released Nemotron 3 Super, a 120-billion-parameter open-weight model designed for multi-agent applications and long-context reasoning tasks. The model activates only 12 billion parameters during inference through a hybrid Mixture-of-Experts (MoE) architecture, balancing parameter scale with computational efficiency.

Key Specifications

Model Architecture: The model combines a Mamba-Transformer hybrid backbone with Mixture-of-Experts routing and multi-token prediction (MTP). Nvidia claims this design enables "over 50% higher token generation" compared to leading open-source models, though independent benchmarks are not yet available.
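
To make the multi-token prediction piece concrete, the sketch below drafts several future tokens from a single hidden state using extra output heads. It is a generic illustration of the MTP technique, not Nvidia's published implementation; the dimensions and the three-token draft length are assumptions.

```python
import numpy as np

# Generic multi-token prediction (MTP) sketch: extra output heads draft the
# next few tokens from one hidden state, so a decoding step can emit more
# than one token. Dimensions and the 3-token draft length are illustrative.
rng = np.random.default_rng(1)
d_model, vocab_size, draft_len = 32, 100, 3

heads = [rng.standard_normal((d_model, vocab_size)) for _ in range(draft_len)]

def draft_tokens(hidden: np.ndarray) -> list[int]:
    """Greedily draft draft_len future tokens from a single hidden state."""
    return [int(np.argmax(hidden @ w)) for w in heads]

hidden_state = rng.standard_normal(d_model)
print(draft_tokens(hidden_state))  # e.g. [41, 7, 93]
```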

Context Window: 1 million tokens—one of the largest context windows available in open-weight models, enabling document analysis, cross-reference reasoning, and extended conversation memory.

Pricing: $0.10 per million input tokens and $0.50 per million output tokens via OpenRouter. This puts it in the mid-tier of large-model pricing, significantly cheaper than frontier closed-source alternatives.
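
At those rates, even very long prompts stay inexpensive per request. A rough back-of-the-envelope calculation, with token counts chosen purely for illustration:

```python
# Cost estimate at the listed rates: $0.10 per 1M input tokens,
# $0.50 per 1M output tokens.
INPUT_RATE = 0.10 / 1_000_000    # USD per input token
OUTPUT_RATE = 0.50 / 1_000_000   # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: an 800K-token context with a 4K-token answer.
print(f"${request_cost(800_000, 4_000):.4f}")  # $0.0820
```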

Latent MoE Design: The model routes queries to 4 experts while incurring compute roughly equivalent to activating only one. Nvidia positions this as enabling "intelligence and generalization" improvements without proportional compute overhead.
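
For comparison, a conventional sparse MoE layer routes each token to its top-k experts and mixes their outputs, as in the minimal NumPy sketch below. Nvidia has not published the internals of its latent MoE, so this only illustrates the baseline design it claims to improve on; all sizes are made up.

```python
import numpy as np

# Minimal top-k (k=4) sparse MoE routing for a single token vector.
# This is conventional sparse routing, not Nvidia's latent MoE; the
# dimensions are illustrative only.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 4

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token to its top_k experts and mix their outputs."""
    logits = x @ router_w                      # (n_experts,)
    chosen = np.argsort(logits)[-top_k:]       # indices of the best-scoring experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                   # softmax over the chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

print(moe_forward(rng.standard_normal(d_model)).shape)  # (64,)
```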

Training and Performance

Nemotron 3 Super was trained with multi-environment reinforcement learning across more than 10 simulation environments. According to Nvidia, the model achieves leading accuracy on the AIME 2025, TerminalBench, and SWE-Bench Verified benchmarks, though specific scores have not been disclosed.

The model was released on March 11, 2026.

Licensing and Deployment

Nvidia released the model with full weights, training datasets, and recipes under the NVIDIA Open License. This enables customization and local deployment without cloud dependencies.
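
For local experimentation, the weights can presumably be loaded with standard tooling such as Hugging Face transformers. The repository ID and flags below are assumptions for illustration; check Nvidia's model card for the actual identifiers and any architecture-specific requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID, for illustration only; the real name may differ.
repo_id = "nvidia/Nemotron-3-Super"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",        # shard across available GPUs
    trust_remote_code=True,   # hybrid Mamba-Transformer blocks may ship custom code
)

prompt = "Summarize the key obligations in the following contract:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```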

Market Position

Nemotron 3 Super enters a competitive space for open-weight reasoning models. The combination of 1M context window, MoE efficiency, and sub-$1 output token pricing targets developers building agentic systems and applications requiring extended context reasoning. The model's latent MoE approach represents an alternative to dense scaling or standard sparse MoE designs, though real-world efficiency gains require vendor-specific inference optimization.

Availability appears limited to OpenRouter as a primary provider, with additional routing partners handling fallback load.
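
Because OpenRouter exposes an OpenAI-compatible API, calling the model would look roughly like the sketch below. The model slug is an assumption and should be verified against OpenRouter's listing.

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions protocol.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-super",  # assumed slug; verify on openrouter.ai
    messages=[{"role": "user", "content": "Plan a multi-agent code-review workflow."}],
)
print(response.choices[0].message.content)
```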

What This Means

Nvidia's release signals continued commitment to open-weight models as strategic infrastructure for AI ecosystem players. The 1M context window and sub-12B activation pattern address two key pain points: expensive long-context reasoning and compute constraints in production deployments. However, performance claims lack independent verification, and real-world token generation speedup depends heavily on inference engine optimization—not guaranteed across all providers or hardware.

Related Articles

model release

Zyphra Releases ZAYA1-8B: 8.4B Parameter MoE Model with 760M Active Parameters Matches 80B+ Models on Math Benchmarks

Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 760M active parameters and 8.4B total parameters. The model scores 89.1% on AIME 2026, competitive with models exceeding 100B parameters, while maintaining efficiency for on-device deployment.

model release

NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with up to 256K token context. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports chain-of-thought reasoning mode.

model release

Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning

Tencent has released Hy3 preview, a Mixture-of-Experts model with a 262,144 token context window priced at $0.066 per million input tokens and $0.26 per million output tokens. The model features three configurable reasoning modes—disabled, low, and high—designed for agentic workflows and production environments.

model release

Allen Institute releases EMO, 14B parameter MoE model with selective 12.5% expert use

Allen Institute for AI released EMO, a 1B-active, 14B-total-parameter mixture-of-experts model trained on 1 trillion tokens. The model uses 8 active experts per token from a pool of 128 total experts, and can maintain near full-model performance while using just 12.5% of its experts for specific tasks.
