model release · NVIDIA

Nvidia releases Nemotron 3 Super: 120B MoE model with 1M token context

TL;DR

Nvidia has released Nemotron 3 Super, a 120-billion-parameter hybrid Mamba-Transformer Mixture-of-Experts model that activates only 12 billion parameters during inference. The open-weight model features a 1-million-token context window, multi-token prediction capabilities, and pricing of $0.10 per million input tokens and $0.50 per million output tokens.


Nemotron 3 Super — Quick Specs

Context window: 1M tokens
Input: $0.10 / 1M tokens
Output: $0.50 / 1M tokens


Nvidia has released Nemotron 3 Super, a 120-billion parameter open-weight model designed for multi-agent applications and long-context reasoning tasks. The model activates only 12 billion parameters during inference through a hybrid Mixture-of-Experts (MoE) architecture, balancing parameter scale with computational efficiency.

Key Specifications

Model Architecture: The model combines a Mamba-Transformer hybrid backbone with Mixture-of-Experts routing and multi-token prediction (MTP). Nvidia claims this design enables "over 50% higher token generation" compared to leading open-source models, though independent benchmarks are not yet available.

Context Window: 1 million tokens—one of the largest context windows available in open-weight models, enabling document analysis, cross-reference reasoning, and extended conversation memory.

Pricing: $0.10 per million input tokens and $0.50 per million output tokens via OpenRouter. This places it in the mid-tier for large-model pricing, significantly cheaper than frontier closed-source alternatives.
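At these rates, per-request cost is simple arithmetic. A minimal sketch (the request sizes are illustrative, not from the source):

```python
# Estimate request cost at the listed OpenRouter rates:
# $0.10 per 1M input tokens, $0.50 per 1M output tokens.
INPUT_PER_M = 0.10
OUTPUT_PER_M = 0.50

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A hypothetical long-context request: 800K tokens in, 4K tokens out.
print(f"${cost_usd(800_000, 4_000):.4f}")  # → $0.0820
```

Even a request that nearly fills the 1M-token window stays under ten cents of input cost at these rates.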

Latent MoE Design: The model routes each query across four experts while incurring compute roughly equivalent to activating a single expert. Nvidia positions this as enabling "intelligence and generalization" improvements without proportional compute overhead.
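Nvidia has not published details of the latent routing mechanism, but the general intuition behind sparse activation is the same as in a standard MoE layer: a router picks which expert's weights run for each token, so only a fraction of total parameters does work per forward pass. A toy top-1 sparse MoE sketch (sizes are illustrative, not the model's):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 64, 256, 4

# One weight matrix per expert; only the routed expert's weights run per token.
experts = [rng.standard_normal((d_model, d_ff)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Top-1 sparse MoE: each token is processed by exactly one expert."""
    logits = x @ router                   # (tokens, n_experts) routing scores
    choice = logits.argmax(axis=-1)       # selected expert index per token
    out = np.empty((x.shape[0], d_ff))
    for e in range(n_experts):
        mask = choice == e
        out[mask] = x[mask] @ experts[e]  # compute only for tokens routed here
    return out, choice

tokens = rng.standard_normal((8, d_model))
out, choice = moe_forward(tokens)

# Per token, one of four experts runs: 1/4 of expert parameters are active,
# analogous to Nemotron 3 Super's 12B-active-of-120B-total ratio (~10%).
active_frac = 1 / n_experts
print(out.shape, active_frac)
```

In a top-1 design like this, expert compute per token is independent of the total expert count; Nvidia's latent variant claims a similar compute profile while blending information from all four routed experts.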

Training and Performance

Nemotron 3 Super underwent multi-environment reinforcement learning training across 10+ simulation environments. According to Nvidia, the model achieves leading accuracy on AIME 2025, TerminalBench, and SWE-Bench Verified benchmarks. Specific benchmark scores have not been disclosed.

The model was released on March 11, 2026.

Licensing and Deployment

Nvidia released the model with full weights, training datasets, and recipes under the NVIDIA Open License. This enables customization and local deployment without cloud dependencies.

Market Position

Nemotron 3 Super enters a competitive space for open-weight reasoning models. The combination of 1M context window, MoE efficiency, and sub-$1 output token pricing targets developers building agentic systems and applications requiring extended context reasoning. The model's latent MoE approach represents an alternative to dense scaling or standard sparse MoE designs, though real-world efficiency gains require vendor-specific inference optimization.

Availability currently appears limited to OpenRouter as the primary provider, with additional routing partners handling fallback load.
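OpenRouter exposes models through its OpenAI-compatible chat completions endpoint, so a request would look roughly like the sketch below. The model slug is a guess (check OpenRouter's model list for the actual identifier), and the request is only sent if an API key is configured:

```python
import json
import os
import urllib.request

MODEL = "nvidia/nemotron-3-super"  # hypothetical slug; verify on OpenRouter
URL = "https://openrouter.ai/api/v1/chat/completions"

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Summarize the attached corpus."}],
    "max_tokens": 1024,
}

api_key = os.environ.get("OPENROUTER_API_KEY")
if api_key:
    # Standard OpenAI-style bearer-token request against OpenRouter.
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
else:
    print("No OPENROUTER_API_KEY set; payload:", payload["model"])
```

Because the endpoint is OpenAI-compatible, existing OpenAI SDK clients can also be pointed at the OpenRouter base URL instead of hand-rolling the HTTP request.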

What This Means

Nvidia's release signals continued commitment to open-weight models as strategic infrastructure for AI ecosystem players. The 1M context window and sub-12B activation pattern address two key pain points: expensive long-context reasoning and compute constraints in production deployments. However, performance claims lack independent verification, and real-world token generation speedup depends heavily on inference engine optimization—not guaranteed across all providers or hardware.

Related Articles

model release

NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture

NVIDIA has released Nemotron-3-Super-120B-A12B-BF16, a 120 billion parameter model designed for text generation and conversational tasks. The model employs a latent mixture-of-experts (MoE) architecture and supports multiple languages including English, French, Spanish, Italian, German, Japanese, and Chinese.

model release

Rakuten releases RakutenAI-3.0, 671B-parameter Japanese-optimized mixture-of-experts model

Rakuten Group has released RakutenAI-3.0, a 671 billion parameter mixture-of-experts (MoE) model designed specifically for Japanese language tasks. The model activates 37 billion parameters per token and supports a 128K context window. It is available under the Apache License 2.0 on Hugging Face.

product update

NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window

NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.

model release

NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture

NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion parameter text generation model featuring a latent Mixture-of-Experts (MoE) architecture. The model supports 8 languages including English, French, Spanish, Italian, German, Japanese, and Chinese, and is available on Hugging Face with 8-bit quantization support through NVIDIA's ModelOpt toolkit.