
Nvidia releases Nemotron 3 Super: 120B MoE model with 1M token context

TL;DR

Nvidia has released Nemotron 3 Super, a 120-billion-parameter hybrid Mamba-Transformer Mixture-of-Experts model that activates only 12 billion parameters during inference. The open-weight model features a 1-million-token context window, multi-token prediction, and pricing of $0.10 per million input tokens and $0.50 per million output tokens.


Nemotron 3 Super — Quick Specs

Context window: 1M tokens
Input: $0.10 per 1M tokens
Output: $0.50 per 1M tokens


Nvidia has released Nemotron 3 Super, a 120-billion-parameter open-weight model designed for multi-agent applications and long-context reasoning tasks. The model activates only 12 billion parameters during inference through a hybrid Mixture-of-Experts (MoE) architecture, balancing parameter scale with computational efficiency.

Key Specifications

Model Architecture: The model combines a Mamba-Transformer hybrid backbone with Mixture-of-Experts routing and multi-token prediction (MTP). Nvidia claims this design enables "over 50% higher token generation" compared to leading open-source models, though independent benchmarks are not yet available.
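
To make the multi-token prediction piece concrete, the sketch below drafts several future tokens from a single hidden state using extra output heads. It is a generic illustration of the MTP technique, not Nvidia's published implementation; the dimensions and the three-token draft length are assumptions.

```python
import numpy as np

# Generic multi-token prediction (MTP) sketch: extra output heads draft the
# next few tokens from one hidden state, so a decoding step can emit more
# than one token. Dimensions and the 3-token draft length are illustrative.
rng = np.random.default_rng(1)
d_model, vocab_size, draft_len = 32, 100, 3

heads = [rng.standard_normal((d_model, vocab_size)) for _ in range(draft_len)]

def draft_tokens(hidden: np.ndarray) -> list[int]:
    """Greedily draft draft_len future tokens from a single hidden state."""
    return [int(np.argmax(hidden @ w)) for w in heads]

hidden_state = rng.standard_normal(d_model)
print(draft_tokens(hidden_state))  # e.g. [41, 7, 93]
```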

Context Window: 1 million tokens—one of the largest context windows available in open-weight models, enabling document analysis, cross-reference reasoning, and extended conversation memory.

Pricing: $0.10 per million input tokens and $0.50 per million output tokens via OpenRouter. This puts it in the mid-tier of large-model pricing, significantly cheaper than frontier closed-source alternatives.
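
At those rates, even very long prompts stay inexpensive per request. A rough back-of-the-envelope calculation, with token counts chosen purely for illustration:

```python
# Cost estimate at the listed rates: $0.10 per 1M input tokens,
# $0.50 per 1M output tokens.
INPUT_RATE = 0.10 / 1_000_000    # USD per input token
OUTPUT_RATE = 0.50 / 1_000_000   # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: an 800K-token context with a 4K-token answer.
print(f"${request_cost(800_000, 4_000):.4f}")  # $0.0820
```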

Latent MoE Design: The model routes queries to 4 experts while incurring compute roughly equivalent to activating only one. Nvidia positions this as enabling "intelligence and generalization" improvements without proportional compute overhead.
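
For comparison, a conventional sparse MoE layer routes each token to its top-k experts and mixes their outputs, as in the minimal NumPy sketch below. Nvidia has not published the internals of its latent MoE, so this only illustrates the baseline design it claims to improve on; all sizes are made up.

```python
import numpy as np

# Minimal top-k (k=4) sparse MoE routing for a single token vector.
# This is conventional sparse routing, not Nvidia's latent MoE; the
# dimensions are illustrative only.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 4

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token to its top_k experts and mix their outputs."""
    logits = x @ router_w                      # (n_experts,)
    chosen = np.argsort(logits)[-top_k:]       # indices of the best-scoring experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                   # softmax over the chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

print(moe_forward(rng.standard_normal(d_model)).shape)  # (64,)
```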

Training and Performance

Nemotron 3 Super was trained with multi-environment reinforcement learning across more than 10 simulation environments. According to Nvidia, the model achieves leading accuracy on the AIME 2025, TerminalBench, and SWE-Bench Verified benchmarks, though specific scores have not been disclosed.

The model was released on March 11, 2026.

Licensing and Deployment

Nvidia released the model with full weights, training datasets, and recipes under the NVIDIA Open License. This enables customization and local deployment without cloud dependencies.
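
For local experimentation, the weights can presumably be loaded with standard tooling such as Hugging Face transformers. The repository ID and flags below are assumptions for illustration; check Nvidia's model card for the actual identifiers and any architecture-specific requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID, for illustration only; the real name may differ.
repo_id = "nvidia/Nemotron-3-Super"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",        # shard across available GPUs
    trust_remote_code=True,   # hybrid Mamba-Transformer blocks may ship custom code
)

prompt = "Summarize the key obligations in the following contract:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```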

Market Position

Nemotron 3 Super enters a competitive space for open-weight reasoning models. The combination of 1M context window, MoE efficiency, and sub-$1 output token pricing targets developers building agentic systems and applications requiring extended context reasoning. The model's latent MoE approach represents an alternative to dense scaling or standard sparse MoE designs, though real-world efficiency gains require vendor-specific inference optimization.

Availability appears limited to OpenRouter as a primary provider, with additional routing partners handling fallback load.
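
Because OpenRouter exposes an OpenAI-compatible API, calling the model would look roughly like the sketch below. The model slug is an assumption and should be verified against OpenRouter's listing.

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions protocol.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-super",  # assumed slug; verify on openrouter.ai
    messages=[{"role": "user", "content": "Plan a multi-agent code-review workflow."}],
)
print(response.choices[0].message.content)
```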

What This Means

Nvidia's release signals continued commitment to open-weight models as strategic infrastructure for AI ecosystem players. The 1M context window and sub-12B activation pattern address two key pain points: expensive long-context reasoning and compute constraints in production deployments. However, performance claims lack independent verification, and real-world token generation speedup depends heavily on inference engine optimization—not guaranteed across all providers or hardware.

Related Articles

model release

Zyphra Releases ZAYA1-8B: 8.4B Parameter MoE Model with 760M Active Parameters Matches 80B+ Models on Math Benchmarks

Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 760M active parameters and 8.4B total parameters. The model scores 89.1% on AIME 2026, competitive with models exceeding 100B parameters, while maintaining efficiency for on-device deployment.

model release

NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with up to 256K token context. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports chain-of-thought reasoning mode.

model release

Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning

Tencent has released Hy3 preview, a Mixture-of-Experts model with a 262,144 token context window priced at $0.066 per million input tokens and $0.26 per million output tokens. The model features three configurable reasoning modes—disabled, low, and high—designed for agentic workflows and production environments.

model release

Allen Institute releases EMO, 14B parameter MoE model with selective 12.5% expert use

Allen Institute for AI released EMO, a 1B-active, 14B-total-parameter mixture-of-experts model trained on 1 trillion tokens. The model uses 8 active experts per token from a pool of 128 total experts, and can maintain near full-model performance while using just 12.5% of its experts for specific tasks.
