NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture
NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion parameter text generation model featuring a latent Mixture-of-Experts (MoE) architecture. The model supports eight languages, including English, French, Spanish, Italian, German, Japanese, and Chinese, and is available on Hugging Face as a checkpoint quantized in NVIDIA's 4-bit NVFP4 format, produced with the ModelOpt toolkit.
Nemotron 3 Super — Quick Specs
NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion parameter model designed for conversational text generation with support for multiple languages.
Model Specifications
The model implements a latent Mixture-of-Experts (MoE) architecture, which activates only a subset of its parameters for each token and therefore reduces computational overhead during inference compared to a dense model of equivalent scale; the A12B suffix in the model name indicates roughly 12 billion active parameters per token. The NVFP4 designation indicates the checkpoint is quantized in NVIDIA's custom 4-bit floating-point format.
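To make the total-versus-active parameter distinction concrete, here is a minimal, illustrative sketch of top-k expert routing, the general mechanism MoE layers use. The expert count, hidden size, and routing details are hypothetical toy values, not Nemotron's actual architecture, and the "latent" qualifier refers to a further refinement not modeled here.

```python
# Toy sketch of top-k expert routing in a Mixture-of-Experts layer.
# All sizes are hypothetical; this is not Nemotron's architecture.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2          # toy dimensions
router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                      # (tokens, n_experts) routing scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        chosen = top[t]
        weights = np.exp(logits[t, chosen] - logits[t, chosen].max())
        weights /= weights.sum()               # softmax over the selected experts only
        for w, e in zip(weights, chosen):
            out[t] += w * (token @ experts[e]) # only top_k of n_experts run per token
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)               # (4, 64); ~top_k/n_experts of expert params used per token
```

The point of the sketch is the ratio: every token touches only top_k of the n_experts weight matrices, which is how a 120B-parameter model can run with roughly 12B parameters active per token.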
Language support includes English, French, Spanish, Italian, German, Japanese, and Chinese, positioning the model for multilingual deployment scenarios.
Training and Development
According to NVIDIA's documentation, the model was trained on the Nemotron pre-training and post-training (v3) datasets. Two research papers accompany the release, arxiv:2512.20848 and arxiv:2512.20856, covering the training methodology and architecture decisions.
The model card indicates a training data cutoff of December 2025.
Technical Capabilities
Nemotron-3-Super-120B is optimized for conversational tasks and text generation workloads. The released checkpoint is quantized to NVIDIA's 4-bit NVFP4 format via the ModelOpt optimization toolkit, reducing memory requirements for deployment, and the model is compatible with standard transformer inference endpoints.
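As a rough illustration of what block-scaled 4-bit quantization does, the sketch below snaps weights to the small set of magnitudes an E2M1 (FP4) value can represent, keeping one scale per 16-element block. Real NVFP4 additionally uses FP8 block scales and a tensor-level scale, and this is not ModelOpt's implementation; the block size and scale handling here are simplified for illustration.

```python
# Simplified block-scaled FP4 quantization sketch (illustrative, not ModelOpt).
import numpy as np

# Magnitudes representable by a signed E2M1 (4-bit float) value.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block):
    """Scale a block so its max magnitude maps to 6.0, then snap to the FP4 grid."""
    scale = np.abs(block).max() / FP4_GRID[-1]
    scale = scale if scale > 0 else 1.0
    scaled = block / scale
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx], scale

def quantize_tensor(x, block_size=16):
    """Quantize a flat tensor in independent blocks, one scale per block."""
    deq = np.empty_like(x)
    scales = []
    for start in range(0, len(x), block_size):
        q, s = quantize_block(x[start:start + block_size])
        deq[start:start + block_size] = q * s   # dequantized values for error checking
        scales.append(s)
    return deq, np.array(scales)

weights = np.random.default_rng(0).standard_normal(64).astype(np.float32)
deq, scales = quantize_tensor(weights)
print("mean absolute quantization error:", float(np.abs(weights - deq).mean()))
```

Each stored weight needs only 4 bits plus a shared per-block scale, which is where the memory savings over 16-bit weights come from, at the cost of the rounding error the script prints.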
The model is released under a custom NVIDIA license rather than standard open-source terms, so the license should be reviewed before use.
Availability
The model is available on Hugging Face, where it had recorded more than 11,460 downloads and 73 community likes as of the release date. It is compatible with the transformers library, ships its weights in safetensors format, and includes custom code components.
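For readers who want to try the checkpoint, a minimal loading sketch with the transformers library is shown below. The repo id is assumed from the model name, trust_remote_code reflects the custom code components noted above, and the generation settings are illustrative; serving the NVFP4 weights efficiently may additionally require recent NVIDIA hardware and inference tooling.

```python
# Minimal sketch of loading the checkpoint with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Nemotron-3-Super-120B-A12B-NVFP4"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,   # model ships custom code components
    device_map="auto",        # shard across available GPUs
)

messages = [{"role": "user", "content": "Summarize the Nemotron 3 release in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```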
NVIDIA has marked the model as US-region eligible, suggesting potential regulatory or distribution considerations.
What This Means
Nemotron-3-Super-120B represents NVIDIA's continued push into open-weight model releases, competing directly with Meta's Llama series and other models at this scale. The latent MoE architecture is the notable design choice: only a fraction of the 120 billion parameters (roughly 12 billion, per the A12B suffix) is active for any given token, which is how NVIDIA claims efficiency gains without sacrificing total model capacity. For organizations evaluating models at this scale, the multilingual support and native NVIDIA optimization tooling make this a viable alternative to existing open-weight options, though the custom license requires careful legal review for commercial use.