Model release | NVIDIA

NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture

NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion-parameter text generation model featuring a latent Mixture-of-Experts (MoE) architecture. The model supports eight languages, including English, French, Spanish, Italian, German, Japanese, and Chinese, and is available on Hugging Face in NVIDIA's 4-bit NVFP4 quantization format, produced with the ModelOpt toolkit.


NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion parameter model designed for conversational text generation with support for multiple languages.

Model Specifications

The model implements a latent Mixture-of-Experts (MoE) architecture; the A12B suffix indicates that roughly 12 billion of the 120 billion parameters are active per token, which reduces computational overhead during inference compared to dense models of equivalent scale. The NVFP4 designation indicates the weights are stored in NVIDIA's custom 4-bit floating-point quantization format.
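NVIDIA has not published the specifics of the latent routing here; the accompanying papers cover those details. As a rough illustration of why sparse MoE cuts inference cost, the sketch below implements generic top-k expert routing for a single token (all names, shapes, and the top-k scheme are illustrative assumptions, not NVIDIA's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, gate_w, experts, top_k=2):
    """Sparse MoE forward pass for one token vector x (illustrative sketch).

    gate_w: (d_model, n_experts) router weights.
    experts: list of (w_in, w_out) feed-forward weight pairs.
    Only the top_k highest-scoring experts run, so per-token compute
    scales with top_k, not with the total expert count.
    """
    logits = x @ gate_w                        # one router score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the k best experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over selected experts only
    out = np.zeros_like(x)
    for p, i in zip(probs, top):
        w_in, w_out = experts[i]
        out += p * (np.maximum(x @ w_in, 0.0) @ w_out)  # weighted ReLU FFN expert
    return out

d_model, d_ff, n_experts = 16, 32, 8
gate_w = rng.normal(size=(d_model, n_experts))
experts = [(rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
           for _ in range(n_experts)]
y = moe_layer(rng.normal(size=d_model), gate_w, experts)
print(y.shape)  # (16,)
```

With top_k=2 of 8 experts, each token touches a quarter of the expert parameters, which is the same trade-off that lets a 120B-total model run with ~12B active parameters per token.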

Language support includes English, French, Spanish, Italian, German, Japanese, and Chinese, positioning the model for multilingual deployment scenarios.

Training and Development

According to NVIDIA's documentation, the model was trained using the Nemotron post-training (v3) and pre-training datasets. Two research papers accompany the release: arxiv:2512.20848 and arxiv:2512.20856, providing technical details on the training methodology and architecture decisions.

The model card indicates training data cutoff in December 2025.

Technical Capabilities

Nemotron-3-Super-120B is optimized for conversational tasks and text generation workloads. The released checkpoint is quantized to NVIDIA's 4-bit NVFP4 format via the ModelOpt optimization toolkit, enabling deployment on hardware with sharply reduced memory requirements. It is compatible with standard transformer inference endpoints.
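To make the memory savings concrete, the sketch below fake-quantizes weights to a 4-bit e2m1 floating-point grid with per-block scales, the general scheme FP4 formats use. This is an illustrative approximation only: the exact NVFP4 block size, scale encoding, and rounding rules are defined by NVIDIA's spec and ModelOpt, not by this code.

```python
import numpy as np

# Representable magnitudes of an e2m1 value (4 bits: sign + 2 exponent + 1 mantissa).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(w, block=16):
    """Round weights to the nearest e2m1 point, scaled per block of 16 values.

    Assumes w.size is a multiple of `block`; block size and float scales
    are illustrative assumptions, not the NVFP4 specification.
    """
    flat = w.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / E2M1[-1]  # map block max to 6.0
    scale[scale == 0] = 1.0                                     # avoid divide-by-zero
    mags = np.abs(flat) / scale
    idx = np.abs(mags[..., None] - E2M1).argmin(axis=-1)        # nearest code point
    return (np.sign(flat) * E2M1[idx] * scale).reshape(w.shape)

x = np.random.default_rng(1).normal(size=64)
xq = fake_quantize_fp4(x)
err = np.abs(x - xq).max()
```

Each stored value needs only 4 bits plus a shared per-block scale, versus 16 bits for BF16 weights, which is roughly a 4x reduction in checkpoint memory before accounting for scale overhead.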

The model is released under a custom license (non-standard open-source terms), requiring review of NVIDIA's specific licensing terms before use.

Availability

The model is available on Hugging Face with 11,460+ downloads as of the release date and 73 likes from the community. It is compatible with the transformers library using safetensors format and includes custom code components.

NVIDIA has marked the model as US-region eligible, suggesting potential regulatory or distribution considerations.

What This Means

Nemotron-3-Super-120B represents NVIDIA's continued push into open-weight model releases, competing directly with Meta's Llama series and other models at the 120B scale. The latent MoE architecture is the notable design choice: because only a subset of parameters is active per token (the A12B suffix suggests roughly 12 billion), NVIDIA claims the inference efficiency of a much smaller dense model without sacrificing total capacity. For organizations evaluating models at this scale, the multilingual support and native NVIDIA optimization tooling make this a viable alternative to existing open-weight options, though the custom license requires careful legal review before commercial use.