
NVIDIA releases Nemotron-3-Super-120B, a 120B-parameter model with a latent MoE architecture

TL;DR

NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion-parameter text generation model built on a latent Mixture-of-Experts (MoE) architecture. The model supports 8 languages, including English, French, Spanish, Italian, German, Japanese, and Chinese, and is available on Hugging Face as a checkpoint quantized to NVIDIA's 4-bit NVFP4 format with the ModelOpt toolkit.


Nemotron 3 Super: Quick Specs

Context window: 1,000K tokens
Input: $0.10 / 1M tokens
Output: $0.50 / 1M tokens
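To make the listed rates concrete, here is a small Python sketch that turns token counts into an estimated per-request cost. It assumes the Quick Specs prices are flat per-token rates with no caching discounts, batch pricing, or provider-specific minimums, which the listing does not state; `estimate_cost` is a hypothetical helper, not part of any NVIDIA or provider API.

```python
# Rates from the Quick Specs, in USD per 1M tokens.
# Assumption: flat per-token pricing with no discounts or minimums.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: a 200K-token prompt (well inside the context window) with a 4K-token answer.
cost = estimate_cost(200_000, 4_000)
print(f"${cost:.4f}")  # prints $0.0220
```

Because output tokens are priced five times higher than input tokens, long prompts with short answers stay cheap even near the top of the context window.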

NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion-parameter model designed for conversational text generation with support for multiple languages.

Model Specifications

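The model implements a latent Mixture-of-Experts (MoE) architecture. Because only a subset of experts runs for each token (the A12B suffix in the name denotes roughly 12B active parameters out of 120B), inference requires less computation than a dense model of the same total size. The NVFP4 designation refers to NVIDIA's 4-bit floating-point format, which stores weights as 4-bit E2M1 values with a shared scale factor for each block of 16 values.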
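The compute saving in an MoE model comes from routing: a small router scores every expert for each token, and only the top-k experts actually run. The Python sketch below shows generic softmax top-k gating with toy experts that simply scale their input. It is the textbook MoE pattern, not NVIDIA's latent MoE design (the accompanying papers describe the actual architecture), and the expert count, k=2, and router scores are made-up illustration values. Read the usual way, the A12B suffix means roughly 12B of the 120B parameters, about 10%, are active per token.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) for the k highest-scoring experts,
    with their weights renormalized to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs.
    Unselected experts are never called, which is where the compute saving comes from.
    Returns the output vector and how many experts were evaluated."""
    calls = 0
    y = [0.0] * len(x)
    for i, w in top_k_route(router_logits, k):
        out = experts[i](x)
        calls += 1
        y = [acc + w * o for acc, o in zip(y, out)]
    return y, calls


# Toy setup: 8 hypothetical experts; expert i multiplies its input by (i + 1).
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # made-up router scores

y, calls = moe_forward([1.0, 2.0], experts, router_logits, k=2)
print(calls)  # prints 2 (only 2 of 8 experts evaluated)
```

Only the two highest-scoring experts (indices 4 and 1) are called; the other six contribute nothing and cost nothing, which is what lets total parameter count grow faster than per-token compute.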

Language support includes English, French, Spanish, Italian, German, Japanese, and Chinese, positioning the model for multilingual deployment scenarios.

Training and Development

According to NVIDIA's documentation, the model was trained using the Nemotron post-training (v3) and pre-training datasets. Two research papers accompany the release: arxiv:2512.20848 and arxiv:2512.20856, providing technical details on the training methodology and architecture decisions.

The model card lists a training data cutoff of December 2025.

Technical Capabilities

Nemotron-3-Super-120B is optimized for conversational tasks and text generation workloads. The released checkpoint is quantized to NVFP4 with NVIDIA's ModelOpt (Model Optimizer) toolkit, which reduces the memory needed for the weights and allows deployment on hardware with less memory. Hugging Face marks the model as compatible with its Inference Endpoints.
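NVFP4's core idea is block scaling: 4-bit E2M1 floating-point values share one scale factor per 16-element block. The Python sketch below fake-quantizes a list of weights that way and then does the back-of-envelope memory arithmetic for 120B parameters. It is a simplified illustration, not ModelOpt's implementation: the block scale is kept as a Python float rather than FP8 E4M3, the per-tensor FP32 scale NVFP4 also uses is omitted, ties round toward the smaller magnitude, and the memory estimate assumes every weight is stored in NVFP4 while ignoring activations and the KV cache.

```python
import math

# Representable magnitudes of the 4-bit E2M1 format used by NVFP4.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # one shared scale factor per 16 consecutive values


def quantize_block(block):
    """Scale a block so its largest magnitude maps to 6.0 (the E2M1 maximum),
    then snap each value to the nearest representable E2M1 value.
    Simplifications: the scale stays a Python float instead of FP8 E4M3,
    and ties go to the smaller magnitude (min() keeps the first match)."""
    amax = max(abs(v) for v in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in block:
        mag = abs(v) / scale
        q = min(E2M1_VALUES, key=lambda e: abs(e - mag))
        codes.append(math.copysign(q, v))
    return scale, codes


def fake_quantize(values):
    """Quantize then immediately dequantize, block by block, to expose rounding error."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(scale * c for c in codes)
    return out


def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


print(fake_quantize([12.0, 5.5, -1.0, 0.2]))  # prints [12.0, 6.0, -1.0, 0.0]
print(weight_memory_gb(120e9, 16))            # prints 240.0 (16-bit weights)
print(weight_memory_gb(120e9, 4 + 8 / 16))    # prints 67.5 (4-bit values + one 8-bit scale per 16)
```

In the example block, 5.5 lands on 6.0 and 0.2 collapses to zero. Small 16-value blocks keep that kind of loss local: one outlier only coarsens the precision of its own block.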

The model is released under a custom NVIDIA license rather than a standard open-source license, so its terms should be reviewed before use.

Availability

The model is available on Hugging Face, where it had 11,460+ downloads and 73 likes as of the release date. It is distributed as safetensors weights for the transformers library and includes custom modeling code, so loading it with transformers requires passing trust_remote_code=True.

The Hugging Face listing also carries a region:us tag, which indicates where the repository is hosted rather than any regulatory or distribution restriction.

What This Means

Nemotron-3-Super-120B represents NVIDIA's continued push into open-weight model releases, competing directly with Meta's Llama series and other 120B-scale models. The latent MoE architecture is the noteworthy design choice: sparsity comes from activating only a subset of experts for each token (about 12B of the 120B parameters, per the A12B suffix), which is how NVIDIA claims efficiency gains without reducing total model capacity. For organizations evaluating 120B-scale models, the multilingual support and native NVIDIA optimization tooling make this a viable alternative to existing open-weight options, though the custom license requires careful legal review before commercial use.
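A common rule of thumb puts a transformer forward pass at about 2 FLOPs per active parameter per token. Applied here, and assuming the A12B suffix means roughly 12B active parameters, it gives a rough sense of the claimed efficiency. This ignores attention, routing, and any latent projections, and it is not a measured speedup: real throughput also depends on memory bandwidth, since all 120B parameters still have to be stored and loaded.

```latex
\text{FLOPs per token} \approx 2\,N_{\text{active}}

\text{Dense, } N = 120\times10^{9}: \quad 2 \times 120\times10^{9} = 2.4\times10^{11}

\text{MoE, } N_{\text{active}} = 12\times10^{9}: \quad 2 \times 12\times10^{9} = 2.4\times10^{10}

\frac{2.4\times10^{11}}{2.4\times10^{10}} = 10
```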

