NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture
NVIDIA has released Nemotron-3-Super-120B-A12B-BF16, a 120-billion-parameter model designed for text generation and conversational tasks. The model uses a latent mixture-of-experts (MoE) architecture and supports multiple languages, including English, French, Spanish, Italian, German, Japanese, and Chinese.
NVIDIA Nemotron-3-Super-120B-A12B — Quick Specs
NVIDIA has released Nemotron-3-Super-120B-A12B-BF16, a 120-billion-parameter text generation model now available on Hugging Face. It is NVIDIA's latest entry in the Nemotron-3 family and uses a latent mixture-of-experts architecture designed for inference efficiency.
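A back-of-the-envelope sketch of what the model name implies. It assumes the common naming convention that "A12B" means roughly 12 billion active parameters per token, that BF16 stores each weight in 2 bytes, and the rough rule of thumb of about 2 FLOPs per active parameter per generated token in a forward pass. The parameter counts are nominal round numbers, not NVIDIA's exact figures, and the estimate ignores the KV cache and activations.

```python
# Rough sizing for Nemotron-3-Super-120B-A12B-BF16.
# Assumptions (not official figures): 120e9 total parameters,
# 12e9 active parameters per token ("A12B"), 2 bytes per BF16 value,
# and ~2 FLOPs per active parameter per token for a forward pass.

TOTAL_PARAMS = 120e9
ACTIVE_PARAMS = 12e9
BYTES_PER_BF16 = 2


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params


weights_gb = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_BF16)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"BF16 weights: ~{weights_gb:.0f} GB")
print(f"MoE forward pass: ~{moe_flops / 1e9:.0f} GFLOPs/token")
print(f"Dense 120B forward pass: ~{dense_flops / 1e9:.0f} GFLOPs/token")
print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
```

Under these assumptions, all 120B weights still have to sit in memory (about 240 GB in BF16, more than a single 80 GB GPU holds), but per-token compute scales with the roughly 12B active parameters, about a tenth of what a dense 120B model would need.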
Model Specifications
The Nemotron-3-Super-120B-A12B variant is distributed in BF16 (bfloat16) precision. The model uses a latent MoE design, which routes each token to a small subset of specialized expert networks rather than running every parameter for every token. Because only a fraction of the weights is active per token, this approach typically requires less compute per token during inference than a dense 120B model, although all of the weights must still be held in memory.
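To illustrate the routing idea described above, here is a minimal sketch of generic top-k softmax gating for a single token. It is not NVIDIA's latent MoE implementation: the toy experts, the gate scores, and k=2 are illustrative assumptions, and real layers operate on batched tensors with learned experts and a learned router. What it does show is that only k experts are evaluated per token, which is where the compute savings come from.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token: Vector, gate_logits: Sequence[float],
                experts: Sequence[Expert], k: int = 2) -> Vector:
    """Route one token to its top-k experts and mix their outputs.

    gate_logits[i] is the router's score for expert i (in a real model
    these come from a learned layer applied to the token).
    """
    # Pick the k highest-scoring experts; ties keep index order.
    # The remaining experts are never evaluated for this token.
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([gate_logits[i] for i in top])
    out = [0.0] * len(token)
    for w, i in zip(weights, top):
        y = experts[i](token)
        out = [o + w * v for o, v in zip(out, y)]
    return out


# Hypothetical toy experts: each one just scales its input by a different factor.
experts: List[Expert] = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
print(moe_forward([1.0, -1.0], gate_logits=[0.1, 2.0, 2.0, -1.0], experts=experts, k=2))
# [2.5, -2.5]
```

Here experts 1 and 2 tie for the highest score, each receives a gate weight of 0.5, and experts 0 and 3 are skipped entirely.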
The model supports text generation and conversational workloads in English, French, Spanish, Italian, German, Japanese, and Chinese. NVIDIA trained the model on its Nemotron pre-training and post-training datasets; technical details are available in two research papers (arXiv:2512.20848 and arXiv:2512.20856).
Training and Architecture
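Nemotron-3-Super-120B incorporates multi-token prediction (MTP), a technique in which the model is trained to predict several future tokens at each position rather than only the next one. Besides providing an additional training signal, MTP predictions can be used to draft tokens for speculative decoding, which can speed up generation. The model is compatible with the Hugging Face Transformers library, and its weights are distributed in the safetensors format for efficient loading.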
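One common way MTP heads are used at inference time is as a draft source for speculative decoding: the heads cheaply propose several future tokens, and the main model checks them in a single forward pass, keeping the longest prefix it agrees with. The sketch below shows only the greedy acceptance rule. The token IDs are made up, and `target_argmax` is a hypothetical stand-in for the main model's greedy predictions at each drafted position; the source does not say how Nemotron-3-Super's serving stack uses its MTP heads.

```python
from typing import List, Sequence


def accept_draft(draft: Sequence[int], target_argmax: Sequence[int]) -> List[int]:
    """Greedy speculative-decoding acceptance.

    draft[i] is the token proposed for position i (e.g. by MTP heads).
    target_argmax[i] is the main model's greedy choice at position i, given
    the context plus draft[:i]; it has len(draft) + 1 entries because the
    verification pass also yields a prediction after the last draft token.
    Returns the tokens emitted this step: the agreed prefix of the draft plus
    one token from the main model (a correction or a bonus token).
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    accepted: List[int] = []
    for proposed, verified in zip(draft, target_argmax):
        if proposed != verified:
            # First disagreement: take the main model's token and stop.
            accepted.append(verified)
            return accepted
        accepted.append(proposed)
    # Every draft token matched, so the final prediction comes for free.
    accepted.append(target_argmax[len(draft)])
    return accepted


# Hypothetical token IDs: the third drafted token is rejected.
print(accept_draft([11, 42, 7], [11, 42, 9, 5]))  # [11, 42, 9]
```

Because every emitted token is the main model's own greedy choice, the output matches plain greedy decoding; the speedup depends only on how often the drafts are accepted.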
As of March 10, 2026, the model had received 70 likes and 22 downloads on Hugging Face. The release is governed by NVIDIA's custom license, and the model is tagged as compatible with Hugging Face Inference Endpoints.
What This Means
NVIDIA's release of a 120B parameter model with latent MoE architecture signals continued focus on efficient large-model serving. The combination of MoE routing and multi-token prediction suggests the model is optimized for throughput and latency—critical factors for production deployments where serving dense 120B models can be computationally expensive. The multi-language support positions the model for international use cases, though without public benchmarks or performance comparisons, relative capability versus competing 120B models remains unclear.