NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture
NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion parameter text generation model featuring a latent Mixture-of-Experts (MoE) architecture. The model supports eight languages, including English, French, Spanish, Italian, German, Japanese, and Chinese, and is available on Hugging Face as a checkpoint quantized in NVIDIA's 4-bit NVFP4 format, produced with the ModelOpt toolkit.
Nemotron 3 Super — Quick Specs
NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion parameter model designed for conversational text generation with support for multiple languages.
Model Specifications
The model implements a latent Mixture-of-Experts (MoE) architecture, which activates only a subset of its parameters for each token and therefore reduces computational overhead during inference compared to a dense model of equivalent scale; the A12B suffix in the model name indicates roughly 12 billion active parameters per token. The NVFP4 designation indicates the checkpoint is quantized in NVIDIA's custom 4-bit floating-point format.
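To make the total-versus-active parameter distinction concrete, here is a minimal, illustrative sketch of top-k expert routing, the general mechanism MoE layers use. The expert count, hidden size, and routing details are hypothetical toy values, not Nemotron's actual architecture, and the "latent" qualifier refers to a further refinement not modeled here.

```python
# Toy sketch of top-k expert routing in a Mixture-of-Experts layer.
# All sizes are hypothetical; this is not Nemotron's architecture.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2          # toy dimensions
router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                      # (tokens, n_experts) routing scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        chosen = top[t]
        weights = np.exp(logits[t, chosen] - logits[t, chosen].max())
        weights /= weights.sum()               # softmax over the selected experts only
        for w, e in zip(weights, chosen):
            out[t] += w * (token @ experts[e]) # only top_k of n_experts run per token
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)               # (4, 64); ~top_k/n_experts of expert params used per token
```

The point of the sketch is the ratio: every token touches only top_k of the n_experts weight matrices, which is how a 120B-parameter model can run with roughly 12B parameters active per token.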
Language support includes English, French, Spanish, Italian, German, Japanese, and Chinese, positioning the model for multilingual deployment scenarios.
Training and Development
According to NVIDIA's documentation, the model was trained on the Nemotron pre-training and post-training (v3) datasets. Two research papers accompany the release, arxiv:2512.20848 and arxiv:2512.20856, covering the training methodology and architecture decisions.
The model card indicates a training data cutoff of December 2025.
Technical Capabilities
Nemotron-3-Super-120B is optimized for conversational tasks and text generation workloads. The released checkpoint is quantized to NVIDIA's 4-bit NVFP4 format via the ModelOpt optimization toolkit, reducing memory requirements for deployment, and the model is compatible with standard transformer inference endpoints.
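As a rough illustration of what block-scaled 4-bit quantization does, the sketch below snaps weights to the small set of magnitudes an E2M1 (FP4) value can represent, keeping one scale per 16-element block. Real NVFP4 additionally uses FP8 block scales and a tensor-level scale, and this is not ModelOpt's implementation; the block size and scale handling here are simplified for illustration.

```python
# Simplified block-scaled FP4 quantization sketch (illustrative, not ModelOpt).
import numpy as np

# Magnitudes representable by a signed E2M1 (4-bit float) value.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block):
    """Scale a block so its max magnitude maps to 6.0, then snap to the FP4 grid."""
    scale = np.abs(block).max() / FP4_GRID[-1]
    scale = scale if scale > 0 else 1.0
    scaled = block / scale
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx], scale

def quantize_tensor(x, block_size=16):
    """Quantize a flat tensor in independent blocks, one scale per block."""
    deq = np.empty_like(x)
    scales = []
    for start in range(0, len(x), block_size):
        q, s = quantize_block(x[start:start + block_size])
        deq[start:start + block_size] = q * s   # dequantized values for error checking
        scales.append(s)
    return deq, np.array(scales)

weights = np.random.default_rng(0).standard_normal(64).astype(np.float32)
deq, scales = quantize_tensor(weights)
print("mean absolute quantization error:", float(np.abs(weights - deq).mean()))
```

Each stored weight needs only 4 bits plus a shared per-block scale, which is where the memory savings over 16-bit weights come from, at the cost of the rounding error the script prints.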
The model is released under a custom NVIDIA license rather than standard open-source terms, so the license should be reviewed before use.
Availability
The model is available on Hugging Face, where it had recorded more than 11,460 downloads and 73 community likes as of the release date. It is compatible with the transformers library, ships its weights in safetensors format, and includes custom code components.
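For readers who want to try the checkpoint, a minimal loading sketch with the transformers library is shown below. The repo id is assumed from the model name, trust_remote_code reflects the custom code components noted above, and the generation settings are illustrative; serving the NVFP4 weights efficiently may additionally require recent NVIDIA hardware and inference tooling.

```python
# Minimal sketch of loading the checkpoint with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Nemotron-3-Super-120B-A12B-NVFP4"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,   # model ships custom code components
    device_map="auto",        # shard across available GPUs
)

messages = [{"role": "user", "content": "Summarize the Nemotron 3 release in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```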
NVIDIA has marked the model as US-region eligible, suggesting potential regulatory or distribution considerations.
What This Means
Nemotron-3-Super-120B represents NVIDIA's continued push into open-weight model releases, competing directly with Meta's Llama series and other models at this scale. The latent MoE architecture is the notable design choice: only a fraction of the 120 billion parameters (roughly 12 billion, per the A12B suffix) is active for any given token, which is how NVIDIA claims efficiency gains without sacrificing total model capacity. For organizations evaluating models at this scale, the multilingual support and native NVIDIA optimization tooling make this a viable alternative to existing open-weight options, though the custom license requires careful legal review for commercial use.