Nemotron-3-Ultra-550B-A55B

NVIDIA 🇺🇸 United States
active
Context window: 1000K tokens

Version History

550B-A55B-BF16 (major)

Initial release of Nemotron-3-Ultra, a 550B-parameter model trained from December 2025 to April 2026, with a hybrid LatentMoE architecture, a 1M-token context window, and configurable reasoning capabilities.

Benchmark Scores

GPQA: 87.0%
MMLU-Pro: 86.8%

Coverage

model release (NVIDIA)

NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning

NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and a context window of up to 1M tokens. The model uses a hybrid LatentMoE architecture that combines Mamba-2, MoE, and attention layers with Multi-Token Prediction. It was trained from December 2025 to April 2026 using NVFP4 quantization-aware methods.
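To put the 550B total / 55B active figures in perspective, here is a rough back-of-the-envelope sketch. It only uses the two parameter counts stated above plus the fact that BF16 stores 16 bits per value. The helper names (`active_fraction`, `weight_memory_gb`) are hypothetical, and the estimate covers raw weights only, ignoring activations, KV or state caches, and any per-block scaling metadata a low-bit format would add.

```python
# Parameter counts as stated in the release summary.
TOTAL_PARAMS = 550 * 10**9   # 550B total parameters
ACTIVE_PARAMS = 55 * 10**9   # 55B parameters active per token


def active_fraction(total: int, active: int) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def weight_memory_gb(params: int, bits_per_param: int) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width and
    no extra metadata (scales, indices) is counted.
    """
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active fraction per token: {frac:.0%}")

    # BF16 uses 16 bits per parameter.
    print(f"BF16 weights, all experts: {weight_memory_gb(TOTAL_PARAMS, 16):.0f} GB")
    print(f"BF16 weights, active only: {weight_memory_gb(ACTIVE_PARAMS, 16):.0f} GB")

    # Hypothetical 4-bit storage, ignoring scale-factor overhead.
    print(f"4-bit weights, all experts: {weight_memory_gb(TOTAL_PARAMS, 4):.0f} GB")
```

Under these assumptions, one tenth of the parameters are active per token, and the full BF16 weights come to roughly 1.1 TB. All experts still need to be resident for serving, so the active count mainly affects compute per token rather than memory.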
