Nemotron-Labs Diffusion 8B

NVIDIA (🇺🇸 United States)
active

Version History

1.0 (major)

Initial release of a diffusion language model family trained on 1.3T pretraining tokens and 45B fine-tuning tokens. Supports autoregressive (AR), diffusion, and self-speculation generation modes, producing up to 6.4× more tokens per forward pass than traditional AR decoding.

Coverage

Research (NVIDIA)

NVIDIA Releases Nemotron-Labs Diffusion Models With 6.4× Faster Token Generation Than Autoregressive Decoding

NVIDIA has released Nemotron-Labs Diffusion, a family of diffusion language models at 3B, 8B, and 14B scales that generate multiple tokens in parallel rather than one at a time. In self-speculation mode, the 8B model produces 6.4× more tokens per forward pass than autoregressive decoding while maintaining comparable accuracy.
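
To make the "tokens per forward pass" figure concrete, here is a minimal sketch of how such a ratio can be computed. The function name and the counts (640 tokens, 100 forward passes) are hypothetical, chosen only so the arithmetic lands on 6.4; they are not measurements from the release. The one fixed point is that plain autoregressive decoding emits one token per forward pass.

```python
def tokens_per_forward_pass(tokens_generated: int, forward_passes: int) -> float:
    """Average number of tokens emitted per model forward pass."""
    if forward_passes <= 0:
        raise ValueError("forward_passes must be positive")
    return tokens_generated / forward_passes


# Plain autoregressive decoding: one forward pass per generated token.
ar_rate = tokens_per_forward_pass(tokens_generated=640, forward_passes=640)

# Hypothetical parallel decoding run: same 640 tokens in 100 forward passes.
parallel_rate = tokens_per_forward_pass(tokens_generated=640, forward_passes=100)

# Ratio of the two rates, i.e. the "N× more tokens per forward pass" figure.
ratio = parallel_rate / ar_rate
print(f"AR: {ar_rate:.1f}, parallel: {parallel_rate:.1f}, ratio: {ratio:.1f}x")
```

Note that this ratio counts forward passes, not wall-clock time, so it translates into an equal end-to-end speedup only if each forward pass costs about the same in both modes.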
