Nemotron-Labs Diffusion 8B

Name: Nemotron-Labs Diffusion 8B
Author: NVIDIA

Organization: NVIDIA (United States)

Status: active


Version History

1.0 (major) · May 23, 2026

Initial release of the Nemotron-Labs Diffusion language model family, trained on 1.3T pretraining tokens and 45B fine-tuning tokens. Supports autoregressive, diffusion, and self-speculation generation modes; in self-speculation mode it produces up to 6.4× more tokens per forward pass than traditional autoregressive (AR) models.

Benchmark Scores


Speed: 865.0 tokens/s

Coverage

research · NVIDIA

NVIDIA Releases Nemotron-Labs Diffusion Models With 6.4× Faster Token Generation Than Autoregressive Decoding

NVIDIA has released Nemotron-Labs Diffusion, a family of diffusion language models at 3B, 8B, and 14B scales that generate multiple tokens in parallel rather than one at a time. In self-speculation mode, the 8B model produces 6.4× more tokens per forward pass than autoregressive models while maintaining comparable accuracy.
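To make the "tokens per forward pass" metric concrete, here is a toy draft-then-verify loop. It is a minimal sketch of generic greedy speculative decoding, not NVIDIA's implementation. It assumes that self-speculation means the same model drafts a block of tokens in parallel (diffusion-style) and then checks them autoregressively, and that drafting and verifying each cost one forward pass. The function names, the toy drafter and verifier, and the `REF` sequence are all hypothetical illustrations.

```python
from typing import Callable, List, Tuple

Draft = Callable[[List[int], int], List[int]]
Verify = Callable[[List[int], List[int]], List[int]]


def self_speculative_decode(
    draft_block: Draft,
    verify_block: Verify,
    max_new_tokens: int,
    block_size: int,
) -> Tuple[List[int], int]:
    """Greedy draft-then-verify loop (hypothetical sketch).

    Returns the generated tokens and the number of forward passes used,
    assuming one pass to draft a block and one pass to verify it.
    """
    out: List[int] = []
    passes = 0
    while len(out) < max_new_tokens:
        k = min(block_size, max_new_tokens - len(out))
        draft = draft_block(out, k)        # propose k tokens in parallel
        target = verify_block(out, draft)  # model's own greedy token at each position
        passes += 2                        # assumed cost: 1 draft pass + 1 verify pass
        # Accept the longest prefix where draft and verifier agree.
        n_ok = 0
        while n_ok < k and draft[n_ok] == target[n_ok]:
            n_ok += 1
        out.extend(draft[:n_ok])
        if n_ok < k:
            # At the first mismatch, keep the verifier's token so each step makes progress.
            out.append(target[n_ok])
    return out, passes


# Toy stand-ins (hypothetical): REF plays the model's greedy AR output,
# and the drafter guesses wrong whenever the absolute position ends in 9.
REF = list(range(100))


def toy_draft(prefix: List[int], k: int) -> List[int]:
    start = len(prefix)
    return [REF[p] if p % 10 != 9 else -1 for p in range(start, start + k)]


def toy_verify(prefix: List[int], draft: List[int]) -> List[int]:
    start = len(prefix)
    return REF[start:start + len(draft)]


tokens, passes = self_speculative_decode(toy_draft, toy_verify, max_new_tokens=40, block_size=8)
print(tokens == REF[:40], passes, len(tokens) / passes)  # prints: True 16 2.5
```

Because only verified tokens are kept, the output matches plain greedy autoregressive decoding exactly. The speedup comes from spending fewer forward passes per token: 2.5 tokens per pass in this toy, versus 1 for AR decoding. The 6.4× figure reported for the 8B model depends on its real draft accuracy and on how its passes are counted, and this toy does not reproduce it.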

May 23, 2026 · 12:21 AM · 2 min read

nvidia diffusion-models inference-optimization