DiffusionGemma 26B A4B IT NVFP4

NVIDIA (United States)
Status: active
Context window: 262K tokens

Version History

NVFP4 (snapshot)

NVIDIA quantized Google DeepMind's DiffusionGemma 26B A4B IT from 16-bit to 4-bit (NVFP4) using Model Optimizer, reducing memory requirements while keeping benchmark scores within 1% of the 16-bit baseline.
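To give a rough sense of the memory reduction, the sketch below estimates weight storage at 16-bit and at NVFP4 for the 25.2B total parameters quoted in the coverage summary. It is an illustration, not NVIDIA's published figure. It assumes every weight is quantized and that NVFP4 stores one 8-bit scale per block of 16 four-bit values (about 4.5 bits per weight). It ignores activations, KV cache, and any layers kept at higher precision. The function and constant names are made up for this example.

```python
# Back-of-the-envelope weight-memory estimate (illustrative only).
TOTAL_PARAMS = 25.2e9  # total parameter count quoted in the coverage summary


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Assumption: 4-bit values plus one 8-bit scale shared by each 16-value block,
# which averages out to 4 + 8/16 = 4.5 bits per weight.
NVFP4_BITS_PER_WEIGHT = 4 + 8 / 16

bf16_gb = weight_gigabytes(TOTAL_PARAMS, 16)
nvfp4_gb = weight_gigabytes(TOTAL_PARAMS, NVFP4_BITS_PER_WEIGHT)

print(f"16-bit weights: {bf16_gb:.1f} GB")
print(f"NVFP4 weights:  {nvfp4_gb:.1f} GB")
print(f"Reduction:      {bf16_gb / nvfp4_gb:.2f}x")
```

Under these assumptions the weights shrink from roughly 50 GB to roughly 14 GB. The real checkpoint size may differ if some layers are left unquantized.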

Benchmark Scores

AIME 2025: 67.3%
HumanEval: 95.0%

Coverage

Model release | Google DeepMind

NVIDIA Releases Quantized DiffusionGemma 26B: 1,100+ Tokens/Second with 256K Context Window

NVIDIA released a quantized version of Google DeepMind's DiffusionGemma 26B A4B IT, a multimodal model with 25.2B total parameters (3.8B active) that processes text, image, and video inputs. The NVFP4-quantized model achieves generation speeds exceeding 1,100 tokens per second on NVIDIA H100 GPUs while supporting a 256K token context window.
