DiffusionGemma 26B A4B IT NVFP4

Name: DiffusionGemma 26B A4B IT NVFP4
Author: NVIDIA

NVIDIA · 🇺🇸 United States

active

Context window: 262K tokens

Version History

NVFP4 · snapshot · June 17, 2026

NVIDIA quantized Google DeepMind's DiffusionGemma 26B A4B IT from 16-bit to 4-bit (NVFP4) using Model Optimizer, reducing memory requirements while maintaining benchmark performance within 1% of the full-precision baseline.
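As a rough illustration of the memory claim, the sketch below estimates weight storage at 16-bit and at NVFP4 precision. It uses the 25.2B total parameter count quoted in the coverage summary on this page. It assumes that every weight is quantized and that NVFP4 stores one 8-bit scale per 16-value block (about 4.5 bits per weight on average). Neither assumption is confirmed by this page, and real checkpoints usually keep some layers in higher precision, so treat the result as a lower bound for weights only. Activations and KV cache are not included, and `weight_memory_gb` is a hypothetical helper, not part of any NVIDIA tool.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Total parameter count quoted in the coverage summary on this page.
TOTAL_PARAMS = 25.2e9

# Assumption: 4-bit values plus one 8-bit scale shared by each block of
# 16 values, i.e. 4 + 8/16 = 4.5 bits per weight on average.
NVFP4_BITS = 4 + 8 / 16

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)
nvfp4_gb = weight_memory_gb(TOTAL_PARAMS, NVFP4_BITS)

print(f"16-bit weights: {bf16_gb:.1f} GB")
print(f"NVFP4 weights:  {nvfp4_gb:.1f} GB")
print(f"Reduction:      {bf16_gb / nvfp4_gb:.2f}x")
```

Under these assumptions the weights shrink from roughly 50 GB to roughly 14 GB. Because the model is a mixture-of-experts design, all 25.2B parameters would typically need to be resident in memory even though only 3.8B are active per token.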

Benchmark Scores

AIME 2025: 67.3%

HumanEval: 95.0%

Coverage

model release · Google DeepMind

NVIDIA Releases Quantized DiffusionGemma 26B: 1,100+ Tokens/Second with 256K Context Window

NVIDIA released a quantized version of Google DeepMind's DiffusionGemma 26B A4B IT, a multimodal model with 25.2B total parameters (3.8B active) that processes text, image, and video inputs. The NVFP4-quantized model achieves generation speeds exceeding 1,100 tokens per second on NVIDIA H100 GPUs while supporting a 256K token context window.
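The "256K" figure here and the "262K tokens" figure in the model details most likely describe the same limit in different units. The short check below assumes "256K" uses binary thousands (1K = 1,024 tokens) while "262K" is the same count rounded in decimal thousands. This page does not state the exact token limit, so the result is an inference rather than a confirmed specification.

```python
# Assumption: "256K" counts in binary thousands (1K = 1024 tokens).
BINARY_K = 1024

context_tokens = 256 * BINARY_K

# The same count expressed in decimal thousands, rounded to the nearest K.
decimal_k = round(context_tokens / 1000)

print(f"{context_tokens} tokens is about {decimal_k}K in decimal thousands")
```

If the assumption holds, both figures refer to a 262,144-token window.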

June 17, 2026 · 12:06 PM · 2 min read

NVIDIA Google DeepMind DiffusionGemma