Gemma 4 31B IT NVFP4

NVIDIA · United States
active
Context window: 256K tokens

Version History

v1.0 (major)

NVIDIA released an NVFP4-quantized version of Google DeepMind's Gemma 4 31B IT model, optimized for consumer-GPU inference. It maintains the 256K context window and multimodal capabilities with <0.5% performance degradation on reasoning and coding benchmarks.

Coverage

model release · Google DeepMind

NVIDIA releases Gemma 4 31B quantized model with 256K context, multimodal capabilities

NVIDIA has released a quantized version of Google DeepMind's Gemma 4 31B IT model, compressed to NVFP4 format for efficient inference on consumer GPUs. The 30.7B-parameter multimodal model supports 256K token context windows, handles text and image inputs with video frame processing, and maintains near-baseline performance across reasoning and coding benchmarks.
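To give a sense of what NVFP4 compression involves, below is a minimal, illustrative sketch of block-scaled 4-bit quantization in the NVFP4 style: each group of 16 weights shares one scale factor, and each weight is rounded to the nearest representable E2M1 magnitude. This is a simplified model, not NVIDIA's implementation; real NVFP4 stores the per-block scale in FP8 (E4M3) and packs elements as 4-bit codes, while this sketch keeps everything in plain Python floats for clarity.

```python
# Illustrative NVFP4-style block quantization (simplified sketch, not NVIDIA's kernel).
# E2M1 (4-bit float) can represent these magnitudes, plus a sign bit:
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one 16-element block: choose a scale so the block's max
    magnitude maps to 6.0 (the largest E2M1 value), then round every
    element to the nearest representable value. Returns the dequantized
    block and the scale (real NVFP4 would store the scale in FP8 E4M3)."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    out = []
    for x in block:
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        out.append(mag * scale * (1.0 if x >= 0 else -1.0))
    return out, scale

# Example: quantize a block of hypothetical weight values.
block = [0.12, -0.03, 0.45] + [0.0] * 13
deq, scale = quantize_block(block)  # scale == 0.45 / 6 == 0.075
```

The per-block scale is what keeps accuracy loss small despite the 4-bit elements: outliers only distort the 16 values that share their block, rather than the whole tensor.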
