Gemma 4 31B IT Assistant (MTP Drafter)

Google DeepMind · United States
Status: active
Context window: 256K tokens

Version History

Version 4 (major)

Major release introducing Gemma 4 family with four model sizes (E2B, E4B, 26B A4B MoE, 31B dense), Multi-Token Prediction drafters for 2x speedup, extended context windows up to 256K, enhanced multimodal capabilities, and improved reasoning performance.

Coverage

Model release · Google DeepMind

Google DeepMind releases Gemma 4 with 31B dense model, 256K context window, and speculative decoding drafters

Google DeepMind has released Gemma 4, a family of open-weight multimodal models including a 31B dense model with 256K context window and four size variants ranging from 2.3B to 30.7B effective parameters. The release includes Multi-Token Prediction (MTP) draft models that achieve up to 2x decoding speedup through speculative decoding while maintaining identical output quality.
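The "identical output quality" claim follows from how speculative decoding verifies draft tokens: the target model checks each proposed token and keeps only those it would have produced itself, so the final sequence matches plain target-model decoding while several tokens can be confirmed per target step. The toy sketch below illustrates one greedy verification step; the `draft_propose` and `target_next` functions are hypothetical stand-ins for the MTP drafter and the 31B target model, not Gemma 4's actual API.

```python
def draft_propose(prefix, k):
    # Toy "MTP drafter": cheaply guesses the next k tokens.
    # Stand-in for a small multi-token-prediction head (assumption).
    return [(prefix[-1] + i + 1) % 100 for i in range(k)]

def target_next(prefix):
    # Toy "target model": the large model's greedy next-token choice.
    return (prefix[-1] + 1) % 100

def speculative_step(prefix, k=4):
    """One speculative-decoding step with greedy verification:
    accept the drafter's tokens up to the first disagreement, then
    substitute the target model's own token at that position."""
    draft = draft_propose(prefix, k)
    accepted = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok == expected:
            accepted.append(tok)       # drafter matched the target: keep it
        else:
            accepted.append(expected)  # disagreement: take the target's token, stop
            break
    else:
        # All k draft tokens accepted; the target yields one extra token "for free".
        accepted.append(target_next(prefix + accepted))
    return accepted

print(speculative_step([7]))  # toy drafter agrees fully, so 5 tokens per target step
```

Because every emitted token is one the target model would have chosen anyway, the speedup comes purely from verifying multiple tokens in parallel rather than from changing the output distribution.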
