Gemma 4 26B A4B Assistant

Google DeepMind · United States
Status: active
Context window: 256K tokens

Version History

1.0 (major)

Initial release of the Multi-Token Prediction assistant model for Gemma 4 26B A4B, enabling up to a 2x inference speedup through speculative decoding while maintaining output quality identical to the base model.

Coverage

model release · Google DeepMind

Google DeepMind Releases Gemma 4 26B A4B Assistant Model for 2x Faster Inference via Multi-Token Prediction

Google DeepMind has released a Multi-Token Prediction assistant model for Gemma 4 26B A4B that achieves up to a 2x decoding speedup through speculative decoding. The base model uses 3.8B active parameters out of 25.2B total in a Mixture-of-Experts architecture with 128 experts, and supports a 256K-token context window.
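The speedup comes from the standard speculative-decoding accept/reject loop: the small assistant model drafts several tokens cheaply, and the large model verifies them in one parallel pass, accepting each draft with probability min(1, p/q) so the output distribution matches the base model exactly. The following is a minimal toy sketch of that verification step (not Gemma's actual implementation; the function name and toy distributions are ours):

```python
import random

def speculative_verify(p_target, q_draft, draft_tokens, rng=random.Random(0)):
    """Verify a batch of drafted tokens against the target model.

    p_target: list of target-model distributions, one per draft position
    q_draft:  list of draft-model distributions, one per draft position
    draft_tokens: token ids proposed by the draft (assistant) model
    Returns the list of accepted tokens; on the first rejection, a
    corrected token is resampled and verification stops.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_target[i], q_draft[i]
        # Accept the drafted token with probability min(1, p[tok]/q[tok]).
        # (q[tok] > 0 in practice, since the draft sampled tok from q.)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q),
            # normalized; this keeps the overall distribution equal to p.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(residual)
            weights = [r / z for r in residual] if z > 0 else p
            accepted.append(rng.choices(range(len(p)), weights=weights)[0])
            return accepted
    return accepted
```

When draft and target distributions agree, every drafted token is accepted, which is why a well-trained assistant model yields near-2x throughput while leaving outputs statistically unchanged.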
