Gemma 4 26B A4B Assistant

Google DeepMind · United States
Status: active
Context window: 256K tokens

Version History

1.0 (major)

Initial release of the Multi-Token Prediction assistant model for Gemma 4 26B A4B, enabling up to a 2x inference speedup through speculative decoding while maintaining output quality identical to the base model.

Coverage

model release · Google DeepMind

Google DeepMind Releases Gemma 4 26B A4B Assistant Model for 2x Faster Inference via Multi-Token Prediction

Google DeepMind has released a Multi-Token Prediction assistant model for Gemma 4 26B A4B that achieves up to a 2x decoding speedup through speculative decoding. The base model uses 3.8B active parameters out of 25.2B total in a Mixture-of-Experts architecture with 128 experts, and supports a 256K-token context window.
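The speedup comes from the standard speculative-decoding accept/reject loop: the small assistant model drafts several tokens cheaply, and the large model verifies them in one parallel pass, accepting each draft with probability min(1, p/q) so the output distribution matches the base model exactly. The following is a minimal toy sketch of that verification step (not Gemma's actual implementation; the function name and toy distributions are ours):

```python
import random

def speculative_verify(p_target, q_draft, draft_tokens, rng=random.Random(0)):
    """Verify a batch of drafted tokens against the target model.

    p_target: list of target-model distributions, one per draft position
    q_draft:  list of draft-model distributions, one per draft position
    draft_tokens: token ids proposed by the draft (assistant) model
    Returns the list of accepted tokens; on the first rejection, a
    corrected token is resampled and verification stops.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_target[i], q_draft[i]
        # Accept the drafted token with probability min(1, p[tok]/q[tok]).
        # (q[tok] > 0 in practice, since the draft sampled tok from q.)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q),
            # normalized; this keeps the overall distribution equal to p.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(residual)
            weights = [r / z for r in residual] if z > 0 else p
            accepted.append(rng.choices(range(len(p)), weights=weights)[0])
            return accepted
    return accepted
```

When draft and target distributions agree, every drafted token is accepted, which is why a well-trained assistant model yields near-2x throughput while leaving outputs statistically unchanged.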
