Gemma 4 E4B Assistant

Name: Gemma 4 E4B Assistant
Author: Google DeepMind

Google DeepMind (🇺🇸 United States)

Status: active


Context window: 128K tokens

Version History

4-e4b · major · May 10, 2026

Gemma 4 E4B assistant introduces a Multi-Token Prediction architecture for speculative decoding, achieving up to a 2x inference speedup. It has 4.5B effective parameters and uses Per-Layer Embeddings optimized for on-device deployment.

Coverage

Model release · Google DeepMind

Google DeepMind Releases Gemma 4 E4B with Multi-Token Prediction for 2x Faster Inference

Google DeepMind released the Gemma 4 E4B assistant model, which uses a Multi-Token Prediction (MTP) architecture to accelerate inference by up to 2x through speculative decoding. The model has 4.5B effective parameters, supports a 128K-token context window, and accepts text, image, and audio input. Pricing has not yet been disclosed.
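To show where a speedup from speculative decoding can come from, here is a minimal sketch of the generic greedy draft-and-verify loop. It assumes a separate cheap draft model. With MTP, the draft tokens would instead come from extra prediction heads on the main model, and the Gemma 4 E4B implementation details are not described here. The functions `speculative_decode` and `greedy_decode` and the toy "models" are hypothetical illustrations, not a Google DeepMind API. The realized speedup depends on how often draft tokens are accepted.

```python
from typing import Callable, List, Tuple

# Hypothetical toy interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 3) -> Tuple[List[int], int]:
    """Greedy speculative decoding sketch: draft proposes k tokens, target verifies.

    Returns the generated tokens and the number of target verification passes.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The draft cheaply proposes k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks all proposed positions. A real system does this in
        #    one batched forward pass; here the per-position calls count as one pass.
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's own token and discard the rest.
                accepted.append(expected)
                break
        else:
            # All k drafts accepted: the same pass also yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    # The last pass may overshoot max_new, so trim to the requested length.
    return out[len(prompt):len(prompt) + max_new], passes
```

With a draft that always agrees with the target (`draft = target`), `k=3`, and 8 new tokens, the loop needs 2 target passes instead of 8. In every case, the output is identical to plain greedy decoding with the target. This is why speculative decoding can trade extra cheap draft work for fewer expensive target passes without changing greedy results.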

May 10, 2026 · 5:06 AM · 3 min read

Tags: Gemma · Google DeepMind · multimodal