Gemma 4 E4B Assistant

Google DeepMind (🇺🇸 United States)
Status: active
Context window: 128K tokens

Version History

4-e4b (major)

The Gemma 4 E4B assistant introduces a Multi-Token Prediction (MTP) architecture for speculative decoding, achieving up to a 2x inference speedup. It has 4.5B effective parameters and uses Per-Layer Embeddings, a design optimized for on-device deployment.
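To illustrate the general idea behind speculative decoding, here is a minimal sketch of the greedy variant: a cheap draft model proposes several tokens, and the target model keeps the longest prefix it agrees with. This is not Gemma's MTP implementation; the functions `speculative_step`, `generate`, `greedy`, `toy_target`, and `toy_draft` are hypothetical names invented for this example, and the "models" are toy functions over integers rather than real networks.

```python
from typing import Callable, List

# Assumption: a "model" is any function mapping a token sequence to the
# next token under greedy decoding. Real models return logits instead.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], k: int) -> List[int]:
    """Draft k tokens, keep the prefix the target agrees with, and
    return the newly accepted tokens (always at least one)."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(tokens + proposal))

    # 2. The target verifies each proposed position. A real system would
    #    score all k positions in one batched forward pass; this sketch
    #    calls the target once per position for clarity.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(tokens + accepted)
        if tok != expected:
            # First mismatch: take the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one more token.
    accepted.append(target(tokens + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Generate max_new tokens using repeated speculative steps."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        tokens.extend(speculative_step(target, draft, tokens, k))
    return tokens[: len(prompt) + max_new]


def greedy(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain greedy decoding with the target only, for comparison."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


# Hypothetical toy models: the target counts upward modulo 10; the draft
# agrees with it except after a 4, where it guesses wrongly.
def toy_target(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10
```

In this greedy form the output matches what the target model would produce on its own; any speedup comes from how many draft tokens are accepted per verification pass, so it depends on how often the draft agrees with the target.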

Coverage