MTP
1 article tagged with MTP
May 10, 2026
model release, Google DeepMind
Google DeepMind Releases Gemma 4 E4B with Multi-Token Prediction for 2x Faster Inference
Google DeepMind released the Gemma 4 E4B assistant model, which uses a Multi-Token Prediction (MTP) architecture to accelerate inference by up to 2x through speculative decoding. The model has 4.5B effective parameters, supports a 128K-token context window, and accepts text, image, and audio input. Pricing has not yet been disclosed.
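For readers unfamiliar with speculative decoding, the general draft-and-verify idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Gemma's implementation: the `target` and `draft` functions are hypothetical stand-ins for a large model and a cheap predictor, decoding is greedy, and the single batched verification pass a real system would run is simulated here with ordinary function calls.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[Token],
                       max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify decoding. Returns (tokens, verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens in a row.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks the proposal. A real system scores all
        #    positions in one batched pass; we count it as one pass.
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
            accepted.append(tok)
        else:
            # Every drafted token matched, so the target adds one more.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + max_new], passes


# Hypothetical toy models: the target counts upward modulo 10,
# and the draft agrees except right after the token 2.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Token:
    return 0 if seq[-1] == 2 else (seq[-1] + 1) % 10


tokens, passes = speculative_decode(toy_target, toy_draft, [0], max_new=8)
baseline = greedy_decode(toy_target, [0], max_new=8)
```

In this toy run the speculative output is identical to plain greedy decoding with the target, but it needs fewer verification passes than the eight target calls the baseline makes. The actual speedup in practice depends on how often the draft's guesses are accepted, which is why figures such as "up to 2x" are stated as an upper bound.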