Google DeepMind Releases Gemma 4: Four Open-Weights Models with Multimodal and Reasoning Capabilities
Google DeepMind released Gemma 4, an open-weights model family spanning four sizes optimized for deployment from mobile devices to high-end servers. The release includes both dense and Mixture-of-Experts variants under the Apache 2.0 license.
Model Specifications
The Gemma 4 family comprises:
- E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window
- E4B: 4.5B effective parameters (8B with embeddings), 128K context window
- 26B A4B: 3.8B active parameters out of 25.2B total (MoE architecture), 256K context window
- 31B Dense: 30.7B parameters, 256K context window
The "E" designation indicates effective parameters achieved through Per-Layer Embeddings (PLE), while "A" denotes active parameters in the MoE variant. This architecture allows the 26B A4B to run nearly as fast as a 4B model during inference while maintaining frontier-level performance.
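Assuming decode-time compute scales roughly with the parameters touched per token (a standard back-of-envelope simplification, not a claim from the release), the effective and active parameter counts above can be compared directly:

```python
# Back-of-envelope per-token compute comparison. Parameter counts are from
# the article; the "compute scales with parameters touched per token"
# proportionality is a simplifying assumption, not an official benchmark.

specs = {
    "E2B":     {"active": 2.3e9,  "total": 5.1e9},   # effective / with embeddings
    "E4B":     {"active": 4.5e9,  "total": 8.0e9},
    "26B A4B": {"active": 3.8e9,  "total": 25.2e9},  # MoE: 3.8B routed per token
    "31B":     {"active": 30.7e9, "total": 30.7e9},  # dense: every param, every token
}

for name, s in specs.items():
    ratio = s["active"] / specs["31B"]["active"]
    print(f"{name:8s} ~{ratio:.2f}x the per-token compute of the 31B dense model")
```

On this estimate the 26B A4B touches about 12% of the compute of the 31B dense model per token, which is why it decodes at roughly the speed of a 4B model despite its 25.2B total parameters.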
Multimodal and Reasoning Capabilities
All models handle text and image input with support for variable aspect ratios and resolutions. E2B and E4B add native audio support, including automatic speech recognition (ASR) and speech translation (spoken audio to translated text). All models include configurable thinking modes for step-by-step reasoning and support native function calling for agentic workflows.
The models support 140+ languages in pre-training with 35+ languages confirmed for downstream tasks.
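The native function calling mentioned above typically works by giving the model a JSON-schema tool declaration and dispatching the structured call it emits. The schema shape, tool name, and dispatch helper below are hypothetical illustrations of that common convention, not Gemma 4's documented interface:

```python
# Hypothetical sketch of host-side function calling. The JSON-schema tool
# shape follows the common chat-API convention; the exact format Gemma 4
# expects may differ, so treat all names and fields here as assumptions.

import json

get_weather_tool = {
    "name": "get_weather",  # hypothetical tool name
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

# A model with native function calling emits a structured call like this
# instead of free text; the host application then routes it.
model_call = {"name": "get_weather", "arguments": {"city": "Zurich"}}

def dispatch(call, tools):
    """Route a structured model call to the matching registered tool."""
    known = {t["name"] for t in tools}
    assert call["name"] in known, f"unknown tool: {call['name']}"
    return f"dispatching {call['name']}({json.dumps(call['arguments'])})"

print(dispatch(model_call, [get_weather_tool]))
```

The host executes the real function, appends its result to the conversation, and lets the model continue; the agentic loop is the application's responsibility, not the model's.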
Benchmark Performance
Gemma 4 31B achieved:
- MMLU Pro: 85.2%
- AIME 2026 (no tools): 89.2%
- LiveCodeBench v6: 80.0%
- Codeforces ELO: 2150
- GPQA Diamond: 84.3%
- Vision MMMU Pro: 76.9%
- Long Context (MRCR v2, 8 needle @ 128K): 66.4%
The 26B A4B MoE variant tracked closely behind: MMLU Pro 82.6%, AIME 2026 88.3%, LiveCodeBench 77.1%, Codeforces ELO 1718, and GPQA Diamond 82.3%.
Smaller models show proportional scaling: E4B achieved MMLU Pro 69.4% and GPQA Diamond 58.6%, while E2B reached 60.0% and 43.4% respectively.
Technical Architecture
Gemma 4 employs a hybrid attention mechanism combining local sliding window attention (512-1024 tokens depending on size) with global full attention in the final layer. This balances computational efficiency with long-context awareness. Global layers use unified Keys and Values with Proportional RoPE (p-RoPE) for memory optimization.
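The hybrid scheme can be illustrated with a toy causal-mask builder: most layers restrict each query to a sliding window of recent keys, while the global layer sees the full prefix. The window size, sequence length, and layer count below are made up for the example and are not Gemma 4's actual configuration:

```python
# Toy illustration of hybrid attention masking: local sliding-window
# layers plus a global final layer. All sizes here are invented for the
# example; the article cites windows of 512-1024 tokens in practice.

def causal_mask(seq_len, window=None):
    """True where query position q may attend to key position k."""
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]

seq_len, window, n_layers = 8, 4, 6
masks = [
    causal_mask(seq_len, window=None if layer == n_layers - 1 else window)
    for layer in range(n_layers)
]

# Local layers see at most `window` past tokens; the global layer sees all.
print(sum(masks[0][-1]))   # last query row, local layer  -> 4 visible keys
print(sum(masks[-1][-1]))  # last query row, global layer -> 8 visible keys
```

Because local layers bound the number of keys per query, their KV-cache and attention cost stay constant as context grows; only the global layer pays the full long-context price, which is where the shared Keys/Values and p-RoPE memory optimizations matter most.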
Vision encoders add ~150M parameters to E2B/E4B and ~550M to larger models. Audio encoders add ~300M parameters to E2B and E4B only.
Deployment and Availability
Models are available via Hugging Face with full Transformers library support. The smaller E2B and E4B models target mobile phones and laptops, while 26B A4B and 31B Dense scale to consumer GPUs, workstations, and servers. All models include native system prompt support for structured conversations.
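To gauge which hardware tier each model targets, a rough weight-only memory estimate helps. The bytes-per-parameter figures below are standard for common precisions; activations, KV cache, and runtime overhead are ignored, so these are lower bounds, not figures from the release:

```python
# Rough weight-memory estimate per precision. Parameter counts are from
# the article (totals including embeddings); bytes-per-parameter values
# are the usual ones for each dtype. Runtime overhead is ignored.

BYTES = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}
PARAMS = {"E2B": 5.1e9, "E4B": 8.0e9, "26B A4B": 25.2e9, "31B": 30.7e9}

for name, p in PARAMS.items():
    sizes = ", ".join(
        f"{dtype}: {p * b / 2**30:.1f} GiB" for dtype, b in BYTES.items()
    )
    print(f"{name:8s} {sizes}")
```

By this estimate, quantized E2B/E4B fit comfortably in phone or laptop memory, the 26B A4B fits a single consumer GPU at low precision, and the 31B dense model at bf16 needs workstation- or server-class memory.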
What This Means
Gemma 4 significantly expands Google's open-weights presence across the model size spectrum. The efficient parameter design—particularly effective parameters in E2B/E4B and active parameters in 26B A4B—enables deployment scenarios previously requiring much larger models. The reasoning modes and multimodal capabilities position Gemma 4 for complex reasoning tasks and agent applications without proprietary API dependencies. Performance metrics indicate competitive scaling within size classes, though 31B-class models from other vendors maintain leads on reasoning benchmarks. The extended context window (256K on larger models) addresses enterprise document processing and long-context reasoning requirements.