Google DeepMind releases Gemma 4 with multimodal reasoning and up to 256K context window
Google DeepMind released Gemma 4, a multimodal model family supporting text, images, video, and audio with context windows up to 256K tokens. The release includes four sizes (E2B, E4B, 26B A4B, and 31B) designed for deployment from mobile devices to servers. The 31B dense model achieves 85.2% on MMLU Pro and 89.2% on AIME 2026.
Model Lineup and Architecture
Gemma 4 includes four distinct sizes:
Dense Models:
- E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window
- E4B: 4.5B effective parameters (8B with embeddings), 128K context window
- 31B: 30.7B parameters, 256K context window
Mixture-of-Experts:
- 26B A4B: 25.2B total parameters, 3.8B active parameters, 256K context window, 8 active experts out of 128 total
The "E" designation indicates "effective" parameters achieved through Per-Layer Embeddings (PLE), where each decoder layer maintains its own small embedding table for quick lookups. The "A" in the A4B model denotes active parameters—only 3.8B of 25.2B total parameters activate during inference, enabling near-4B inference speed at 26B model scale.
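The active-parameter arithmetic can be sketched directly. This is a back-of-the-envelope illustration only: the split between expert and shared parameters below (22.8B in experts) is a guess chosen to reproduce the reported figures, not a published number.

```python
# Back-of-the-envelope sketch of top-k MoE per-token compute.
# The 22.8B expert share is an illustrative guess, not a published figure.

def active_params(total: float, expert_total: float,
                  n_experts: int, k: int) -> float:
    """Parameters touched per token: all shared weights plus k of n experts."""
    shared = total - expert_total            # attention, embeddings, router
    per_expert = expert_total / n_experts    # assume equally sized experts
    return shared + k * per_expert

# Reported 26B A4B figures: 25.2B total, 8 of 128 experts routed per token.
print(f"~{active_params(25.2e9, 22.8e9, 128, 8) / 1e9:.1f}B active")  # → ~3.8B active
```

Because only the routed experts run per token, inference cost tracks the 3.8B active parameters rather than the 25.2B total, which is why the model can approach 4B-class speed.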
All models employ a hybrid attention mechanism combining local sliding-window attention (512-1024 tokens) with full global attention in the final layer. Global layers use unified Keys and Values with Proportional RoPE to reduce memory use during long-context processing.
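The two attention patterns can be illustrated as boolean masks. This is a minimal sketch with the window size as a parameter; real implementations apply the pattern inside the attention kernel on tensors, not as explicit mask matrices.

```python
def local_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Each query attends only to itself and the previous window-1 keys."""
    return [[q - window < k <= q for k in range(seq_len)]
            for q in range(seq_len)]

def global_causal_mask(seq_len: int) -> list[list[bool]]:
    """Plain causal attention: every query sees all keys up to itself."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]

# With window=3, query 5 sees only keys 3, 4, 5; a global layer sees 0-5.
assert local_causal_mask(6, 3)[5] == [False, False, False, True, True, True]
assert global_causal_mask(6)[5] == [True] * 6
```

The sliding window keeps the KV cache for local layers bounded by the window size, while the single global layer preserves access to the full context.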
Multimodal and Reasoning Capabilities
Gemma 4 handles:
- Text and Images: All models support variable aspect ratio and resolution image processing
- Video: Frame sequence analysis available across the family
- Audio: Native automatic speech recognition (ASR) and speech-to-text translation, on the E2B and E4B models only
- Reasoning: Built-in configurable thinking modes enabling step-by-step problem solving
- Function Calling: Native structured tool use for agentic workflows
- System Prompts: Native system role support for controlled conversations
- Multilingual: Pre-trained on 140+ languages, with native support for 35+
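Native function calling generally follows the shape sketched below: the application declares tool schemas, the model emits a structured call, and the runtime dispatches it. The schema style and wire format here are illustrative; the exact format Gemma 4 expects is defined by its chat template, which is not shown in this article.

```python
import json

# Illustrative tool schema in the common JSON-Schema style.
GET_WEATHER = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(tool_call_json: str, registry: dict) -> str:
    """Route a model-emitted tool call to the matching Python function."""
    call = json.loads(tool_call_json)
    return registry[call["name"]](**call["arguments"])

registry = {"get_weather": lambda city: f"Sunny in {city}"}
print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}',
               registry))  # → Sunny in Paris
```

In an agentic loop, the dispatched result is appended to the conversation and the model is called again, repeating until it produces a final answer instead of a tool call.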
Benchmark Performance
Benchmark results for the instruction-tuned models:
| Benchmark | 31B | 26B A4B | E4B | E2B |
|---|---|---|---|---|
| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% |
| AIME 2026 (no tools) | 89.2% | 88.3% | 42.5% | 37.5% |
| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% |
| Codeforces Elo | 2150 | 1718 | 940 | 633 |
| GPQA Diamond | 84.3% | 82.3% | 58.6% | 43.4% |
| MMMLU (Multilingual) | 88.4% | 86.3% | 76.6% | 67.4% |
| Vision MMMU Pro | 76.9% | 73.8% | 52.6% | 44.2% |
| MATH-Vision | 85.6% | 82.4% | 59.5% | 52.4% |
| BigBench Extra Hard | 74.4% | 64.8% | 33.1% | 21.9% |
For long-context evaluation (MRCR v2, 128K tokens with 8 needles), the 31B model achieved 66.4% average accuracy.
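Multi-needle retrieval evaluations of this kind reduce to scattering key-value "needles" through long filler text and scoring exact recall. The following is a toy sketch of that setup, not the MRCR benchmark's actual harness.

```python
import random

def build_haystack(filler: list[str], needles: dict[str, str],
                   seed: int = 0) -> str:
    """Insert 'key: value' needle lines at random positions in filler text."""
    rng = random.Random(seed)
    lines = list(filler)
    for key, value in needles.items():
        lines.insert(rng.randrange(len(lines) + 1), f"{key}: {value}")
    return "\n".join(lines)

def recall_score(answers: dict[str, str], needles: dict[str, str]) -> float:
    """Fraction of needles the model recalled exactly."""
    hits = sum(answers.get(key) == value for key, value in needles.items())
    return hits / len(needles)

needles = {"passcode alpha": "7141", "passcode beta": "9280"}
haystack = build_haystack(["filler sentence."] * 1000, needles)
print(recall_score({"passcode alpha": "7141"}, needles))  # → 0.5
```

The 66.4% figure above corresponds to the average of such recall scores at 128K-token context with eight needles per haystack.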
Deployment and Availability
Models are released under the Apache 2.0 license with open weights. Unsloth offers 4-bit GGUF quantizations that enable local execution on laptops and mobile devices. All models work with the Hugging Face Transformers library and with Unsloth Studio for fine-tuning and inference.
The family is designed for diverse deployment scenarios: E2B and E4B for edge/mobile, 26B A4B for consumer GPUs, and 31B for workstations and servers.
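As a rough sizing check for those deployment tiers, weight-only memory at 4-bit quantization is simply parameters times half a byte. This ignores KV cache and activation overhead, so real requirements are higher; the parameter counts come from the lineup above.

```python
def weight_gb(params_billion: float, bits: int = 4) -> float:
    """Approximate weight-only footprint in GB: params * bits / 8."""
    return params_billion * bits / 8

# Parameter counts from the lineup above (E2B/E4B include embeddings).
for name, params in [("E2B", 5.1), ("E4B", 8.0),
                     ("26B A4B", 25.2), ("31B", 30.7)]:
    print(f"{name}: ~{weight_gb(params):.1f} GB of weights at 4-bit")
```

By this estimate the E-series fits comfortably in phone or laptop memory, the 26B A4B lands in consumer-GPU VRAM range, and the 31B needs workstation-class hardware, matching the stated tiers.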
What This Means
Gemma 4 represents a significant consolidation of multimodal capability in open models. The efficiency-focused variants (E2B, E4B, and 26B A4B) extend deployment options beyond high-end data centers, while the 31B variant approaches frontier performance on reasoning and code benchmarks (85.2% MMLU Pro, 89.2% AIME). Native reasoning modes and function calling address the growing demand for agentic workflows. The smaller models, however, fall off sharply on advanced reasoning tasks: the E4B scores 69.4% on MMLU Pro versus the 31B's 85.2%, a clear size-dependent trade-off for edge deployments.