model release · Google DeepMind

Google DeepMind releases Gemma 4 with multimodal reasoning and up to 256K context window

TL;DR

Google DeepMind released Gemma 4, a multimodal model family supporting text, images, video, and audio with context windows up to 256K tokens. The release includes four sizes (E2B, E4B, 26B A4B, and 31B) designed for deployment from mobile devices to servers. The 31B dense model achieves 85.2% on MMLU Pro and 89.2% on AIME 2026.

Google DeepMind Launches Gemma 4 with Multimodal Reasoning Capabilities

Google DeepMind released Gemma 4, a family of open-weight multimodal models supporting text, images, video, and audio inputs with reasoning modes and context windows up to 256K tokens.

Model Lineup and Architecture

Gemma 4 includes four distinct sizes:

Dense Models:

  • E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window
  • E4B: 4.5B effective parameters (8B with embeddings), 128K context window
  • 31B: 30.7B parameters, 256K context window

Mixture-of-Experts:

  • 26B A4B: 25.2B total parameters, 3.8B active parameters, 256K context window, 8 active experts out of 128 total

The "E" designation indicates "effective" parameters achieved through Per-Layer Embeddings (PLE), where each decoder layer maintains its own small embedding table for quick lookups. The "A" in the A4B model denotes active parameters—only 3.8B of 25.2B total parameters activate during inference, enabling near-4B inference speed at 26B model scale.
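
The relationship between the A4B model's total and active parameter counts can be sketched with the figures above. A quick back-of-the-envelope check (parameter counts from this article; the per-expert weight breakdown is not published, so only the aggregate ratios are computed here):

```python
# Sketch: relate the Gemma 4 26B A4B's total and per-token active
# parameter counts. Figures come from the article; everything else
# is simple arithmetic.

TOTAL_PARAMS = 25.2e9     # total parameters in the MoE model
ACTIVE_PARAMS = 3.8e9     # parameters activated per token
ACTIVE_EXPERTS = 8        # experts routed per token
TOTAL_EXPERTS = 128       # experts available per MoE layer

# Roughly 15% of the weights participate in any single forward pass,
# which is why inference speed approaches that of a ~4B dense model.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")   # ~15.1%

# Only 8 of 128 experts fire per token; the rest of the active budget
# is shared machinery (attention, embeddings) used on every pass.
routing_fraction = ACTIVE_EXPERTS / TOTAL_EXPERTS
print(f"Experts routed per token: {routing_fraction:.1%}")  # 6.2%
```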

All models employ hybrid attention mechanisms combining local sliding window attention (512-1024 tokens) with full global attention in the final layer. Global layers use unified Keys and Values with Proportional RoPE for memory optimization during long-context processing.
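
The hybrid attention scheme can be sketched as a mask-building exercise: local layers restrict each query to a trailing window of keys, while the global layer attends over the full causal prefix. The window size and sequence length below are illustrative (the article gives a 512-1024 token range for local layers):

```python
import numpy as np

def causal_mask(seq_len, window=None):
    """Boolean attention mask: True where query i may attend to key j.

    window=None -> full causal attention (the global layer)
    window=W    -> causal sliding-window attention (local layers)
    """
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    mask = j <= i                     # causal: never attend to the future
    if window is not None:
        mask &= j > i - window        # local: only the last `window` keys
    return mask

seq_len = 8
local = causal_mask(seq_len, window=4)   # a sliding-window layer
global_ = causal_mask(seq_len)           # the final full-attention layer

# The last query in a local layer sees only its trailing window...
assert local[7].sum() == 4
# ...while the global layer sees the entire prefix.
assert global_[7].sum() == 8
```

Stacking mostly-local layers keeps the KV cache small (each local layer stores at most `window` keys per position), which is what makes the 256K context windows tractable; the global layer then restores full-context mixing.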

Multimodal and Reasoning Capabilities

Gemma 4 handles:

  • Text and Images: All models support variable aspect ratio and resolution image processing
  • Video: Frame sequence analysis available across the family
  • Audio: Native automatic speech recognition (ASR) and speech-to-text translation, available on the E2B and E4B models only
  • Reasoning: Built-in configurable thinking modes enabling step-by-step problem solving
  • Function Calling: Native structured tool use for agentic workflows
  • System Prompts: Native system role support for controlled conversations
  • Multilingual: Pretrained on data covering 140+ languages, with native support for 35+
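
The function-calling and system-prompt support above can be illustrated with a short sketch. The article does not specify the exact tool schema Gemma 4 expects, so this uses the generic Hugging Face chat-template tool format as an assumption, with a hypothetical `get_weather` tool and a simulated model response:

```python
# Sketch of a tool-use conversation. The tool schema and the model's
# reply are assumptions for illustration; the article only states that
# Gemma 4 supports native structured function calling and system roles.

def get_weather(city: str) -> str:
    """Hypothetical tool the model may choose to call."""
    return f"Sunny in {city}"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a concise assistant."},  # native system role
    {"role": "user", "content": "What's the weather in Oslo?"},
]

# In a real run, `messages` and `tools` would be rendered with the
# model's chat template and the generated JSON tool call parsed out.
# Here the model output is simulated, then dispatched to the tool:
call = {"name": "get_weather", "arguments": {"city": "Oslo"}}
result = get_weather(**call["arguments"])
print(result)  # Sunny in Oslo
```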

Benchmark Performance

Instruction-tuned benchmark results:

Benchmark                31B      26B A4B   E4B      E2B
MMLU Pro                 85.2%    82.6%     69.4%    60.0%
AIME 2026 (no tools)     89.2%    88.3%     42.5%    37.5%
LiveCodeBench v6         80.0%    77.1%     52.0%    44.0%
Codeforces Elo           2150     1718      940      633
GPQA Diamond             84.3%    82.3%     58.6%    43.4%
MMMLU (multilingual)     88.4%    86.3%     76.6%    67.4%
MMMU Pro (vision)        76.9%    73.8%     52.6%    44.2%
MATH-Vision              85.6%    82.4%     59.5%    52.4%
BigBench Extra Hard      74.4%    64.8%     33.1%    21.9%

For long-context evaluation (MRCR v2, 128K tokens with 8 needles), the 31B model achieved 66.4% average accuracy.

Deployment and Availability

Models are released under the Apache 2.0 license with open weights. Unsloth offers optimized 4-bit GGUF quantizations that enable local execution on laptops and mobile devices. All models work with the Hugging Face Transformers library and are compatible with Unsloth Studio for fine-tuning and inference.

The family is designed for diverse deployment scenarios: E2B and E4B for edge/mobile, 26B A4B for consumer GPUs, and 31B for workstations and servers.
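
The deployment tiers above can be expressed as a small selection helper. The memory thresholds here are illustrative assumptions, not official requirements; actual footprints depend heavily on quantization (e.g. the 4-bit GGUFs mentioned earlier):

```python
# Map an available memory budget (GB) to the Gemma 4 variant the article
# suggests for each deployment tier. Thresholds are illustrative guesses.

def suggest_variant(memory_gb: float) -> str:
    if memory_gb < 6:
        return "E2B"      # edge / mobile
    if memory_gb < 12:
        return "E4B"      # edge / mobile
    if memory_gb < 32:
        return "26B A4B"  # consumer GPUs (only 3.8B params active per token)
    return "31B"          # workstations and servers

print(suggest_variant(4))   # E2B
print(suggest_variant(24))  # 26B A4B
```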

What This Means

Gemma 4 represents a significant consolidation of multimodal capabilities in open models. The efficiency-focused variants (E2B, E4B, 26B A4B) expand deployment options beyond high-end data centers, while the 31B variant approaches frontier performance on reasoning and code benchmarks (85.2% MMLU Pro, 89.2% AIME). The native reasoning modes and function-calling address the growing demand for agentic workflows. However, the smaller models show notable performance drops on advanced reasoning tasks—the E4B drops to 69.4% MMLU Pro versus 31B's 85.2%, suggesting size-dependent trade-offs for edge deployments.
