model release · Google DeepMind

Google DeepMind releases Gemma 4: multimodal models up to 31B parameters with 256K context

TL;DR

Google DeepMind released the Gemma 4 family of open-weights multimodal models in four sizes: E2B (2.3B effective), E4B (4.5B effective), 26B A4B (25.2B total, 3.8B active), and 31B dense. All models support text and image input with 128K-256K context windows, reasoning modes, and native function calling for agentic workflows; the E2B and E4B variants also accept audio.

2 min read

Google DeepMind released Gemma 4, a family of open-weights multimodal models spanning four distinct sizes from 2.3B to 31B parameters, available under the Apache 2.0 license on Hugging Face.

Model Specifications

The Gemma 4 lineup includes:

  • E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window, supports text, image, and audio
  • E4B: 4.5B effective parameters (8B with embeddings), 128K context window, supports text, image, and audio
  • 26B A4B: 25.2B total parameters with only 3.8B active during inference, 256K context window, supports text and image
  • 31B: 30.7B parameters, 256K context window, supports text and image

The smaller models (E2B/E4B) use Per-Layer Embeddings (PLE) to reduce effective parameter counts while maintaining multilingual support across 140+ languages. The 26B A4B employs a Mixture-of-Experts architecture with 8 active experts selected from 128 total, enabling fast inference comparable to a 4B model despite 26B total parameters.
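
The sparse-activation arithmetic behind the 26B A4B can be illustrated with a minimal top-k routing sketch. The expert count (128) and active experts per token (8) come from the article; the router logic below is purely illustrative, not Gemma 4's actual implementation:

```python
import random

def route_tokens(router_logits, k=8):
    """Pick the top-k experts per token from router logits (one list per token)."""
    routes = []
    for logits in router_logits:
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
        routes.append(top)
    return routes

NUM_EXPERTS = 128   # total experts (per the article)
TOP_K = 8           # experts active per token (per the article)

random.seed(0)
# Fake router logits for 4 tokens; a real model computes these per layer.
logits = [[random.random() for _ in range(NUM_EXPERTS)] for _ in range(4)]
routes = route_tokens(logits, k=TOP_K)

# Only k/num_experts of the expert FFN parameters run for each token.
active_fraction = TOP_K / NUM_EXPERTS
print(active_fraction)  # 0.0625
```

Note that the model's overall active ratio (3.8B of 25.2B) is higher than 8/128 because attention layers and embeddings are dense and always run.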

Key Capabilities

All Gemma 4 models feature:

  • Reasoning mode: Configurable thinking modes enabling step-by-step problem solving
  • Extended multimodality: Text; images with variable aspect ratio and resolution support; video via frame sequences; audio (E2B/E4B only) for ASR and speech translation
  • Function calling: Native structured tool use for autonomous agent workflows
  • Long context: 128K (E2B/E4B) or 256K (26B A4B/31B) token windows
  • Coding support: Code generation, completion, and correction with notable benchmark improvements
  • Native system prompts: Enhanced control over conversational behavior
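
Native function calling generally means the model emits a structured tool call that the host application parses and executes before feeding the result back. A minimal sketch of that dispatch loop follows; the JSON call format and the `get_weather` tool are assumptions for illustration, not Gemma 4's documented schema:

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool the agent can invoke."""
    return f"22°C and sunny in {city}"

# Registry mapping tool names the model may emit to host-side functions.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured tool call emitted by the model and run it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Pretend the model emitted this tool call:
output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch(output))  # 22°C and sunny in Zurich
```

In an agent loop, the tool result would be appended to the conversation and the model called again until it produces a final answer.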

The architecture employs hybrid attention mechanisms, combining local sliding window attention (512-1024 tokens) with full global attention in the final layers, optimized with Proportional RoPE (p-RoPE) for long-context memory efficiency.
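
As a rough sketch of the hybrid scheme, local layers mask attention to a recent window while global layers keep the full causal mask. The window size and layer layout here are illustrative assumptions based on the description above:

```python
def attention_mask(seq_len, window=None):
    """Boolean causal mask; if `window` is set, restrict to a local sliding window."""
    mask = []
    for q in range(seq_len):
        row = []
        for kpos in range(seq_len):
            visible = kpos <= q                             # causal: no future tokens
            if window is not None:
                visible = visible and (q - kpos < window)   # local: recent tokens only
            row.append(visible)
        mask.append(row)
    return mask

local = attention_mask(8, window=4)   # sliding-window layer
global_ = attention_mask(8)           # full-attention layer

# The last query attends to 4 tokens locally vs. all 8 globally.
print(sum(local[7]), sum(global_[7]))  # 4 8
```

Because local layers only ever attend to a fixed window, their KV-cache cost stays constant with sequence length, which is what makes the long-context memory savings possible.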

Benchmark Performance

Instruction-tuned model evaluation shows:

31B Dense Model:

  • MMLU Pro: 85.2%
  • AIME 2026 (no tools): 89.2%
  • LiveCodeBench v6: 80.0%
  • Codeforces ELO: 2150
  • GPQA Diamond: 84.3%

26B A4B (MoE):

  • MMLU Pro: 82.6%
  • AIME 2026 (no tools): 88.3%
  • LiveCodeBench v6: 77.1%
  • Codeforces ELO: 1718
  • GPQA Diamond: 82.3%

E4B:

  • MMLU Pro: 69.4%
  • LiveCodeBench v6: 52.0%
  • Codeforces ELO: 940

Vision benchmarks show MMMU Pro scores of 76.9% (31B), 73.8% (26B A4B), and 52.6% (E4B). The 31B model achieved 66.4% on long-context needle-in-haystack evaluation at 128K tokens.

Deployment Flexibility

Google positions Gemma 4 for diverse deployment scenarios: E2B and E4B for mobile and edge devices; 26B A4B for consumer GPUs and workstations, balancing speed and capability through sparse MoE activation; and 31B for high-end servers requiring maximum performance. All models are available in both pre-trained and instruction-tuned variants.

What This Means

Gemma 4 extends Google's open-model strategy to multimodal reasoning at multiple efficiency tiers. The 26B A4B model's sparse activation approach offers a compelling alternative to dense models—matching near-31B performance while running 6-7× faster. With 256K context windows and reasoning modes, Gemma 4 targets competitive positioning against closed models in long-context and agentic use cases, while maintaining deployment flexibility from phones to data centers. The Apache 2.0 license enables commercial use without restrictions.

