Google releases Gemma 4 family with 31B model, 256K context, multimodal capabilities

TL;DR

Google DeepMind released the Gemma 4 family of open-weights models ranging from 2.3B to 31B parameters, featuring up to 256K token context windows and native support for text, image, video, and audio inputs. The flagship 31B model scores 85.2% on MMLU Pro and 89.2% on AIME 2026, with a smaller 26B MoE variant requiring only 3.8B active parameters for faster inference.


Google Releases Gemma 4 Family with Multimodal Capabilities and Up to 256K Context

Google DeepMind launched Gemma 4, a family of open-weights models ranging from 2.3B to 31B parameters. The family introduces multimodal input spanning text, image, video, and audio, alongside native configurable reasoning modes.

Model Sizes and Architecture

The release includes four model variants:

  • E2B: 2.3B effective parameters (5.1B with embeddings), 128K context
  • E4B: 4.5B effective parameters (8B with embeddings), 128K context
  • 26B A4B: 25.2B total parameters with 3.8B active (MoE), 256K context
  • 31B: 30.7B parameters, 256K context

All models employ a hybrid attention mechanism combining local sliding window attention with full global attention. The architecture uses Per-Layer Embeddings (PLE) in smaller models to optimize on-device deployment, while the 26B variant uses Mixture-of-Experts with 8 active experts from 128 total.
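The hybrid attention scheme can be sketched as a per-layer mask: most layers restrict each token to a sliding window of recent positions, while periodic layers attend to the full causal context. This is an illustrative sketch only; the actual window size (`window=1024`) and the ratio of global to local layers (`global_every=6`) are assumptions, not published Gemma 4 hyperparameters.

```python
import numpy as np

def build_attention_mask(seq_len: int, layer_idx: int,
                         window: int = 1024, global_every: int = 6) -> np.ndarray:
    """Causal attention mask for a hybrid local/global layer stack.

    Every `global_every`-th layer attends to the full causal context;
    all other layers use a sliding window of `window` past tokens.
    Window size and layer ratio are hypothetical values.
    """
    q = np.arange(seq_len)[:, None]   # query positions
    k = np.arange(seq_len)[None, :]   # key positions
    causal = k <= q                   # never attend to future tokens
    if layer_idx % global_every == 0:
        return causal                 # full global attention layer
    return causal & (q - k < window)  # local sliding-window layer

# A local layer sees at most `window` past tokens; a global layer sees all.
local = build_attention_mask(4096, layer_idx=1, window=1024)
glob = build_attention_mask(4096, layer_idx=0, window=1024)
```

The practical upside of this mix is memory: local layers keep a bounded KV cache regardless of context length, so only the sparse global layers pay the full 256K-token cost.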

Capabilities and Features

Gemma 4 models support:

  • Multimodal Input: Text, images with variable aspect ratios and resolutions (all models), video frame processing, and native audio for E2B/E4B
  • Reasoning Modes: Configurable thinking modes enabling step-by-step reasoning before generation
  • Extended Context: 128K tokens for E2B/E4B, 256K for larger models
  • Function Calling: Native structured tool use for agentic workflows
  • Multilingual Support: 140+ languages in pre-training, 35+ in production
  • Audio Processing: ASR and speech-to-translation on E2B and E4B only
  • System Prompt Support: Native support for system role in conversations
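The system-role and function-calling features above combine into a familiar structured-chat flow. The exact Gemma 4 message schema is not given in the release notes, so the field names below (`tool_call`, `arguments`, the tool definition shape) are assumptions modeled on common chat-completion formats, and `get_weather` is a hypothetical tool.

```python
import json

# Hypothetical tool definition in a JSON-Schema-style parameter format.
tools = [{
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

messages = [
    # Gemma 4 natively supports a system role, so instructions can live
    # here instead of being prepended to the first user turn.
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "What's the weather in Zurich?"},
    # A structured tool call the model might emit (illustrative shape):
    {"role": "assistant", "tool_call": {
        "name": "get_weather",
        "arguments": {"city": "Zurich"},
    }},
    # The tool result is fed back for the model's final answer:
    {"role": "tool", "name": "get_weather",
     "content": json.dumps({"temp_c": 7, "condition": "overcast"})},
]
```

In an agentic loop, the runtime inspects each assistant turn for a `tool_call`, executes it, appends the `tool` message, and re-invokes the model until it produces a plain text reply.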

Benchmark Performance

The 31B model achieves:

  • MMLU Pro: 85.2%
  • AIME 2026 (no tools): 89.2%
  • LiveCodeBench v6: 80.0%
  • Codeforces Elo: 2150
  • GPQA Diamond: 84.3%
  • Vision MMMU Pro: 76.9%
  • MATH-Vision: 85.6%

The 26B A4B MoE variant scores 82.6% on MMLU Pro and 88.3% on AIME 2026 while requiring significantly less compute due to sparse activation. The smaller E4B and E2B models score 69.4% and 60.0% on MMLU Pro respectively, making them suitable for on-device deployment.
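The sparse activation behind the MoE variant's efficiency can be sketched as top-k routing: a router scores all 128 experts per token and only the 8 highest-scoring ones run, so roughly 3.8B of the 25.2B parameters participate in each forward pass. The router mechanics below (plain argsort over random logits) are an illustrative assumption, not Gemma 4's actual routing implementation.

```python
import numpy as np

def route_topk(router_logits: np.ndarray, k: int = 8) -> np.ndarray:
    """Return the indices of the k experts activated for each token.

    Sketch of sparse top-k MoE routing: only k of the available experts
    process a given token, so per-token compute scales with active
    parameters rather than total parameters.
    """
    return np.argsort(router_logits, axis=-1)[:, -k:]

rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 128))   # router scores: 16 tokens x 128 experts
active = route_topk(logits, k=8)      # 8 distinct experts fire per token

# Rough per-token compute advantage over the dense 31B model:
# 30.7B total vs. 3.8B active parameters, i.e. ~8x fewer FLOPs per token.
speedup = 30.7 / 3.8
```

This is why the MoE variant can trail the dense 31B by only a few benchmark points while running at close to a 4B model's inference cost.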

Deployment and Licensing

All models are available under the Apache 2.0 license through Hugging Face. The size range targets deployment scenarios from mobile and edge devices (E2B/E4B) to consumer GPUs, workstations, and servers (26B/31B). Models can be loaded with the latest version of the Hugging Face Transformers library via single-line calls to AutoProcessor and AutoModelForCausalLM.

Google emphasizes efficient on-device execution for smaller variants, with E2B and E4B specifically optimized for laptops and phones. Vision encoder parameters total ~150M (E2B/E4B) and ~550M (larger models), while audio encoders add ~300M parameters to smaller variants.

What This Means

Gemma 4 represents Google's continued commitment to open-weights multimodal models across the size spectrum. The MoE variant offers a compelling middle ground, approaching dense 31B reasoning performance at roughly 4B-parameter inference cost. For on-device deployment, E2B and E4B with native audio support fill a gap between pure language models and larger multimodal systems. Gains in coding (Codeforces Elo 2150 vs. Gemma 3's 110) and reasoning position these models as competitive with closed-source alternatives, though pricing and hardware requirements differ significantly from API-based competitors.
