
Google DeepMind releases Gemma 4 open models with multimodal capabilities and 256K context window

TL;DR

Google DeepMind released the Gemma 4 family of open-source models with multimodal capabilities (text, image, audio, video) and context windows up to 256K tokens. Four distinct model sizes—E2B (2.3B effective parameters), E4B (4.5B effective), 26B A4B (3.8B active), and 31B—are available under the Apache 2.0 license, with instruction-tuned and pre-trained variants.



Google DeepMind released the Gemma 4 family of open-source models today, introducing multimodal capabilities and significantly expanded context windows. The family includes four distinct model sizes, ranging from 2.3B to 31B parameters, all available under the Apache 2.0 license.

Model Specifications and Architectures

Gemma 4 employs both dense and Mixture-of-Experts (MoE) architectures:

Dense Models:

  • E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window
  • E4B: 4.5B effective parameters (8B with embeddings), 128K context window
  • 31B: 30.7B parameters, 256K context window, 60 layers

MoE Model:

  • 26B A4B: 25.2B total parameters with 3.8B active parameters, 256K context window, 8 active experts from 128 total

The "E" in E2B/E4B denotes "effective parameters": these models use Per-Layer Embeddings (PLE) to improve on-device efficiency without increasing layer or parameter counts. The "A" in 26B A4B indicates active parameters, allowing the model to match the inference speed of a dense 4B model while retaining 26B of total capacity.
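The parameter figures above can be sanity-checked with some quick arithmetic. A minimal sketch, using only the totals stated in the release (the gap between the two fractions reflects shared components such as attention and embeddings that are always active, which is an interpretation, not a stated figure):

```python
# Back-of-the-envelope check on the 26B A4B figures from the release.
TOTAL_PARAMS = 25.2e9   # total parameters
ACTIVE_PARAMS = 3.8e9   # parameters touched per token
EXPERTS_TOTAL = 128
EXPERTS_ACTIVE = 8

# Fraction of all parameters used per token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.1%}")  # ~15.1%

# Fraction of experts routed per token; lower than active_fraction
# because non-expert weights (attention, embeddings) are always used.
expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL
print(f"Experts routed per token: {expert_fraction:.1%}")  # 6.2%
```

This is why the model can carry 26B total capacity yet run at roughly the speed of a 4B dense model: only ~15% of its weights participate in any single forward pass.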

Multimodal Capabilities and Modalities

All four models process text and images with variable aspect ratios and resolutions. E2B and E4B additionally support:

  • Audio: Native automatic speech recognition (ASR) and speech-to-translated-text across multiple languages
  • Video: Frame sequence processing for video understanding

All models support interleaved multimodal input, allowing text and images to be freely mixed within prompts.
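A minimal sketch of what an interleaved prompt might look like, using the list-of-content-parts layout common in recent chat templates. The field names and structure are illustrative assumptions, not a confirmed Gemma 4 API:

```python
# Hypothetical interleaved multimodal prompt: text and images mixed
# freely within one user turn. Field names are assumptions.
messages = [
    {"role": "system", "content": "You are a concise visual assistant."},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two charts:"},
            {"type": "image", "path": "chart_a.png"},
            {"type": "image", "path": "chart_b.png"},
            {"type": "text", "text": "Which shows stronger growth?"},
        ],
    },
]

# Parts keep their order, so the model sees text and images interleaved.
image_parts = [p for p in messages[1]["content"] if p["type"] == "image"]
print(len(image_parts))  # 2
```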

Benchmark Performance

Gemma 4 shows substantial improvements over Gemma 3 27B (no thinking mode):

Benchmark           Gemma 4 31B   Gemma 4 26B A4B   Gemma 4 E4B   Gemma 3 27B
MMLU Pro            85.2%         82.6%             69.4%         67.6%
AIME 2026           89.2%         88.3%             42.5%         20.8%
LiveCodeBench v6    80.0%         77.1%             52.0%         29.1%
Codeforces ELO      2150          1718              940           110
GPQA Diamond        84.3%         82.3%             58.6%         42.4%
MMMLU               88.4%         86.3%             76.6%         70.7%
Vision MMMU Pro     76.9%         73.8%             52.6%         49.7%
MATH-Vision         85.6%         82.4%             59.5%         46.0%

The E4B model demonstrates the most significant coding improvements, with a Codeforces ELO of 940 compared to Gemma 3's 110, and LiveCodeBench performance of 52.0% versus 29.1%.

Core Capabilities

All models feature:

  • Reasoning/Thinking mode: Configurable step-by-step reasoning before generating answers
  • Function calling: Native support for structured tool use and agentic workflows
  • System prompt support: Native system role handling for structured conversations
  • Multilingual: Pre-trained on data spanning 140+ languages, with out-of-the-box support for 35+
  • Code generation: Full code completion, generation, and correction capabilities
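To make the function-calling capability concrete, here is a minimal sketch of a tool-call round trip. The JSON tool schema and the call format the model emits are assumptions for illustration; real integrations rely on the chat template and tool format shipped with the model:

```python
import json

# Hypothetical tool registry: name -> description + parameter schema.
tools = {
    "get_weather": {
        "description": "Return current weather for a city.",
        "parameters": {"city": {"type": "string"}},
    }
}

def get_weather(city: str) -> str:
    # Stub standing in for a real weather API call.
    return f"Sunny in {city}, 22°C"

def dispatch(model_output: str) -> str:
    """Parse a (hypothetical) structured tool call emitted by the model
    and invoke the matching Python function."""
    call = json.loads(model_output)
    fn = {"get_weather": get_weather}[call["name"]]
    return fn(**call["arguments"])

# A model configured with `tools` might emit a call like this:
result = dispatch('{"name": "get_weather", "arguments": {"city": "Zurich"}}')
print(result)  # Sunny in Zurich, 22°C
```

The tool result would then be fed back to the model as a new turn, letting it compose a final natural-language answer, which is the basic loop behind agentic workflows.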

Architecture and Efficiency

All Gemma 4 models employ a hybrid attention mechanism that interleaves local sliding window attention (512-1024 tokens depending on model size) with full global attention. The final layer always uses global attention. For long-context optimization, global layers use unified Keys and Values with Proportional RoPE (p-RoPE).
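The interleaving described above can be sketched as a simple layer schedule. The 5-local-to-1-global ratio below is an assumption (Gemma 3 used that pattern); the release states only that local and global layers interleave and that the final layer is global:

```python
# Sketch of a hybrid attention schedule: sliding-window (local) layers
# interleaved with full-attention (global) layers. Ratio is assumed.
def layer_pattern(num_layers: int, locals_per_global: int = 5) -> list[str]:
    pattern = []
    for i in range(1, num_layers + 1):
        if i % (locals_per_global + 1) == 0:
            pattern.append("global")
        else:
            pattern.append("local(sliding-window)")
    pattern[-1] = "global"  # final layer always uses full attention
    return pattern

layers = layer_pattern(12)
print(layers.count("global"))  # 2
```

Local layers bound attention cost to the window size (512-1024 tokens here), so only the sparse global layers pay the full quadratic cost over a 256K context, which is what makes the long window tractable.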

Vision encoders are approximately 150M parameters for smaller models and 550M for larger models. E2B and E4B include 300M-parameter audio encoders.

Availability and Deployment

All Gemma 4 models are available on Hugging Face with integration into the latest Transformers library. The smaller E2B and E4B models target mobile and edge devices, while 26B A4B and 31B target consumer GPUs and workstations. The MoE architecture makes 26B A4B particularly suitable for fast inference compared to the dense 31B variant.

What This Means

Gemma 4 represents a significant shift toward efficient, capable open-source multimodal models. The per-layer embedding approach and MoE variants provide genuine deployment flexibility—the E4B model can run on laptops and modern phones while the 26B A4B delivers frontier performance at 4B-equivalent inference speed. The 89.2% AIME score on the 31B model and substantial coding improvements suggest these models compete meaningfully with closed-source offerings. Multilingual support (140+ languages) and native audio/video handling address practical deployment requirements that many open models still lack.

