
Google DeepMind releases Gemma 4, open multimodal models with 256K context and reasoning

TL;DR

Google DeepMind has released Gemma 4, a family of open-weights multimodal models ranging from 2.3B to 31B parameters with support for text, images, video, and audio. The models feature context windows up to 256K tokens, built-in reasoning modes, and native function calling for agentic workflows.



Google DeepMind has released Gemma 4, a family of open-weights models spanning 2.3B to 31B parameters with multimodal capabilities and context windows of up to 256K tokens. The release includes both dense and Mixture-of-Experts (MoE) architectures designed for deployment across devices from mobile phones to data center servers.

Model Sizes and Specifications

Gemma 4 offers four distinct variants:

  • E2B: 2.3B effective parameters (5.1B with embeddings), 128K context, text/image/audio support
  • E4B: 4.5B effective parameters (8B with embeddings), 128K context, text/image/audio support
  • 26B A4B (MoE): 25.2B total parameters with 3.8B active parameters, 256K context, text/image support
  • 31B Dense: 30.7B parameters, 256K context, text/image support
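The parameter counts above translate roughly into weight-memory requirements. As a back-of-the-envelope sketch (an illustrative helper, not an official sizing tool; real memory use adds KV cache, activations, and runtime overhead):

```python
# Rough weight-memory estimate per Gemma 4 variant (illustrative only).
# Parameter counts are taken from the spec list above.
VARIANTS = {                    # parameters, in billions
    "E2B": 2.3,
    "E4B": 4.5,
    "26B A4B (total)": 25.2,    # all experts stay resident even with 3.8B active
    "31B Dense": 30.7,
}

BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "q4 (approx)": 0.5}

def weight_gb(params_b: float, precision: str) -> float:
    """Approximate weight storage in GB at a given precision."""
    return params_b * 1e9 * BYTES_PER_PARAM[precision] / 1e9

for name, p in VARIANTS.items():
    sizes = ", ".join(f"{prec}: {weight_gb(p, prec):.1f} GB" for prec in BYTES_PER_PARAM)
    print(f"{name}: {sizes}")
```

At bf16, the E2B's 2.3B effective parameters come to roughly 4.6 GB of weights, which is what makes the on-device positioning plausible; the 31B dense model at the same precision needs around 61 GB before any 4-bit quantization.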

The smaller E2B and E4B models use Per-Layer Embeddings (PLE) technology to reduce effective parameter counts, enabling efficient deployment on edge devices. The 26B A4B variant uses a Mixture-of-Experts approach with 128 total experts and 8 active experts, claiming inference speeds comparable to a 4B model while maintaining 26B total capacity.
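The 8-of-128 expert selection can be pictured as top-k gating: a router scores every expert per token, keeps the k highest, and normalizes their weights. The sketch below is illustrative only; Gemma 4's actual router design has not been described at this level of detail, and the dimensions are arbitrary.

```python
import numpy as np

# Minimal top-k expert routing sketch (illustrative, not Gemma 4's router).
N_EXPERTS, TOP_K, D = 128, 8, 16

rng = np.random.default_rng(0)
W_router = rng.standard_normal((D, N_EXPERTS))  # router projection

def route(x: np.ndarray):
    """Pick TOP_K experts per token and softmax-normalize their gate weights."""
    logits = x @ W_router                              # (tokens, N_EXPERTS)
    topk = np.argsort(logits, axis=-1)[:, -TOP_K:]     # indices of the k best experts
    gate_logits = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)         # weights sum to 1 per token
    return topk, gates

tokens = rng.standard_normal((4, D))
experts, gates = route(tokens)
print(experts.shape, gates.shape)  # (4, 8) (4, 8)
```

Since only the selected experts' feed-forward weights are evaluated per token, compute scales with the 3.8B active parameters rather than the 25.2B total, which is the basis for the "inference speed of a 4B model" claim.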

Capabilities and Architecture

All Gemma 4 models support text and image inputs with variable aspect ratios and resolutions. The E2B and E4B models additionally include native audio support with automatic speech recognition and multilingual speech-to-translation capabilities. Video understanding is available through frame sequence processing.
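Frame-sequence video processing typically means sampling a fixed budget of frames spread across the clip and feeding them through the image pipeline. A minimal sketch, assuming uniform sampling (the actual frame-selection policy and per-request frame budget are assumptions here):

```python
# Uniform frame sampling for video input (illustrative; the frame budget
# and selection policy are assumptions, not documented Gemma 4 behavior).
def sample_frame_indices(total_frames: int, max_frames: int = 16) -> list[int]:
    """Pick up to max_frames indices spread evenly across the video."""
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]

print(sample_frame_indices(300, 8))  # [0, 37, 75, 112, 150, 187, 225, 262]
```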

Key features include:

  • Reasoning: Configurable thinking modes enabling step-by-step reasoning before response generation
  • Function Calling: Native support for structured tool use and agentic workflows
  • Hybrid Attention: Combines local sliding window attention with full global attention, with Proportional RoPE optimization for memory efficiency
  • Multilingual: Pre-trained on 140+ languages with out-of-the-box support for 35+
  • Native System Prompt Support: Structured conversation control
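The hybrid attention design mixes two mask shapes: most layers attend only within a local sliding window, while periodic global layers see the full causal context. The sketch below shows the two masks side by side; the sequence length and window size are arbitrary, and Gemma 4's actual window size and local-to-global layer ratio are not restated here.

```python
import numpy as np

# Sketch of the two mask types in a hybrid attention stack (illustrative;
# the window size and layer ratio are assumptions, not Gemma 4 specifics).
SEQ_LEN, WINDOW = 8, 4

def causal_mask(n: int) -> np.ndarray:
    """Full global attention: each token sees itself and all previous tokens."""
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    """Local attention: each token sees only the last `window` tokens."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

print(causal_mask(SEQ_LEN).sum())                   # 36 attended positions
print(sliding_window_mask(SEQ_LEN, WINDOW).sum())   # 26 with a window of 4
```

The memory win comes from the KV cache: local layers only need to retain the last `window` keys and values, so cache size for those layers stays constant as context grows toward 256K instead of scaling linearly.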

Benchmark Performance

The instruction-tuned models show significant improvements over the previous Gemma generation on reasoning and coding tasks:

Gemma 4 31B achieves:

  • MMLU Pro: 85.2%
  • AIME 2026 (no tools): 89.2%
  • LiveCodeBench v6: 80.0%
  • Codeforces Elo: 2150
  • GPQA Diamond: 84.3%
  • MATH-Vision: 85.6%
  • Long Context MRCR v2 (128K needle): 66.4%

Gemma 4 26B A4B demonstrates strong performance-to-efficiency trade-offs:

  • MMLU Pro: 82.6%
  • AIME 2026 (no tools): 88.3%
  • LiveCodeBench v6: 77.1%
  • Codeforces Elo: 1718

The smaller models hold up well against the previous generation's much larger flagship: E2B scores 60.0% on MMLU Pro, approaching Gemma 3 27B's 67.6% with a fraction of the effective parameters.

Release Details

The models are released under Apache 2.0 licensing as both pre-trained and instruction-tuned variants. Unsloth has released GGUF quantized versions optimized for local inference. The models are available through Hugging Face with support for the latest Transformers library.

Google DeepMind emphasizes on-device viability for the smaller models while positioning the larger variants for consumer GPUs and data center servers. The hybrid architecture and context window scaling address trade-offs between inference speed and reasoning depth for long-context tasks.

What this means

Gemma 4 represents a significant shift toward production-ready open models with genuine multimodal capabilities and reasoning support at multiple scale points. The MoE variant offers a novel efficiency approach for teams balancing model capacity with inference latency constraints. Because these are open-weights models intended for self-hosted deployment, there is no cloud inference pricing to weigh, unlike with proprietary alternatives. The 256K context window and strong long-context benchmark performance position these models competitively against closed commercial alternatives for document analysis and extended reasoning tasks.

