model release · Google DeepMind

Google DeepMind releases Gemma 4 with four models up to 31B parameters, 256K context window

TL;DR

Google DeepMind released Gemma 4, an open-weights multimodal model family in four sizes (E2B, E4B, 26B A4B, 31B) with context windows up to 256K tokens and native reasoning capabilities. The 26B A4B variant uses a Mixture-of-Experts architecture with 3.8B active parameters for efficient inference. All models support text and image input, cover 140+ languages, and are released under the Apache 2.0 license.


Google DeepMind Releases Gemma 4: Four Open-Weights Models with Multimodal Capabilities and Extended Context

Google DeepMind released Gemma 4, an open-weights model family available in four sizes designed for deployment from mobile devices to servers. The largest variant, the 31B dense model, has 30.7B parameters and a 256K-token context window. A Mixture-of-Experts variant, the 26B A4B, activates only 3.8B of its 25.2B total parameters during inference, approaching the 31B model's quality at a computational cost closer to that of a 4B dense model.
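The "active vs. total parameters" distinction comes from top-k expert routing: each token's activation is scored by a router, and only the highest-scoring experts actually run. The sketch below illustrates the mechanism with toy dimensions; the expert count, routing rule, and shapes are illustrative assumptions, not Gemma 4's actual architecture.

```python
import numpy as np

def moe_forward(x, expert_weights, router_weights, top_k=2):
    """Route a token to its top-k experts; only those experts execute.

    x: (d,) token activation; expert_weights: list of (d, d) matrices;
    router_weights: (num_experts, d) router projection.
    Schematic only -- production MoE layers add load balancing,
    normalization tricks, and sometimes shared always-on experts.
    """
    logits = router_weights @ x                  # score every expert (cheap)
    top = np.argsort(logits)[-top_k:]            # indices of the top-k experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    # Only top_k expert matmuls run, so active params << total params.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, num_experts, top_k = 8, 16, 2
experts = [rng.standard_normal((d, d)) for _ in range(num_experts)]
router = rng.standard_normal((num_experts, d))
y = moe_forward(rng.standard_normal(d), experts, router, top_k=top_k)

# With 16 experts and top-2 routing, each token touches 2/16 of the
# expert parameters per layer -- the same idea behind "3.8B active of
# 25.2B total" style figures.
active_fraction = top_k / num_experts
print(active_fraction)  # 0.125
```

The per-token compute scales with the active fraction, while capacity scales with the total parameter count, which is why the 26B A4B can sit between the 4B and 31B models on the cost/quality curve.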

Model Specifications and Architecture

Gemma 4 includes two smaller models optimized for edge deployment: the E2B (2.3B effective parameters) and E4B (4.5B effective parameters), both featuring 128K context windows. The E-series models incorporate Per-Layer Embeddings (PLE) technology, where each decoder layer maintains its own token embedding table for memory efficiency during on-device inference.
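The PLE idea can be sketched in a few lines: alongside the usual shared input embedding, each decoder layer owns its own small token table, and only the rows for the current tokens need to be resident in fast memory at any step. Everything below (dimensions, class name, the bare lookup) is an illustrative assumption, not Gemma 4's actual implementation.

```python
import numpy as np

class PerLayerEmbeddings:
    """Sketch of Per-Layer Embeddings (PLE): each decoder layer keeps its
    own small embedding table, looked up per token and injected into that
    layer. Shapes and the injection rule are illustrative only.
    """
    def __init__(self, vocab_size, n_layers, ple_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One small table per layer. On-device, these rows can be streamed
        # from slower storage because only the current tokens' rows are
        # needed at each step.
        self.tables = [rng.standard_normal((vocab_size, ple_dim)) * 0.02
                       for _ in range(n_layers)]

    def lookup(self, layer_idx, token_ids):
        return self.tables[layer_idx][token_ids]  # (seq_len, ple_dim)

ple = PerLayerEmbeddings(vocab_size=100, n_layers=4, ple_dim=16)
vec = ple.lookup(layer_idx=2, token_ids=np.array([5, 7, 5]))
print(vec.shape)  # (3, 16)
```

The memory win comes from keeping these per-layer tables out of the accelerator's working set, rather than from shrinking the model itself.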

All models employ hybrid attention mechanisms combining local sliding-window attention (512 tokens for E-series, 1024 for larger models) with full global attention in final layers. Proportional RoPE (p-RoPE) optimization reduces memory footprint for extended contexts.
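The local/global split is easiest to see as two attention masks: sliding-window layers let each token see only the last `window` positions, while the global layers use the full causal mask. A minimal sketch, using a toy window of 3 rather than the 512/1024 windows cited above:

```python
import numpy as np

def local_mask(seq_len, window):
    """Causal sliding-window mask: token i attends to positions
    [i - window + 1, i]. True = attention allowed."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def global_mask(seq_len):
    """Full causal mask, as used in the global-attention layers."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return j <= i

m_local = local_mask(seq_len=6, window=3)
m_global = global_mask(seq_len=6)

# Token 5 sees only positions 3-5 in a local layer, but 0-5 globally.
print(m_local[5])   # [False False False  True  True  True]
print(m_global[5])  # [ True  True  True  True  True  True]
```

Because local layers cap the KV cache at the window size, interleaving them with a few global layers is what makes 128K-256K contexts affordable; the p-RoPE detail is a further positional-encoding optimization on top of this.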

Multimodal Capabilities

Gemma 4 handles text and image input with variable aspect ratio and resolution support across all four models. The E2B and E4B models additionally support audio, enabling automatic speech recognition and speech translation across multiple languages. Video understanding is available through frame-sequence processing. All models feature native system prompt support and function-calling capabilities for structured tool use in agentic workflows.
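Function calling typically means the model emits a structured call (usually JSON) against a tool schema that the application then executes. The announcement does not specify Gemma 4's exact chat format, so the schema, field names, and model output below are hypothetical placeholders showing the general shape of the loop:

```python
import json

# Hypothetical tool schema; the exact format Gemma 4 expects is not
# specified in the announcement.
tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# A function-calling-tuned model would emit a structured call like this
# (simulated here rather than generated):
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
call = json.loads(model_output)

def get_weather(city):
    """Stub executor standing in for a real weather API."""
    return {"city": city, "temp_c": 7}

# The application dispatches the call and feeds the result back to the
# model as the next turn.
result = get_weather(**call["arguments"])
print(result)  # {'city': 'Zurich', 'temp_c': 7}
```

Native system-prompt and function-calling support means this loop works without prompt-engineering workarounds, which is what makes the models usable in agent frameworks.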

Benchmark Performance

On MMLU Pro, the 31B Dense model scores 85.2%, compared to 82.6% for the 26B A4B variant. On coding tasks (LiveCodeBench v6), the 31B achieves 80.0% versus 77.1% for 26B A4B. For long-context retrieval (MRCR v2 8-needle at 128K), the 31B reaches 66.4% accuracy.

The smaller E4B model scores 69.4% on MMLU Pro and 52.0% on LiveCodeBench v6. Vision capabilities show the 31B at 76.9% on MMMU Pro and the E2B at 44.2%.

All instruction-tuned variants feature configurable thinking modes for step-by-step reasoning. The 31B model achieves 89.2% on AIME 2026 without tools, substantially above Gemma 3 27B (20.8%).

Multilingual and Deployment

Gemma 4 models provide multilingual support across 140+ languages, with dedicated pre-training coverage for 35+ languages. The models are available under the Apache 2.0 license and are compatible with recent releases of the Transformers library.

Pricing and API availability were not disclosed in the announcement. Models are immediately accessible via Hugging Face and GitHub for local deployment.

What This Means

Gemma 4 significantly advances Google's open-model strategy by delivering multimodal capabilities competitive with frontier models at multiple size tiers. The 26B A4B's MoE architecture is particularly notable—achieving 82.6% MMLU Pro with only 3.8B active parameters challenges the assumption that parameter count directly determines inference cost. For developers, the range from E2B to 31B provides genuine deployment flexibility from mobile devices to servers. The 256K context and native reasoning support address two key limitations in previous Gemma releases, though benchmark improvements over Gemma 3 are modest on most tasks except long-context retrieval and coding.
