
Google DeepMind releases Gemma 4 family with 256K context window and multimodal capabilities

TL;DR

Google DeepMind released the Gemma 4 family of open-weights models in four sizes (2.3B to 31B parameters). All models accept text and image input, including video frame analysis, and the two smallest models also accept audio. The flagship 31B model achieves 85.2% on MMLU Pro and 89.2% on AIME 2024, with context windows up to 256K tokens. All models feature configurable reasoning modes, target deployment from mobile devices to servers, and are released under the Apache 2.0 license.


Google DeepMind Releases Gemma 4 Family: Four Models from 2.3B to 31B Parameters

Google DeepMind released the complete Gemma 4 model family today, spanning four sizes aimed at deployment scenarios from edge devices to enterprise servers. All models are open-weights under the Apache 2.0 license.

Model Lineup and Specifications

The family includes three dense models and one mixture-of-experts variant:

Dense Models:

  • E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window
  • E4B: 4.5B effective parameters (8B with embeddings), 128K context window
  • 31B: 30.7B parameters, 256K context window

Mixture-of-Experts:

  • 26B A4B: 25.2B total parameters, 3.8B active parameters, 256K context window, 8 active experts from 128 total

The smaller E2B and E4B models use Per-Layer Embeddings (PLE) to improve parameter efficiency, which is why their effective parameter counts are lower than their totals with embeddings. The 26B A4B model activates only 3.8B of its 25.2B parameters per token during inference, so it runs at roughly the speed of a 4B model while retaining much of the reasoning capacity of a 26B model.
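To make the mixture-of-experts numbers concrete, the short sketch below computes what share of the 26B A4B model is active per token, using only the figures quoted in the lineup above. The helper name `fraction` is illustrative and not part of any Gemma tooling.

```python
# Figures quoted in the article for the 26B A4B mixture-of-experts model.
TOTAL_PARAMS_B = 25.2    # total parameters, in billions
ACTIVE_PARAMS_B = 3.8    # parameters active per token, in billions
TOTAL_EXPERTS = 128
ACTIVE_EXPERTS = 8


def fraction(part: float, whole: float) -> float:
    """Return part / whole, rejecting a zero denominator."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return part / whole


active_param_share = fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
active_expert_share = fraction(ACTIVE_EXPERTS, TOTAL_EXPERTS)

print(f"Active parameter share: {active_param_share:.1%}")
print(f"Active expert share:    {active_expert_share:.2%}")
```

Roughly 15% of the parameters are active per token even though only 6.25% of the experts are selected. A plausible reading, not confirmed by the article, is that components outside the expert layers (such as attention and embeddings) run for every token.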

Multimodal and Reasoning Capabilities

All Gemma 4 models handle text and image input. The E2B and E4B models additionally support audio input natively. All models offer a configurable thinking mode that lets the model reason step by step before generating a response.

Key capabilities include function calling for agentic workflows, image processing at variable aspect ratios and resolutions, video frame analysis, multilingual support (pre-trained on 140+ languages, with 35+ officially supported), and native system prompt support.

The models employ a hybrid attention mechanism that combines local sliding window attention (512 tokens for the E-series, 1024 for the larger models) with full global attention in the final layers, balancing memory efficiency with long-context awareness.
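The difference between local and global attention can be illustrated with attention masks. The following is a minimal sketch in plain Python, not Gemma's implementation: the function names are hypothetical, and it assumes a causal window that includes the current token. The article does not specify the exact windowing convention.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Causal: k <= q. Local: k is among the last `window` positions ending at q
    (the current token counts toward the window; this is an assumption).
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [(k <= q) and (q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def global_causal_mask(seq_len: int) -> list[list[bool]]:
    """Full causal attention: every position attends to all earlier positions."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


# Toy sizes for readability; the article quotes windows of 512 and 1024 tokens.
local = sliding_window_mask(seq_len=8, window=3)
full = global_causal_mask(seq_len=8)

print("local pairs: ", attended_pairs(local))
print("global pairs:", attended_pairs(full))
```

With a fixed window, the number of allowed pairs grows linearly with sequence length instead of quadratically, which is where the memory savings of local layers come from. The global layers keep a path to distant tokens.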

Benchmark Performance

Benchmark results are from instruction-tuned variants:

31B Model (Dense):

  • MMLU Pro: 85.2%
  • AIME 2024: 89.2% (no tools)
  • Codeforces ELO: 2150
  • LiveCodeBench v6: 80.0%
  • GPQA Diamond: 84.3%
  • MMMLU (multilingual): 88.4%
  • MMMU Pro (vision): 76.9%
  • MATH-Vision: 85.6%

26B A4B Model (MoE):

  • MMLU Pro: 82.6%
  • AIME 2024: 88.3%
  • Codeforces ELO: 1718
  • LiveCodeBench v6: 77.1%

E4B Model:

  • MMLU Pro: 69.4%
  • AIME 2024: 42.5%
  • Codeforces ELO: 940
  • CoVoST (audio): 35.54

All models show substantial improvements on coding benchmarks and long-context reasoning compared to the Gemma 3 27B baseline. On the long-context test (MRCR v2 with 128K context, 8-needle), the 31B model achieves 66.4% versus 13.5% for Gemma 3 27B.

Deployment and Availability

The models are available via Hugging Face with full Transformers library support. The range of sizes covers different deployment targets: E2B and E4B are aimed at mobile devices and lightweight laptops, 26B A4B balances speed and capability for consumer GPUs, and 31B is aimed at workstations and servers.
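As a rough guide to why each size maps to a different hardware class, the sketch below estimates weight storage from the parameter counts quoted earlier. The bytes-per-parameter values are common weight formats assumed for illustration, not figures from the release. The estimate covers weights only and ignores the KV cache, activations, and runtime overhead.

```python
# Parameter counts quoted in the article, in billions. For the E-series the
# "with embeddings" figure is used, on the assumption that all weights are stored.
PARAMS_B = {"E2B": 5.1, "E4B": 8.0, "26B A4B": 25.2, "31B": 30.7}

# Bytes per parameter for common weight formats (assumed, not from the article).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 parameters * bytes_per_param / 1e9 bytes per GB
    simplifies to a plain product.
    """
    return params_billion * bytes_per_param


for name, params in PARAMS_B.items():
    sizes = ", ".join(
        f"{fmt}: {weight_gb(params, nbytes):.1f} GB"
        for fmt, nbytes in BYTES_PER_PARAM.items()
    )
    print(f"{name:8s} {sizes}")
```

Note that the 26B A4B estimate uses the full 25.2B total: a mixture-of-experts model typically needs all experts resident in memory even though only a subset runs per token, so its memory footprint is closer to a 26B model than a 4B one.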

Google provided code examples for loading models, processing multi-turn conversations, enabling reasoning modes, and handling audio/video/image inputs alongside text.

What This Means

Gemma 4 extends open-weights model options across size classes. The E-series (with PLE) prioritizes on-device efficiency, while the 31B and 26B A4B models target frontier-level reasoning performance, giving developers a real choice between footprint and capability. Multimodal input, configurable reasoning modes, and extended context windows position these models for both traditional deployment and emerging agentic applications. Pricing for commercial deployment and specific inference cost metrics remain undisclosed.
