
Google DeepMind releases Gemma 4 family with 256K context window and multimodal capabilities

TL;DR

Google DeepMind released the Gemma 4 family of open-weights models in four sizes (2.3B to 31B parameters) with multimodal support for text, images, video, and audio. The flagship 31B model achieves 85.2% on MMLU Pro and 89.2% on AIME 2024, with context windows up to 256K tokens. All models feature configurable reasoning modes, target deployment from mobile devices to servers, and are released under the Apache 2.0 license.


Google DeepMind Releases Gemma 4 Family: Four Models from 2.3B to 31B Parameters

Google DeepMind released the complete Gemma 4 model family today, spanning four sizes optimized for deployment scenarios from edge devices to enterprise servers. All models are open-weights under the Apache 2.0 license.

Model Lineup and Specifications

The family includes three dense models and one mixture-of-experts variant:

Dense Models:

  • E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window
  • E4B: 4.5B effective parameters (8B with embeddings), 128K context window
  • 31B: 30.7B parameters, 256K context window

Mixture-of-Experts:

  • 26B A4B: 25.2B total parameters, 3.8B active parameters, 256K context window, 8 active experts from 128 total

The smaller E2B and E4B models use Per-Layer Embeddings (PLE) to achieve parameter efficiency without sacrificing capabilities. The 26B A4B model activates only 3.8B of its 25.2B parameters for each token during inference, so its per-token compute cost is comparable to that of a 4B model while it retains the reasoning capacity of a 26B model.
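The sparsity of the 26B A4B model can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable and function names are illustrative and do not come from any Gemma tooling.

```python
# Figures quoted in the article for the 26B A4B model.
TOTAL_PARAMS_B = 25.2   # total parameters, in billions
ACTIVE_PARAMS_B = 3.8   # parameters active per token, in billions
ACTIVE_EXPERTS = 8
TOTAL_EXPERTS = 128


def fraction(part: float, whole: float) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


param_fraction = fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
expert_fraction = fraction(ACTIVE_EXPERTS, TOTAL_EXPERTS)

print(f"active parameters: {param_fraction:.1%} of total")   # 15.1%
print(f"active experts:    {expert_fraction:.2%} of total")  # 6.25%
```

The parameter fraction is higher than the expert fraction, presumably because components outside the expert layers (attention, embeddings, and similar) run for every token regardless of routing. The article does not break the parameter count down further.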

Multimodal and Reasoning Capabilities

All Gemma 4 models handle text and image input. The E2B and E4B models additionally support audio input natively. All models feature configurable thinking/reasoning modes enabling step-by-step problem solving before generating responses.

Key capabilities include function calling for agentic workflows, image processing at variable aspect ratios and resolutions, video frame analysis, multilingual support (pre-trained on 140+ languages, 35+ supported), and native system prompt support.

The models employ a hybrid attention mechanism that combines local sliding window attention (a 512-token window for the E-series, 1024 tokens for the larger models) with full global attention in the final layers, balancing memory efficiency against long-context awareness.
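To illustrate the difference between the two attention patterns, here is a minimal sketch of the masks involved. It is a generic illustration, not Gemma 4's implementation: the function names are hypothetical, and it assumes causal attention with a window that includes the current token.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Local: only the `window` most recent positions,
    counting position i itself (an assumption of this sketch).
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def global_causal_mask(seq_len: int) -> list[list[bool]]:
    """Full causal attention: a window wide enough to cover the whole sequence."""
    return sliding_window_mask(seq_len, max(seq_len, 1))


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


local = sliding_window_mask(seq_len=8, window=3)
full = global_causal_mask(seq_len=8)
print(attended_pairs(local), attended_pairs(full))  # 21 36
```

With a fixed window, the number of attended pairs grows linearly with sequence length rather than quadratically, which is where the memory saving of the local layers comes from.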

Benchmark Performance

Benchmark results are from instruction-tuned variants:

31B Model (Dense):

  • MMLU Pro: 85.2%
  • AIME 2024: 89.2% (no tools)
  • Codeforces ELO: 2150
  • LiveCodeBench v6: 80.0%
  • GPQA Diamond: 84.3%
  • MMMLU (multilingual): 88.4%
  • MMMU Pro (vision): 76.9%
  • MATH-Vision: 85.6%

26B A4B Model (MoE):

  • MMLU Pro: 82.6%
  • AIME 2024: 88.3%
  • Codeforces ELO: 1718
  • LiveCodeBench v6: 77.1%

E4B Model:

  • MMLU Pro: 69.4%
  • AIME 2024: 42.5%
  • Codeforces ELO: 940
  • Audio CoVoST: 35.54

All models demonstrate substantial improvements in coding benchmarks and long-context reasoning compared to the Gemma 3 27B baseline. On the long-context test (MRCR v2, 8-needle, 128K context), the 31B model scores 66.4% versus 13.5% for Gemma 3 27B.
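The size of these gaps is easier to judge as differences and ratios. The sketch below recomputes them from the scores reported above; the dictionary layout and helper name are illustrative only.

```python
# Scores reported in the article (instruction-tuned variants).
DENSE_31B = {"MMLU Pro": 85.2, "AIME 2024": 89.2,
             "Codeforces ELO": 2150, "LiveCodeBench v6": 80.0}
MOE_26B_A4B = {"MMLU Pro": 82.6, "AIME 2024": 88.3,
               "Codeforces ELO": 1718, "LiveCodeBench v6": 77.1}
MRCR_V2_128K = {"Gemma 4 31B": 66.4, "Gemma 3 27B": 13.5}


def delta_and_ratio(new: float, old: float) -> tuple[float, float]:
    """Return (new - old, new / old) for two scores on the same benchmark."""
    if old == 0:
        raise ValueError("old score must be non-zero")
    return new - old, new / old


# Gap between the dense 31B and the 26B A4B MoE on the shared benchmarks.
dense_vs_moe = {
    name: round(DENSE_31B[name] - MOE_26B_A4B[name], 1) for name in DENSE_31B
}
for name, gap in dense_vs_moe.items():
    print(f"{name}: 31B leads by {gap}")

# Long-context gain over the previous generation.
delta, ratio = delta_and_ratio(MRCR_V2_128K["Gemma 4 31B"],
                               MRCR_V2_128K["Gemma 3 27B"])
print(f"MRCR v2: +{delta:.1f} points, {ratio:.1f}x")  # +52.9 points, 4.9x
```

The Codeforces gap is in ELO points while the others are percentage points, so the figures are not directly comparable across benchmarks.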

Deployment and Availability

Models are available via Hugging Face with full Transformers library support. The architecture choices enable diverse deployment: E2B and E4B target mobile and lightweight laptop execution, 26B A4B balances speed and capability for consumer GPUs, and 31B targets workstations and servers.
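A rough way to see why each size maps to a different hardware class is to estimate the memory needed just to hold the weights. The sketch below is a back-of-the-envelope calculation under stated assumptions: it uses the parameter counts listed earlier (including embeddings for the E-series), counts all 25.2B parameters for the MoE model on the assumption that every expert stays resident in memory, ignores the KV cache and activations, and uses precision options (bf16, int8, int4) that are illustrative rather than taken from the release.

```python
# Parameter counts from the article, in billions.
PARAMS_B = {"E2B": 5.1, "E4B": 8.0, "26B A4B": 25.2, "31B": 30.7}

# Bytes per parameter for some common precisions (illustrative assumption).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in decimal GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    """
    if params_billions < 0 or bytes_per_param <= 0:
        raise ValueError("invalid parameter count or precision")
    return params_billions * bytes_per_param


for model, params in PARAMS_B.items():
    sizes = ", ".join(
        f"{precision}: {weight_memory_gb(params, width):.1f} GB"
        for precision, width in BYTES_PER_PARAM.items()
    )
    print(f"{model:8s} {sizes}")
```

Real memory use will be higher once the KV cache for long contexts is included, and actual on-device footprints depend on the runtime and quantization scheme used.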

Google provided code examples for loading models, processing multi-turn conversations, enabling reasoning modes, and handling audio/video/image inputs alongside text.

What This Means

Gemma 4 expands the open-weights options available in each size class. The E-series (with PLE) prioritizes on-device efficiency, while the 31B and 26B A4B models prioritize reasoning performance, so users can trade capability against hardware cost. Multimodal input, configurable reasoning modes, and extended context windows position these models for both traditional deployment and emerging agentic applications. Pricing for commercial deployment and specific inference cost metrics remain undisclosed.
