Google DeepMind releases Gemma 4, open multimodal models with 256K context and reasoning

TL;DR

Google DeepMind has released Gemma 4, a family of open-weights multimodal models ranging from 2.3B to 31B parameters with support for text, images, video, and audio. The models feature context windows up to 256K tokens, built-in reasoning modes, and native function calling for agentic workflows.

April 3, 2026 · 4:05 AM · 3 min read

Gemma 4 31B Instruct — Quick Specs

Context window: 262K tokens


Google DeepMind Releases Gemma 4: Open Multimodal Models with Extended Context

Google DeepMind has released Gemma 4, a family of open-weights models spanning 2.3B to 31B parameters, with multimodal capabilities and context windows of up to 256K tokens. The release includes both dense and Mixture-of-Experts (MoE) architectures designed for deployment on hardware ranging from mobile phones to data center servers.

Model Sizes and Specifications

Gemma 4 offers four distinct variants:

E2B: 2.3B effective parameters (5.1B with embeddings), 128K context, text/image/audio support
E4B: 4.5B effective parameters (8B with embeddings), 128K context, text/image/audio support
26B A4B (MoE): 25.2B total parameters with 3.8B active parameters, 256K context, text/image support
31B Dense: 30.7B parameters, 256K context, text/image support

The smaller E2B and E4B models use Per-Layer Embeddings (PLE) to reduce their effective parameter counts, which makes them practical to deploy on edge devices. The 26B A4B variant is a Mixture-of-Experts model with 128 experts, 8 of which are active per token. Its claimed inference speed is comparable to that of a 4B model, while its total capacity remains 25.2B parameters.
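To make the "128 experts, 8 active" figure concrete, here is a minimal sketch of generic top-k expert routing. It is an illustration under stated assumptions, not Gemma 4's actual router: the function name `route_token`, the random logits standing in for a learned router, and the softmax-over-selected-experts weighting are all hypothetical choices. Only the expert counts and parameter figures come from the article.

```python
import math
import random

NUM_EXPERTS = 128  # total experts, as stated in the article
TOP_K = 8          # active experts per token, as stated in the article


def route_token(router_logits, top_k=TOP_K):
    """Select the top_k highest-scoring experts and normalize their weights.

    Generic top-k gating sketch (hypothetical); the real router may differ.
    Returns a dict mapping expert index -> mixing weight (weights sum to 1).
    """
    # Indices of the top_k largest logits.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Softmax over the chosen logits only; subtract the max for stability.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}


# Random logits stand in for the output of a learned router (assumption).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)

# Share of parameters used per token, from the article's 3.8B / 25.2B figures.
active_fraction = 3.8 / 25.2

print(f"experts used for this token: {sorted(weights)}")
print(f"active parameter share: {active_fraction:.1%}")
```

Each token runs through only the selected experts, so roughly 15% of the parameters do work per token. That is the basis for the claim that compute cost resembles a much smaller dense model.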

Capabilities and Architecture

All Gemma 4 models accept text and image inputs, with images at variable aspect ratios and resolutions. The E2B and E4B models additionally include native audio support, covering automatic speech recognition and multilingual speech translation. Video is handled by processing it as a sequence of frames.

Key features include:

Reasoning: Configurable thinking modes enabling step-by-step reasoning before response generation
Function Calling: Native support for structured tool use and agentic workflows
Hybrid Attention: Combines local sliding window attention with full global attention, with Proportional RoPE optimization for memory efficiency
Multilingual: Pre-trained on 140+ languages with out-of-the-box support for 35+
Native System Prompt Support: Structured conversation control
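The hybrid attention item above can be illustrated with a small sketch of which positions a query may attend to in a sliding-window layer versus a global layer, and how that changes the number of cached key/value positions. The window size, sequence length, and the 5-local-to-1-global layer pattern below are hypothetical values chosen for readability; the article does not state Gemma 4's actual settings.

```python
def visible_keys(query_pos, window=None):
    """Key positions a causal query at query_pos may attend to.

    window=None -> full global (causal) attention.
    window=w    -> sliding-window attention over the last w positions.
    """
    start = 0 if window is None else max(0, query_pos - window + 1)
    return list(range(start, query_pos + 1))


def kv_entries_per_layer(seq_len, window=None):
    """Key/value positions one layer must keep cached for seq_len tokens."""
    return seq_len if window is None else min(seq_len, window)


WINDOW = 4     # hypothetical window size, tiny so the output is readable
SEQ_LEN = 10   # hypothetical sequence length

local = visible_keys(9, window=WINDOW)   # [6, 7, 8, 9]
global_ = visible_keys(9)                # [0, 1, ..., 9]

# Hypothetical layer pattern: five local layers for every global layer.
pattern = ["local"] * 5 + ["global"]
hybrid_total = sum(
    kv_entries_per_layer(SEQ_LEN, WINDOW if kind == "local" else None)
    for kind in pattern
)
all_global_total = kv_entries_per_layer(SEQ_LEN) * len(pattern)

print(f"local layer sees {local}, global layer sees {global_}")
print(f"cached positions: hybrid={hybrid_total}, all-global={all_global_total}")
```

In this toy setup the hybrid stack caches 30 positions where an all-global stack caches 60. The gap widens as the sequence grows, because local layers stop growing once the sequence exceeds the window.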

Benchmark Performance

The reported scores for the instruction-tuned models on reasoning and coding benchmarks are as follows.

Gemma 4 31B achieves:

MMLU Pro: 85.2%
AIME 2026 (no tools): 89.2%
LiveCodeBench v6: 80.0%
Codeforces Elo: 2150
GPQA Diamond: 84.3%
MATH-Vision: 85.6%
Long Context MRCR v2 (128K needle): 66.4%

Gemma 4 26B A4B demonstrates strong performance-to-efficiency trade-offs:

MMLU Pro: 82.6%
AIME 2026 (no tools): 88.3%
LiveCodeBench v6: 77.1%
Codeforces Elo: 1718

The smaller models come close to the previous generation's largest model at a fraction of its size: E2B scores 60.0% on MMLU Pro, compared with 67.6% for Gemma 3 27B.

Release Details

The models are released under Apache 2.0 licensing as both pre-trained and instruction-tuned variants. Unsloth has released GGUF quantized versions optimized for local inference. The models are available through Hugging Face with support for the latest Transformers library.

Google DeepMind emphasizes on-device deployment viability for the smaller models while positioning larger variants for consumer GPU and server deployment. The hybrid architecture and context window scaling address trade-offs between inference speed and reasoning depth for long-context tasks.
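As a rough guide to why quantized builds matter for local and consumer-GPU deployment, the sketch below estimates weight storage from the parameter counts quoted in this article. It is back-of-the-envelope arithmetic under stated assumptions: it counts weights only (no KV cache, activations, or per-block quantization metadata), and the 16-bit and 4-bit widths are illustrative rather than the formats of any specific released file. The helper name `weight_memory_gb` is hypothetical.

```python
# Total parameter counts in billions, taken from the article.
PARAMS_BILLIONS = {
    "E2B (with embeddings)": 5.1,
    "E4B (with embeddings)": 8.0,
    "26B A4B": 25.2,
    "31B Dense": 30.7,
}


def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes): params * bits / 8.

    Ignores KV cache, activations, and quantization overhead (assumption).
    """
    return params_billions * bits_per_weight / 8


for name, params in PARAMS_BILLIONS.items():
    fp16 = weight_memory_gb(params, 16)  # illustrative 16-bit weights
    q4 = weight_memory_gb(params, 4)     # illustrative 4-bit quantization
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

Note that the MoE variant is sized by its 25.2B total parameters here, not its 3.8B active parameters. Sparse activation reduces compute per token, but in a typical setup all experts still need to be held in memory.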

What this means

Gemma 4 brings multimodal input and configurable reasoning to open models at several scale points, from on-device to server. The MoE variant gives teams a way to balance model capacity against inference latency, since only 3.8B of its 25.2B parameters are active per token. The release includes no cloud inference pricing, which is expected for open-weights models intended for self-hosted deployment. The 256K context window, together with the 31B model's 66.4% score on MRCR v2 at 128K, positions the larger variants as candidates for document analysis and extended reasoning tasks that would otherwise go to closed commercial models.

Source: huggingface.co

