Google DeepMind releases Gemma 4 with four models up to 31B parameters, 256K context window

TL;DR

Google DeepMind released Gemma 4, an open-weights multimodal model family in four sizes (E2B, E4B, 26B A4B, 31B) with context windows of up to 256K tokens and native reasoning capabilities. The 26B A4B variant uses a Mixture-of-Experts architecture with 3.8B active parameters for efficient inference. All models accept text and image input, support 140+ languages, and are released under the Apache 2.0 license.

April 3, 2026 · 6:35 AM

Gemma 4 26B A4B — Quick Specs

Context window: 262,144 tokens (256K)

Google DeepMind Releases Gemma 4: Four Open-Weights Models with Multimodal Capabilities and Extended Context

Google DeepMind released Gemma 4, an open-weights model family available in four sizes, designed for deployment on hardware ranging from mobile devices to servers. The largest variant, the 31B Dense model, has 30.7B parameters and a 256K-token context window. A Mixture-of-Experts variant, the 26B A4B, activates only 3.8B of its 25.2B total parameters per token during inference, enabling performance approaching the 31B model at a per-token compute cost closer to that of a 4B model.
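To make the efficiency claim concrete, the arithmetic below uses the common rule of thumb that a transformer forward pass costs roughly 2 FLOPs per active parameter per token. That approximation is an assumption, not a figure from the announcement, and it ignores attention-score computation and memory bandwidth; only the parameter counts come from the text above.

```python
# Rough per-token forward-pass compute using the ~2 FLOPs per active
# parameter approximation (an assumption; ignores attention-score FLOPs).
def forward_flops_per_token(active_params: float) -> float:
    return 2.0 * active_params


# Parameter counts as reported in the announcement.
DENSE_31B_PARAMS = 30.7e9       # dense: every parameter is used for every token
MOE_26B_TOTAL_PARAMS = 25.2e9   # MoE: all of these must still be stored
MOE_26B_ACTIVE_PARAMS = 3.8e9   # MoE: parameters actually used per token

active_fraction = MOE_26B_ACTIVE_PARAMS / MOE_26B_TOTAL_PARAMS
compute_ratio = forward_flops_per_token(DENSE_31B_PARAMS) / forward_flops_per_token(
    MOE_26B_ACTIVE_PARAMS
)

print(f"Share of 26B A4B parameters active per token: {active_fraction:.1%}")
print(f"Per-token compute, 31B dense vs 26B A4B: {compute_ratio:.1f}x")
```

Under this approximation the 26B A4B needs about one eighth of the 31B's per-token compute, but all 25.2B parameters still have to be held in memory, so its memory footprint remains that of a mid-sized model rather than a 4B one.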

Model Specifications and Architecture

Gemma 4 includes two smaller models optimized for edge deployment: the E2B (2.3B effective parameters) and the E4B (4.5B effective parameters), both with 128K context windows. The E-series models use Per-Layer Embeddings (PLE), in which each decoder layer has its own token embedding table, to improve memory efficiency during on-device inference.
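As a toy illustration of the per-layer lookup idea, the sketch below gives each of a few hypothetical decoder layers its own embedding table and reads only the rows for the current tokens. The sizes, the `per_layer_embeddings` helper, and the table layout are assumptions chosen for illustration; the article does not describe how Gemma 4 combines these vectors with each layer's hidden state, so that step is omitted.

```python
import numpy as np

# Toy sizes for illustration only (not Gemma 4's real dimensions).
VOCAB_SIZE = 1_000
NUM_LAYERS = 4
PLE_DIM = 8

rng = np.random.default_rng(0)

# Hypothetical layout: one float32 embedding table per decoder layer.
ple_tables = [
    rng.standard_normal((VOCAB_SIZE, PLE_DIM)).astype(np.float32)
    for _ in range(NUM_LAYERS)
]


def per_layer_embeddings(token_ids: np.ndarray, layer: int) -> np.ndarray:
    """Return this layer's embedding rows for the given token ids."""
    return ple_tables[layer][token_ids]


token_ids = np.array([5, 42, 999])
rows = [per_layer_embeddings(token_ids, layer) for layer in range(NUM_LAYERS)]

# Compare the size of all tables with the bytes actually read for 3 tokens.
total_bytes = sum(table.nbytes for table in ple_tables)
touched_bytes = sum(r.nbytes for r in rows)
print(f"PLE tables: {total_bytes} bytes; read for 3 tokens: {touched_bytes} bytes")
```

Because each step reads only a handful of rows per layer, the full tables need not occupy fast accelerator memory, which is one plausible reading of the memory-efficiency claim.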

All models use a hybrid attention mechanism that combines local sliding-window attention (a 512-token window for the E-series, 1024 tokens for the larger models) with full global attention in the final layers. Proportional RoPE (p-RoPE) further reduces the memory footprint for extended contexts.
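The difference between the two attention types can be sketched with boolean masks: a global layer lets each token attend to every earlier token, while a sliding-window layer limits it to the most recent `window` tokens. The helper names below are hypothetical, and the sketch ignores details such as how many layers of each type a model uses; only the 512- and 1024-token window sizes come from the announcement.

```python
from typing import Optional

import numpy as np


def causal_mask(seq_len: int) -> np.ndarray:
    """Global causal attention: token i may attend to every token j <= i."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return j <= i


def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Local causal attention: token i sees only the last `window` tokens, itself included."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)


def kv_entries_per_layer(context_len: int, window: Optional[int]) -> int:
    """Key/value entries one layer must keep cached; window=None means global attention."""
    return context_len if window is None else min(context_len, window)


print(sliding_window_mask(6, window=3).astype(int))
context = 256 * 1024  # 256K tokens = 262,144
print("local layer, 1024-token window:", kv_entries_per_layer(context, window=1024))
print("global layer:", kv_entries_per_layer(context, window=None))
```

At a 256K-token context, a local layer with a 1024-token window caches at most 1,024 key/value entries, while a global layer caches all 262,144, which is why restricting full global attention to a subset of layers saves memory on long inputs.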

Multimodal Capabilities

All four Gemma 4 models accept text and image input, with support for variable aspect ratios and resolutions. The E2B and E4B models additionally accept audio, enabling automatic speech recognition and speech-to-text translation across multiple languages. Video understanding is supported through frame-sequence processing. All models natively support system prompts and offer function calling for structured tool use in agentic workflows.

Benchmark Performance

On MMLU Pro, the 31B Dense model scores 85.2%, compared to 82.6% for the 26B A4B variant. On coding tasks (LiveCodeBench v6), the 31B achieves 80.0% versus 77.1% for 26B A4B. For long-context retrieval (MRCR v2 8-needle at 128K), the 31B reaches 66.4% accuracy.
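One way to read the gap between the two largest models is the share of the 31B score that the 26B A4B retains, set against the share of active parameters it uses. This is simple arithmetic on the reported figures, not an official metric.

```python
# Scores (%) and active parameter counts as reported in the announcement.
scores = {
    "MMLU Pro": {"31B": 85.2, "26B A4B": 82.6},
    "LiveCodeBench v6": {"31B": 80.0, "26B A4B": 77.1},
}
ACTIVE_PARAMS = {"31B": 30.7e9, "26B A4B": 3.8e9}

# Fraction of the 31B's active parameters that the 26B A4B uses per token.
active_share = ACTIVE_PARAMS["26B A4B"] / ACTIVE_PARAMS["31B"]

retention = {}
for bench, s in scores.items():
    retention[bench] = s["26B A4B"] / s["31B"]
    gap = s["31B"] - s["26B A4B"]
    print(
        f"{bench}: 26B A4B retains {retention[bench]:.1%} of the 31B score "
        f"({gap:.1f} points behind) with {active_share:.1%} of its active parameters"
    )
```

On both benchmarks the MoE variant keeps roughly 96 to 97 percent of the dense model's score while activating about an eighth of the parameters per token.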

The smaller E4B model scores 69.4% on MMLU Pro and 52.0% on LiveCodeBench v6. Vision capabilities show the 31B at 76.9% on MMMU Pro and the E2B at 44.2%.

All instruction-tuned variants feature configurable thinking modes for step-by-step reasoning. The 31B model achieves 89.2% on AIME 2026 without tools, substantially above Gemma 3 27B (20.8%).

Multilingual and Deployment

Gemma 4 models support more than 140 languages, with dedicated pre-training for 35+ languages. The models are released under the Apache 2.0 license and are compatible with recent versions of the Hugging Face Transformers library.

Pricing and API availability were not disclosed in the announcement. Models are immediately accessible via Hugging Face and GitHub for local deployment.

What This Means

Gemma 4 significantly advances Google's open-model strategy by delivering multimodal capabilities competitive with frontier models at multiple size tiers. The 26B A4B's MoE architecture is particularly notable: scoring 82.6% on MMLU Pro with only 3.8B active parameters challenges the assumption that total parameter count directly determines inference cost. For developers, the range from E2B to 31B provides genuine deployment flexibility from mobile devices to servers. The 256K context and native reasoning support address two key limitations of previous Gemma releases. The announcement's only direct Gemma 3 comparison, AIME 2026 (89.2% for the 31B versus 20.8% for Gemma 3 27B), shows a large reasoning gain, but the figures given are not enough to judge the generational improvement on other tasks.

Source: huggingface.co