Google DeepMind releases Gemma 4 with four model sizes, up to 256K context, multimodal support

TL;DR

Google DeepMind released Gemma 4, an open-weights multimodal model family in four sizes (2.3B to 31B parameters) with context windows up to 256K tokens. All models accept text and image input, and the E2B and E4B variants also accept audio natively. The Gemma 4 31B dense model scores 85.2% on MMLU Pro, 89.2% on AIME 2026, and 80.0% on LiveCodeBench, significant improvements over Gemma 3.

April 8, 2026 · 5:50 AM · 2 min read

Gemma 4 — Quick Specs

Context window: up to 256K tokens


Google DeepMind Releases Gemma 4: Four Multimodal Models with Up to 256K Context

Google DeepMind today released Gemma 4, an open-weights model family spanning four sizes designed for deployment from mobile devices to data centers. The lineup includes the E2B (2.3B effective parameters), E4B (4.5B effective), 26B A4B (3.8B active parameters), and 31B dense models, all under Apache 2.0 licensing.

Model Specifications

The family comprises three dense models and one mixture-of-experts model:

Dense Models:

E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window, ~150M vision encoder, ~300M audio encoder
E4B: 4.5B effective parameters (8B with embeddings), 128K context window, ~150M vision encoder, ~300M audio encoder
31B: 30.7B parameters, 256K context window, ~550M vision encoder, no native audio support

Mixture-of-Experts Model:

26B A4B: 25.2B total parameters with 3.8B active (8 active experts from 128 total, plus 1 shared), 256K context window, ~550M vision encoder
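To make "8 active experts from 128 total, plus 1 shared" concrete, here is a minimal Python sketch of top-k expert routing for a single token. It assumes a plain softmax over the selected router logits and uses toy scalar "experts". The function names (route, moe_layer) are hypothetical. The source does not describe Gemma 4's actual router, gate normalization, or expert shapes.

```python
import math
import random

NUM_EXPERTS = 128  # routed experts in the 26B A4B model (per the article)
TOP_K = 8          # routed experts active per token (per the article)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Select the top_k experts for one token.

    Assumption: gate weights are a softmax over the selected logits only.
    Returns a list of (expert_index, gate_weight) pairs.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x, router_logits, experts, shared_expert):
    """Always-on shared expert plus the gate-weighted top-k routed experts."""
    out = shared_expert(x)
    for idx, weight in route(router_logits):
        out += weight * experts[idx](x)
    return out


if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    # Toy experts (hypothetical): expert k scales its input by 0.01 * k.
    experts = [lambda x, k=k: 0.01 * k * x for k in range(NUM_EXPERTS)]
    y = moe_layer(1.0, logits, experts, shared_expert=lambda x: x)
    used = [i for i, _ in route(logits)]
    print(f"routed experts used: {sorted(used)}; output: {y:.4f}")
```

The shared expert runs for every token, so each token passes through 9 expert networks. The other 120 routed experts stay idle for that token. This is the mechanism behind the gap between the 25.2B total and 3.8B active parameters.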

Small models (E2B, E4B) employ Per-Layer Embeddings (PLE) to maximize on-device efficiency. All models use hybrid attention combining local sliding window (512-1024 tokens) with global layers, applying Proportional RoPE for long-context optimization.
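Attention masks show how the hybrid scheme works. The Python sketch below builds a causal mask for a global layer and for a local sliding-window layer. It also counts how many past tokens each layer type would keep in its KV cache. Two conventions are assumptions: the current token counts toward the window, and "256K" means 256 × 1024 tokens. The source does not give the ratio of local to global layers, so the sketch computes no per-model totals.

```python
def causal_mask(seq_len, window=None):
    """Return mask[q][k] == True when query position q may attend to key k.

    window=None models a global causal layer. An integer models a local
    sliding-window layer where q sees only the last `window` positions,
    itself included (assumed convention).
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def kv_cache_tokens(context_len, window=None):
    """Past tokens one layer must cache: all of them for a global layer,
    at most `window` for a sliding-window layer."""
    return context_len if window is None else min(context_len, window)


if __name__ == "__main__":
    local = causal_mask(6, window=3)
    global_ = causal_mask(6)
    print([k for k, ok in enumerate(local[5]) if ok])    # [3, 4, 5]
    print([k for k, ok in enumerate(global_[5]) if ok])  # [0, 1, 2, 3, 4, 5]

    context = 256 * 1024  # "256K" read as 256 x 1024 tokens (assumption)
    window = 1024         # upper end of the 512-1024 range in the article
    print(kv_cache_tokens(context, window), kv_cache_tokens(context))  # 1024 262144
```

At full context, a local layer's cache stays at the window size while a global layer's grows with every token. This is why long-context models commonly mix mostly-local layers with a few global ones.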

Multimodal Capabilities

All four models handle text and image input with variable aspect ratio and resolution support. E2B and E4B uniquely feature native audio support for automatic speech recognition and speech-to-translated-text across multiple languages. All models support video understanding via frame sequences and offer out-of-the-box multilingual support for 140+ languages.
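Because video is handled as a sequence of frames, a preprocessing step must choose which frames to pass in. The Python sketch below shows one common approach: uniform sampling from the middle of equal-length segments. The function name and the choice of 8 frames are hypothetical. The source does not say how Gemma 4's own processor selects frames or how many it accepts.

```python
def sample_frame_indices(total_frames, num_samples):
    """Pick num_samples frame indices spread evenly across a clip.

    The clip is split into equal segments and the middle frame of each
    segment is taken. If the clip is shorter than num_samples, every
    frame is returned.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    segment = total_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    fps, seconds = 30, 10  # hypothetical 10-second clip at 30 fps
    print(sample_frame_indices(fps * seconds, 8))
    # [18, 56, 93, 131, 168, 206, 243, 281]
```

The selected frames would then be supplied to the model as an ordered series of images.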

Core capabilities include: configurable thinking/reasoning modes, function calling for agentic workflows, code generation and correction, document/PDF parsing, OCR, and interleaved multimodal input (freely mixing text and images).

Benchmark Performance

Gemma 4 31B (instruction-tuned) achieves:

MMLU Pro: 85.2%
AIME 2026 (no tools): 89.2%
LiveCodeBench v6: 80.0%
Codeforces Elo: 2150
GPQA Diamond: 84.3%
BigBench Extra Hard: 74.4%
Vision MMMU Pro: 76.9%
MATH-Vision: 85.6%

The 26B A4B MoE variant scores 82.6% on MMLU Pro and 88.3% on AIME, close to the 31B's results with only 3.8B active parameters. The E4B scores 69.4% on MMLU Pro and 52.0% on LiveCodeBench. That edges past Gemma 3 27B on MMLU Pro (67.6%) and far exceeds its LiveCodeBench score (29.1%).
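The MoE efficiency claim and the E4B comparison with Gemma 3 27B reduce to simple arithmetic on the reported figures. This Python snippet computes them. It makes two assumptions: the 31B's 30.7B parameters are all active per token, and the 26B A4B's AIME score uses the same AIME 2026 setting as the 31B's.

```python
def pct(part, whole):
    """part as a percentage of whole."""
    return 100.0 * part / whole


# Scores and parameter counts as reported in the article.
moe_vs_dense = {
    "MMLU Pro": pct(82.6, 85.2),  # 26B A4B vs 31B
    "AIME": pct(88.3, 89.2),      # assumes both use the AIME 2026 setting
}
active_share = pct(3.8, 30.7)     # 3.8B active vs 30.7B dense

e4b = {"MMLU Pro": 69.4, "LiveCodeBench": 52.0}
gemma3_27b = {"MMLU Pro": 67.6, "LiveCodeBench": 29.1}
gains = {name: e4b[name] - gemma3_27b[name] for name in e4b}

if __name__ == "__main__":
    for name, share in moe_vs_dense.items():
        print(f"26B A4B reaches {share:.1f}% of the 31B's {name} score")
    print(f"with {active_share:.1f}% of the 31B's per-token parameters")
    for name, delta in gains.items():
        print(f"E4B vs Gemma 3 27B on {name}: {delta:+.1f} points")
```

On these numbers, the 26B A4B retains about 97% of the 31B's MMLU Pro score and about 99% of its AIME score while activating roughly 12% as many parameters per token. The E4B leads Gemma 3 27B by under 2 points on MMLU Pro but by about 23 points on LiveCodeBench.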

The smaller models (E2B: 60.0% on MMLU Pro; E4B: 69.4%) target on-device deployment while retaining strong reasoning performance for their size.

Availability and Deployment

All models are available via Hugging Face with Transformers integration. Google provides inference code supporting text generation, image/video/audio processing, and reasoning modes. The diverse architecture options enable deployment across phones, laptops, edge devices, consumer GPUs, and enterprise servers.
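One rough way to match the four sizes to these hardware tiers is to estimate weight memory as parameter count times bytes per parameter. The Python sketch below does this for assumed bf16, int8, and int4 storage, using the article's total parameter counts. E2B and E4B use their "with embeddings" figures, and the vision and audio encoders are excluded. The estimate ignores KV cache, activations, and runtime overhead, and the precisions are assumptions rather than the formats Google ships.

```python
# Total parameter counts in billions, as given in the article. E2B and E4B
# use the "with embeddings" figures; vision/audio encoders are excluded.
TOTAL_PARAMS_B = {"E2B": 5.1, "E4B": 8.0, "26B A4B": 25.2, "31B": 30.7}

# Assumed storage formats; the released checkpoints may use others.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_b, precision):
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    Ignores KV cache, activations, and framework overhead.
    """
    return params_b * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for name, params in TOTAL_PARAMS_B.items():
        row = ", ".join(
            f"{prec}: {weight_gb(params, prec):5.1f} GB" for prec in BYTES_PER_PARAM
        )
        print(f"{name:>8}  {row}")
```

The 26B A4B must hold all 25.2B parameters in memory even though only 3.8B are active per token. Its memory footprint therefore tracks the total count, while its per-token compute tracks the active count.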

What This Means

Gemma 4 aggressively targets the efficiency-to-capability spectrum. The E2B and E4B variants, with native audio, represent Google's push into on-device multimodal AI, while the 31B and 26B A4B compete directly with Meta's Llama models on reasoning benchmarks. Google's emphasis on function calling and thinking modes positions Gemma 4 for agentic workflows. The Apache 2.0 license permits commercial use. However, real-world inference cost and latency data have not been released, and those metrics are critical for weighing on-device against cloud deployment.

Source: huggingface.co

