model releaseGoogle DeepMind

Google DeepMind releases Gemma 4 family: multimodal models from 2.3B to 31B parameters with 256K context

TL;DR

Google DeepMind released the Gemma 4 family of open-weights multimodal models in four sizes: E2B (2.3B effective parameters), E4B (4.5B effective), 26B A4B (3.8B active parameters), and 31B dense. All models support text and image input with 128K-256K context windows; E2B and E4B add native audio capabilities. Models feature reasoning modes, function calling, and multilingual support across 140+ languages.

3 min read
0

Google DeepMind releases Gemma 4: Multimodal models from 2.3B to 31B parameters with up to 256K context

Google DeepMind released the Gemma 4 family of open-weights models across four distinct sizes, each optimized for different deployment scenarios from mobile devices to server infrastructure.

Model Lineup and Architecture

The release includes:

  • Gemma 4 E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window, supports text, image, and audio
  • Gemma 4 E4B: 4.5B effective parameters (8B with embeddings), 128K context window, text, image, and audio
  • Gemma 4 26B A4B: 25.2B total parameters with 3.8B active parameters (Mixture-of-Experts), 256K context window, text and image
  • Gemma 4 31B: 30.7B parameters, 256K context window, text and image

The smaller E-series models use Per-Layer Embeddings (PLE) to reduce effective parameter count while maintaining capacity. The 26B A4B employs a Mixture-of-Experts architecture with 128 total experts, activating only 8 per token during inference, enabling fast execution comparable to a 4B model.

All models use hybrid attention combining sliding-window local attention with global attention in final layers, optimized with Proportional RoPE for long-context efficiency.

Multimodal Capabilities

All four models handle text and image input with variable aspect ratio and resolution support. E2B and E4B additionally feature native audio processing for automatic speech recognition and speech-to-translated-text across multiple languages. E4B and E2B include dedicated audio encoders (~300M parameters each).

Core capabilities include: reasoning with configurable thinking modes, function calling for agentic workflows, video understanding via frame sequences, document/PDF parsing, OCR across 140+ languages, and code generation.

Benchmark Performance

Instructino-tuned variant results against instruction-tuned baselines:

Reasoning and Coding:

  • MMLU Pro: E2B 60.0% | E4B 69.4% | 26B A4B 82.6% | 31B 85.2%
  • AIME 2026 (no tools): E2B 37.5% | E4B 42.5% | 26B A4B 88.3% | 31B 89.2%
  • LiveCodeBench v6: E2B 44.0% | E4B 52.0% | 26B A4B 77.1% | 31B 80.0%
  • Codeforces ELO: E2B 633 | E4B 940 | 26B A4B 1718 | 31B 2150

Multimodal Vision:

  • MMMU Pro: E2B 44.2% | E4B 52.6% | 26B A4B 73.8% | 31B 76.9%
  • MATH-Vision: E2B 52.4% | E4B 59.5% | 26B A4B 82.4% | 31B 85.6%

Long Context (MRCR v2 at 128K, 8-needle average):

  • E2B 19.1% | E4B 25.4% | 26B A4B 44.1% | 31B 66.4%

Audio (E2B/E4B only):

  • CoVoST2: E4B 35.54 | E2B 33.47
  • FLEURS character error rate: E4B 0.08 | E2B 0.09

Technical Details and Licensing

All models are released under Apache 2.0 license with full source access on Hugging Face. Models support 262K vocabulary size and include native system prompt support for structured conversations. Training cutoff date and exact training data composition were not disclosed.

Integration requires Transformers library (latest version) and runs on single GPU inference via AutoModelForCausalLM and AutoModelForMultimodalLM APIs.

What this means

Gemma 4 significantly expands deployment optionality. The efficient E-series models target edge/mobile with reasonable capability trade-offs, while larger variants compete with dense competitors on reasoning benchmarks. The MoE variant offers a middle ground: competitive performance with inference speed closer to 4B-class models. The 256K context across larger models and integrated audio/vision support position Gemma 4 as a comprehensive open alternative to closed multimodal systems, though long-context performance (19-66% on needle-in-haystack tasks) suggests practical limitations remain at extreme context lengths.

Related Articles

model release

DeepSeek Releases V4 Models: 1M Context Window, 90% Less KV Cache Than V3

DeepSeek has released two new MoE models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both models support a one million token context window and use a hybrid attention architecture that requires only 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2.

model release

Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance

Mistral AI has released Leanstral 1.5, an open-source 119B parameter mixture-of-experts model designed specifically for Lean 4 proof assistance. The model features 128 experts with 4 active per token (6.5B activated parameters), a 256k token context window, and multimodal input capabilities.

model release

NVIDIA releases Nemotron-Labs-TwoTower-30B: block-wise diffusion model claims 2.42× faster generation at 98.7% baseline

NVIDIA released Nemotron-Labs-TwoTower-30B-A3B-Base-BF16, a block-wise diffusion language model that generates text by denoising blocks of tokens in parallel rather than sequentially. According to NVIDIA, the model achieves 2.42× the wall-clock generation throughput of its autoregressive baseline while retaining 98.7% of aggregate benchmark quality.

model release

Portugal releases Amália, open-source 9B parameter AI model trained on European Portuguese

Portugal has released Amália, its first national AI model trained specifically for European Portuguese. Built on EuroLLM-9B with 9 billion parameters, the model is fully open-source with weights, datasets, and code published under an open license. The government has committed €5.5m in initial funding through 2027.

Comments

Loading...