model release

Google releases Gemma 4 family with 31B model, 256K context, multimodal capabilities

TL;DR

Google DeepMind released the Gemma 4 family of open-weights models ranging from 2.3B to 31B parameters, featuring up to 256K token context windows and native support for text, image, video, and audio inputs. The flagship 31B model scores 85.2% on MMLU Pro and 89.2% on AIME 2026, with a smaller 26B MoE variant requiring only 3.8B active parameters for faster inference.

April 2, 2026 · 5:05 PM2 min read

Gemma 4 31B Instruct — Quick Specs

Context window262K tokens

Compare Gemma 4 31B Instruct with other models →

Google Releases Gemma 4 Family with Multimodal Capabilities and Up to 256K Context

Google DeepMind launched Gemma 4, a family of open-weights models ranging from 2.3B to 31B parameters, introducing multimodal capabilities including text, image, video, and audio processing alongside native reasoning modes.

Model Sizes and Architecture

The release includes four model variants:

E2B: 2.3B effective parameters (5.1B with embeddings), 128K context
E4B: 4.5B effective parameters (8B with embeddings), 128K context
26B A4B: 25.2B total parameters with 3.8B active (MoE), 256K context
31B: 30.7B parameters, 256K context

All models employ a hybrid attention mechanism combining local sliding window attention with full global attention. The architecture uses Per-Layer Embeddings (PLE) in smaller models to optimize on-device deployment, while the 26B variant uses Mixture-of-Experts with 8 active experts from 128 total.

Capabilities and Features

Gemma 4 models support:

Multimodal Input: Text, images with variable aspect ratios and resolutions (all models), video frame processing, and native audio for E2B/E4B
Reasoning Modes: Configurable thinking modes enabling step-by-step reasoning before generation
Extended Context: 128K tokens for E2B/E4B, 256K for larger models
Function Calling: Native structured tool use for agentic workflows
Multilingual Support: 140+ languages in pre-training, 35+ in production
Audio Processing: ASR and speech-to-translation on E2B and E4B only
System Prompt Support: Native support for system role in conversations

Benchmark Performance

The 31B model achieves:

MMLU Pro: 85.2%
AIME 2026 (no tools): 89.2%
LiveCodeBench v6: 80.0%
Codeforces ELO: 2150
GPQA Diamond: 84.3%
Vision MMMU Pro: 76.9%
MATH-Vision: 85.6%

The 26B A4B MoE variant scores 82.6% on MMLU Pro and 88.3% on AIME 2026 while requiring significantly less compute due to sparse activation. Smaller E4B and E2B models score 69.4% and 60.0% on MMLU Pro respectively, suitable for on-device deployment.

Deployment and Licensing

All models are available under Apache 2.0 license through Hugging Face. The diverse size range targets deployment scenarios from mobile and edge devices (E2B/E4B) to consumer GPUs, workstations, and servers (26B/31B). Models can be loaded using the latest version of Hugging Face Transformers library with single-line calls to AutoProcessor and AutoModelForCausalLM.

Google emphasizes efficient on-device execution for smaller variants, with E2B and E4B specifically optimized for laptops and phones. Vision encoder parameters total ~150M (E2B/E4B) and ~550M (larger models), while audio encoders add ~300M parameters to smaller variants.

What This Means

Gemma 4 represents Google's commitment to open-weights multimodal models across the size spectrum. The MoE variant offers a compelling middle ground—matching dense 31B reasoning performance at 4B-parameter inference speed. For on-device deployment, E2B/E4B with native audio support fill a gap between pure language models and larger multimodal systems. Benchmark improvements in coding (Codeforces ELO 2150 vs. Gemma 3's 110) and reasoning tasks position these as competitive with closed-source alternatives, though pricing and hardware requirements differ significantly from API-based competitors.

Source: huggingface.co ↗

google-deepmind gemma-4 open-weights multimodal text-generation vision audio reasoning

model releaseJuly 4, 2026

Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance

Mistral AI has released Leanstral 1.5, an open-source 119B parameter mixture-of-experts model designed specifically for Lean 4 proof assistance. The model features 128 experts with 4 active per token (6.5B activated parameters), a 256k token context window, and multimodal input capabilities.

model releaseJuly 1, 2026

Portugal releases Amália, open-source 9B parameter AI model trained on European Portuguese

Portugal has released Amália, its first national AI model trained specifically for European Portuguese. Built on EuroLLM-9B with 9 billion parameters, the model is fully open-source with weights, datasets, and code published under an open license. The government has committed €5.5m in initial funding through 2027.

model releaseJune 30, 2026

Google launches Gemini 3.1 Flash Lite Image with 4-second generation time, $0.25 per 1M input tokens

Google has released Gemini 3.1 Flash Lite Image, a text-to-image model that generates 1K resolution images in approximately 4 seconds — 2.7× faster than Gemini 3.1 Flash Image. The model is priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens, with a 66K context window and knowledge cutoff of January 2025.