Google releases Gemma 4 family with 31B model, 256K context, multimodal capabilities
Google DeepMind released the Gemma 4 family of open-weights models ranging from 2.3B to 31B parameters, featuring up to 256K token context windows and native support for text, image, video, and audio inputs. The flagship 31B model scores 85.2% on MMLU Pro and 89.2% on AIME 2026, with a smaller 26B MoE variant requiring only 3.8B active parameters for faster inference.
Gemma 4 31B Instruct — Quick Specs
Google Releases Gemma 4 Family with Multimodal Capabilities and Up to 256K Context
Google DeepMind launched Gemma 4, a family of open-weights models ranging from 2.3B to 31B parameters, introducing multimodal capabilities including text, image, video, and audio processing alongside native reasoning modes.
Model Sizes and Architecture
The release includes four model variants:
- E2B: 2.3B effective parameters (5.1B with embeddings), 128K context
- E4B: 4.5B effective parameters (8B with embeddings), 128K context
- 26B A4B: 25.2B total parameters with 3.8B active (MoE), 256K context
- 31B: 30.7B parameters, 256K context
All models employ a hybrid attention mechanism combining local sliding window attention with full global attention. The architecture uses Per-Layer Embeddings (PLE) in smaller models to optimize on-device deployment, while the 26B variant uses Mixture-of-Experts with 8 active experts from 128 total.
Capabilities and Features
Gemma 4 models support:
- Multimodal Input: Text, images with variable aspect ratios and resolutions (all models), video frame processing, and native audio for E2B/E4B
- Reasoning Modes: Configurable thinking modes enabling step-by-step reasoning before generation
- Extended Context: 128K tokens for E2B/E4B, 256K for larger models
- Function Calling: Native structured tool use for agentic workflows
- Multilingual Support: 140+ languages in pre-training, 35+ in production
- Audio Processing: ASR and speech-to-translation on E2B and E4B only
- System Prompt Support: Native support for system role in conversations
Benchmark Performance
The 31B model achieves:
- MMLU Pro: 85.2%
- AIME 2026 (no tools): 89.2%
- LiveCodeBench v6: 80.0%
- Codeforces ELO: 2150
- GPQA Diamond: 84.3%
- Vision MMMU Pro: 76.9%
- MATH-Vision: 85.6%
The 26B A4B MoE variant scores 82.6% on MMLU Pro and 88.3% on AIME 2026 while requiring significantly less compute due to sparse activation. Smaller E4B and E2B models score 69.4% and 60.0% on MMLU Pro respectively, suitable for on-device deployment.
Deployment and Licensing
All models are available under Apache 2.0 license through Hugging Face. The diverse size range targets deployment scenarios from mobile and edge devices (E2B/E4B) to consumer GPUs, workstations, and servers (26B/31B). Models can be loaded using the latest version of Hugging Face Transformers library with single-line calls to AutoProcessor and AutoModelForCausalLM.
Google emphasizes efficient on-device execution for smaller variants, with E2B and E4B specifically optimized for laptops and phones. Vision encoder parameters total ~150M (E2B/E4B) and ~550M (larger models), while audio encoders add ~300M parameters to smaller variants.
What This Means
Gemma 4 represents Google's commitment to open-weights multimodal models across the size spectrum. The MoE variant offers a compelling middle ground—matching dense 31B reasoning performance at 4B-parameter inference speed. For on-device deployment, E2B/E4B with native audio support fill a gap between pure language models and larger multimodal systems. Benchmark improvements in coding (Codeforces ELO 2150 vs. Gemma 3's 110) and reasoning tasks position these as competitive with closed-source alternatives, though pricing and hardware requirements differ significantly from API-based competitors.
Related Articles
Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance
Mistral AI has released Leanstral 1.5, an open-source 119B parameter mixture-of-experts model designed specifically for Lean 4 proof assistance. The model features 128 experts with 4 active per token (6.5B activated parameters), a 256k token context window, and multimodal input capabilities.
Portugal releases Amália, open-source 9B parameter AI model trained on European Portuguese
Portugal has released Amália, its first national AI model trained specifically for European Portuguese. Built on EuroLLM-9B with 9 billion parameters, the model is fully open-source with weights, datasets, and code published under an open license. The government has committed €5.5m in initial funding through 2027.
Google launches Gemini 3.1 Flash Lite Image with 4-second generation time, $0.25 per 1M input tokens
Google has released Gemini 3.1 Flash Lite Image, a text-to-image model that generates 1K resolution images in approximately 4 seconds — 2.7× faster than Gemini 3.1 Flash Image. The model is priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens, with a 66K context window and knowledge cutoff of January 2025.
Google DeepMind releases Nano Banana 2 Lite at $0.034 per 1K image with 4-second generation, opens Gemini Omni Flash API
Google DeepMind released Nano Banana 2 Lite (gemini-3.1-flash-lite-image), its fastest image generation model with 4-second text-to-image latency priced at $0.034 per 1K-resolution image. The company also opened developer access to Gemini Omni Flash (gemini-omni-flash-preview) for video generation and editing at $0.10 per second of output.
Comments
Loading...