Google DeepMind releases Gemma 4 family with 256K context window and multimodal capabilities
Google DeepMind released the Gemma 4 family of open-weights models in four sizes (2.3B to 31B parameters) with multimodal support for text, images, video, and audio. The flagship 31B model achieves 85.2% on MMLU Pro and 89.2% on AIME 2024, with context windows up to 256K tokens. All models feature configurable reasoning modes and are optimized for deployment from mobile devices to servers under Apache 2.0 license.
Google DeepMind Releases Gemma 4 Family: Four Models from 2.3B to 31B Parameters
Google DeepMind released the complete Gemma 4 model family today, spanning four distinct sizes optimized for deployment scenarios from edge devices to enterprise servers. All models are open-weights under Apache 2.0 license.
Model Lineup and Specifications
The family includes two dense models and one mixture-of-experts variant:
Dense Models:
- E2B: 2.3B effective parameters (5.1B with embeddings), 128K context window
- E4B: 4.5B effective parameters (8B with embeddings), 128K context window
- 31B: 30.7B parameters, 256K context window
Mixture-of-Experts:
- 26B A4B: 25.2B total parameters, 3.8B active parameters, 256K context window, 8 active experts from 128 total
The smaller models use Per-Layer Embeddings (PLE) technology to achieve parameter efficiency without sacrificing capabilities. The 26B A4B model activates only 4B parameters during inference, enabling performance comparable to a 4B model with the reasoning capacity of a 26B model.
Multimodal and Reasoning Capabilities
All Gemma 4 models handle text and image input. The E2B and E4B models additionally support audio input natively. All models feature configurable thinking/reasoning modes enabling step-by-step problem solving before generating responses.
Key capabilities include: function calling for agentic workflows, variable aspect ratio and resolution image processing, video frame analysis, multilingual support (140+ languages pre-trained, 35+ supported), and native system prompt support.
Smaller models employ a hybrid attention mechanism combining local sliding window attention (512 tokens for E-series, 1024 for larger models) with full global attention in final layers to balance memory efficiency with long-context awareness.
Benchmark Performance
Benchmark results are from instruction-tuned variants:
31B Model (Dense):
- MMLU Pro: 85.2%
- AIME 2024: 89.2% (no tools)
- Codeforces ELO: 2150
- LiveCodeBench v6: 80.0%
- GPQA Diamond: 84.3%
- MMMLU (multimodal): 88.4%
- Vision MMMU Pro: 76.9%
- MATH-Vision: 85.6%
26B A4B Model (MoE):
- MMLU Pro: 82.6%
- AIME 2024: 88.3%
- Codeforces ELO: 1718
- LiveCodeBench v6: 77.1%
E4B Model:
- MMLU Pro: 69.4%
- AIME 2024: 42.5%
- Codeforces ELO: 940
- Audio CoVoST: 35.54
All models demonstrate substantial improvements in coding benchmarks and long-context reasoning compared to Gemma 3 27B baseline. The long-context test (MRCR v2 with 128K context, 8-needle) shows the 31B achieving 66.4% versus 13.5% for Gemma 3 27B.
Deployment and Availability
Models are available via Hugging Face with full Transformers library support. The architecture choices enable diverse deployment: E2B and E4B target mobile and lightweight laptop execution, 26B A4B balances speed and capability for consumer GPUs, and 31B targets workstations and servers.
Google provided code examples for loading models, processing multi-turn conversations, enabling reasoning modes, and handling audio/video/image inputs alongside text.
What This Means
Gemma 4 represents a systematic expansion of open-weights model options across size classes. The emphasis on on-device efficiency (E-series with PLE) paired with frontier reasoning performance (31B, 26B A4B) creates genuine tradeoff options. The multimodal capabilities with configurable reasoning modes and extended context windows position these models for both traditional deployment and emerging agentic applications. Pricing for commercial deployment and specific inference cost metrics remain undisclosed.
Related Articles
DeepSeek Releases V4-Pro with 1.6T Parameters, 1M Token Context at 27% Inference Cost of V3
DeepSeek has released two Mixture-of-Experts models: V4-Pro with 1.6 trillion parameters (49B activated) and V4-Flash with 284B parameters (13B activated), both supporting 1 million token context windows. V4-Pro requires only 27% of inference FLOPs and 10% of KV cache compared to V3.2 at 1M token context, trained on over 32 trillion tokens.
Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance
Mistral AI has released Leanstral 1.5, an open-source 119B parameter mixture-of-experts model designed specifically for Lean 4 proof assistance. The model features 128 experts with 4 active per token (6.5B activated parameters), a 256k token context window, and multimodal input capabilities.
Google DeepMind releases Nano Banana 2 Lite at $0.034 per 1K image with 4-second generation, opens Gemini Omni Flash API
Google DeepMind released Nano Banana 2 Lite (gemini-3.1-flash-lite-image), its fastest image generation model with 4-second text-to-image latency priced at $0.034 per 1K-resolution image. The company also opened developer access to Gemini Omni Flash (gemini-omni-flash-preview) for video generation and editing at $0.10 per second of output.
DeepReinforce Releases Ornith-1.0, Open-Source Agentic Coding Model in 9B to 397B Sizes
DeepReinforce has released Ornith-1.0, an MIT-licensed model designed for agentic coding tasks with variants ranging from 9B to 397B parameters. Built on top of Apache 2.0-licensed Gemma 4 and Qwen 3.5 base models, the company claims it achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks.
Comments
Loading...