
Google DeepMind releases Gemma 4 with four models up to 31B parameters, 256K context window

TL;DR

Google DeepMind released Gemma 4, an open-weights multimodal model family in four sizes (E2B, E4B, 26B A4B, 31B) with context windows up to 256K tokens and native reasoning capabilities. The 26B A4B variant uses a Mixture-of-Experts architecture with 3.8B active parameters for efficient inference. All models accept text and image input, support 140+ languages, and are released under the Apache 2.0 license.


Google DeepMind Releases Gemma 4: Four Open-Weights Models with Multimodal Capabilities and Extended Context

Google DeepMind released Gemma 4, an open-weights model family available in four sizes designed for deployment from mobile devices to servers. The largest variant, the 31B Dense model, has 30.7B parameters and a 256K token context window. A Mixture-of-Experts variant, the 26B A4B, activates only 3.8B of its 25.2B total parameters during inference, so its performance approaches that of the 31B model while its per-token compute cost stays closer to that of a 4B model.
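The gap between total and active parameters can be made concrete with a little arithmetic. The sketch below uses only the parameter counts quoted above. The function name `active_fraction` is illustrative, and the idea that per-token compute scales roughly with active parameters while weight memory scales with total parameters is a common rule of thumb, not a figure from the announcement.

```python
# Parameter counts (in billions) as quoted in this article.
TOTAL_26B_A4B = 25.2   # all experts; every weight must still be held in memory
ACTIVE_26B_A4B = 3.8   # parameters actually used for each token
DENSE_31B = 30.7       # dense model: every parameter is active


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a model's parameters that participate in one forward pass."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


moe_share = active_fraction(TOTAL_26B_A4B, ACTIVE_26B_A4B)

# Rule-of-thumb assumption: per-token compute is proportional to active parameters.
dense_vs_moe_compute = DENSE_31B / ACTIVE_26B_A4B

print(f"26B A4B activates {moe_share:.1%} of its weights per token")
print(f"31B Dense uses about {dense_vs_moe_compute:.1f}x the active parameters")
```

Under these assumptions the MoE model touches roughly 15% of its weights per token, while the dense 31B uses about eight times as many active parameters. Memory requirements do not shrink in the same way, since all 25.2B parameters still have to be loaded.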

Model Specifications and Architecture

Gemma 4 includes two smaller models optimized for edge deployment: the E2B (2.3B effective parameters) and E4B (4.5B effective parameters), both featuring 128K context windows. The E-series models incorporate Per-Layer Embeddings (PLE) technology, where each decoder layer maintains its own token embedding table for memory efficiency during on-device inference.

All models use a hybrid attention scheme that combines local sliding-window attention (a 512-token window for the E-series, 1024 tokens for the larger models) with full global attention in the final layers. Proportional RoPE (p-RoPE) reduces the memory footprint at extended context lengths.
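To illustrate the difference between local and global attention, here is a minimal sketch of the two causal masks in plain Python. It is not Gemma 4's implementation: the function names are made up for this example, and the convention that a window of size w covers the current token plus the w - 1 tokens before it is an assumption.

```python
from typing import List, Optional


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    window=None gives full (global) causal attention. Otherwise each query
    sees itself plus the window - 1 tokens before it (assumed convention).
    """
    if seq_len < 0 or (window is not None and window < 1):
        raise ValueError("seq_len must be >= 0 and window >= 1")
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(seq_len: int, window: Optional[int] = None) -> int:
    """Number of (query, key) pairs scored, computed in closed form."""
    if window is None or window >= seq_len:
        return seq_len * (seq_len + 1) // 2
    # The first `window` queries see 1..window keys; the rest see `window` each.
    return window * (window + 1) // 2 + (seq_len - window) * window


# Small visual example: 6 tokens, window of 3.
for row in causal_mask(6, window=3):
    print("".join("x" if allowed else "." for allowed in row))

# Relative cost of a local layer versus a global layer at 128K tokens.
n = 128 * 1024
print(attended_pairs(n, window=1024) / attended_pairs(n))
```

With a 1024-token window at a 128K sequence length, a local layer scores roughly 1.6% of the query-key pairs that a global layer does. That is the reason to mix the two: local layers stay cheap, and the global layers keep long-range information reachable.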

Multimodal Capabilities

Gemma 4 handles text and image input, with variable aspect ratio and resolution support, across all four models. The E2B and E4B models additionally accept audio input, enabling automatic speech recognition and speech-to-translated-text across multiple languages. Video understanding is available through frame-sequence processing. All models feature native system prompt support and function calling for structured tool use in agentic workflows.
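Function calling generally follows a loop: the application describes its tools, the model replies with a structured call, and the application runs the tool and returns the result. The sketch below covers only the application side. The JSON shape (`name`, `arguments`), the `get_weather` tool, and the `dispatch_tool_call` helper are all hypothetical. The article does not specify Gemma 4's actual tool-call format, so check the model's chat template before relying on any particular schema.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str, unit: str = "celsius") -> Dict[str, Any]:
    """Hypothetical tool; a real one would query a weather service."""
    return {"city": city, "unit": unit, "temperature": 21}


# Registry mapping tool names the model may emit to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(raw: str) -> str:
    """Parse a model-emitted tool call (assumed to be JSON) and run the tool.

    Returns a JSON string to feed back to the model as the tool result.
    Errors are reported in-band so the model has a chance to recover.
    """
    try:
        call = json.loads(raw)
        name = call["name"]
        args = call.get("arguments", {})
        if not isinstance(args, dict):
            raise TypeError("arguments must be a JSON object")
        result = TOOLS[name](**args)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON: {exc.msg}"})
    except KeyError as exc:
        return json.dumps({"error": f"unknown tool or missing field: {exc}"})
    except TypeError as exc:
        return json.dumps({"error": f"bad arguments: {exc}"})
    return json.dumps({"name": name, "result": result})


# Example with a hand-written string standing in for model output.
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch_tool_call(model_output))
```

Looking tools up in an explicit registry, rather than evaluating model output directly, means the model can only invoke functions the application has chosen to expose.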

Benchmark Performance

On MMLU Pro, the 31B Dense model scores 85.2%, compared to 82.6% for the 26B A4B variant. On coding tasks (LiveCodeBench v6), the 31B achieves 80.0% versus 77.1% for the 26B A4B. For long-context retrieval (MRCR v2 8-needle at 128K), the 31B reaches 66.4% accuracy.

The smaller E4B model scores 69.4% on MMLU Pro and 52.0% on LiveCodeBench v6. On the MMMU Pro vision benchmark, the 31B scores 76.9% and the E2B 44.2%.

All instruction-tuned variants feature configurable thinking modes for step-by-step reasoning. The 31B model achieves 89.2% on AIME 2026 without tools, substantially above Gemma 3 27B (20.8%).

Multilingual and Deployment

Gemma 4 models support 140+ languages, with dedicated pre-training for 35+ languages. The models are released under the Apache 2.0 license and are compatible with recent versions of the Transformers library.

Pricing and API availability were not disclosed in the announcement. Models are immediately accessible via Hugging Face and GitHub for local deployment.

What This Means

Gemma 4 advances Google's open-model strategy by delivering multimodal capabilities competitive with frontier models at multiple size tiers. The 26B A4B's MoE architecture is particularly notable: scoring 82.6% on MMLU Pro with only 3.8B active parameters challenges the assumption that total parameter count directly determines inference cost. For developers, the range from E2B to 31B provides deployment flexibility from mobile devices to servers. The 256K context and native reasoning support address two key limitations in previous Gemma releases, though benchmark improvements over Gemma 3 are modest on most tasks except math reasoning (see the AIME 2026 result above), long-context retrieval, and coding.
