Google DeepMind releases Gemma 4: open models ranking #3 and #6 on Arena AI leaderboard

TL;DR

Google DeepMind released Gemma 4, a family of four open models ranging from 2B to 31B parameters, all licensed under Apache 2.0. The 31B dense model ranks #3 on Arena AI's text leaderboard and the 26B mixture-of-experts variant ranks #6, outperforming closed models significantly larger in size.

April 2, 2026 · 4:20 PM2 min read

Gemma 4 31B Dense — Quick Specs

Context window256K tokens

Compare Gemma 4 31B Dense with other models →

Google DeepMind Releases Gemma 4 Open Model Family

Google DeepMind today announced Gemma 4, a family of open-source models designed for advanced reasoning and agentic workflows. The release includes four variants: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense.

Performance and Benchmarks

The 31B dense model currently ranks #3 on Arena AI's text leaderboard, with the 26B MoE variant at #6. According to Google DeepMind, the 26B model outcompetes models 20x its size. Both models were built using the same underlying research and technology as Gemini 3.

Model Specifications

Large Models:

31B Dense: Optimized for maximum quality and fine-tuning, runs on single 80GB NVIDIA H100 GPUs in bfloat16
26B Mixture of Experts: Activates only 3.8 billion parameters during inference for low-latency token generation
Context window: Up to 256K tokens

Edge Models:

E4B and E2B: Engineered for mobile and IoT devices with native audio input and multimodal support
Context window: 128K tokens
Designed to run completely offline on Android devices, Raspberry Pi, NVIDIA Jetson Orin Nano, and other edge hardware

Capabilities

All Gemma 4 models include:

Advanced multi-step reasoning and planning
Native function-calling and structured JSON output for agentic workflows
High-quality code generation with offline capability
Native vision and audio processing (video, images, variable resolutions, OCR, chart understanding)
Training on 140+ languages
Variable resolution image processing and speech recognition (E2B/E4B)

Licensing and Distribution

Gemma 4 is released under Apache 2.0, a commercially permissive open-source license. The models are available immediately via Hugging Face, Kaggle, and Ollama. Google DeepMind claims developers have downloaded previous Gemma versions over 400 million times, with more than 100,000 community variants created.

Integration and Tools

Day-one support includes compatibility with Hugging Face Transformers, llama.cpp, Ollama, vLLM, NVIDIA NIM, LiteRT-LM, MLX, LM Studio, Unsloth, and SGLang. For Android development, models are available through Android Studio's Agent Mode and the ML Kit GenAI Prompt API. Cloud deployment options include Google Cloud's Vertex AI, Cloud Run, GKE, and TPU-accelerated serving.

Development Collaboration

Google DeepMind collaborated with Qualcomm Technologies and MediaTek on the edge models. Previous Gemma fine-tuning efforts cited include BgGPT (Bulgarian language model by INSAIT) and Cell2Sentence-Scale (Yale University cancer research application).

What This Means

Gemma 4 represents a significant efficiency milestone: achieving near-frontier reasoning performance at smaller parameter counts reduces the hardware barrier for researchers and developers building production AI systems. The Apache 2.0 licensing removes commercial restrictions that hampered earlier open models, and multimodal edge capabilities (E2B/E4B) enable on-device AI without cloud dependency. The models' Arena AI rankings suggest measurable performance gains over comparable-sized open models, though competitive positioning against Meta's Llama and other recent releases remains to be independently verified. For enterprises prioritizing data sovereignty and offline inference, Gemma 4 addresses a concrete operational requirement.

Source: deepmind.google ↗

gemma-4 google-deepmind open-source-models apache-2.0 mixture-of-experts edge-ai arena-ai multimodal

model releaseMay 20, 2026

Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis

Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.

model releaseMay 19, 2026

Google releases Gemini 3.5 Flash with 4x faster output and agentic capabilities, 3.5 Pro coming June

Google released Gemini 3.5 Flash today with 4x faster output token generation than competing frontier models while surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. The company announced Gemini 3.5 Pro will launch next month and introduced Gemini Omni, a new multimodal series that outputs video.

product updateMay 19, 2026

Google DeepMind connects Genie world model to 280 billion Street View images, Waymo already using for self-driving train

Google DeepMind has integrated its Genie world model with Street View's 280 billion images spanning 110 countries, enabling users to explore AI-generated simulations of real locations. Waymo is already using Genie 3 to train self-driving cars on rare scenarios like tornadoes and unexpected obstacles.