Google DeepMind releases Gemma 4: open models ranking #3 and #6 on Arena AI leaderboard
Google DeepMind released Gemma 4, a family of four open models ranging from 2B to 31B parameters, all licensed under Apache 2.0. The 31B dense model ranks #3 on Arena AI's text leaderboard and the 26B mixture-of-experts variant ranks #6, with both outperforming significantly larger closed models.
Google DeepMind Releases Gemma 4 Open Model Family
Google DeepMind today announced Gemma 4, a family of open-source models designed for advanced reasoning and agentic workflows. The release includes four variants: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense.
Performance and Benchmarks
The 31B dense model currently ranks #3 on Arena AI's text leaderboard, with the 26B MoE variant at #6. According to Google DeepMind, the 26B model outcompetes models 20x its size. Both models were built using the same underlying research and technology as Gemini 3.
Model Specifications
Large Models:
- 31B Dense: Optimized for maximum quality and fine-tuning, runs on a single 80GB NVIDIA H100 GPU in bfloat16 (see the memory estimate after these lists)
- 26B Mixture of Experts: Activates only 3.8 billion parameters during inference for low-latency token generation
- Context window: Up to 256K tokens
Edge Models:
- E4B and E2B: Engineered for mobile and IoT devices with native audio input and multimodal support
- Context window: 128K tokens
- Designed to run completely offline on Android devices, Raspberry Pi, NVIDIA Jetson Orin Nano, and other edge hardware
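A quick back-of-envelope check on the single-H100 claim: bfloat16 stores two bytes per parameter, so the 31B model's weights alone come to roughly 57 GiB, leaving headroom on an 80GB card for activations and the KV cache at long context lengths. The sketch below uses the parameter totals quoted elsewhere in this coverage (30.7B for the dense model, 25.2B total / 3.8B active for the MoE); exact checkpoint sizes may differ.

```python
# Back-of-envelope memory estimate for serving Gemma 4 weights in bfloat16.
# Parameter counts are taken from the coverage above; exact totals per
# checkpoint may differ slightly.

BYTES_PER_PARAM_BF16 = 2  # bfloat16 = 16 bits = 2 bytes per weight

models = {
    "31B dense (total params)": 30.7e9,
    "26B MoE (total params)": 25.2e9,
    "26B MoE (active per token)": 3.8e9,
}

for name, params in models.items():
    gib = params * BYTES_PER_PARAM_BF16 / 1024**3
    print(f"{name}: ~{gib:.0f} GiB of weights")

# 31B dense -> ~57 GiB of weights, which fits on one 80GB H100 with room
# left for activations and KV cache; the MoE touches only ~7 GiB of
# weights per token, which is what enables its low-latency generation.
```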
Capabilities
All Gemma 4 models include:
- Advanced multi-step reasoning and planning
- Native function-calling and structured JSON output for agentic workflows (sketched in code after this list)
- High-quality code generation with offline capability
- Native vision and audio processing (video, images, variable resolutions, OCR, chart understanding)
- Training data covering 140+ languages
- Variable resolution image processing and speech recognition (E2B/E4B)
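To make the function-calling claim concrete, here is a minimal sketch using the Hugging Face Transformers chat-template tooling listed under the day-one integrations below. The Hub id `google/gemma-4-26b-it` is hypothetical, and whether Gemma 4's chat template accepts a `tools` argument has not been confirmed; the pattern shown is the generic Transformers tool-calling flow, not a documented Gemma 4 API.

```python
# Sketch of function-calling via the Hugging Face chat template.
# The repo id is hypothetical (the actual Gemma 4 Hub naming may differ),
# and tool support in the Gemma 4 chat template is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-4-26b-it"  # hypothetical Hub id

def get_weather(city: str) -> str:
    """Look up the current weather for a city.

    Args:
        city: Name of the city to query.
    """
    ...

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

messages = [{"role": "user", "content": "Do I need an umbrella in Zurich today?"}]

# Transformers converts the function signature and docstring into a JSON
# schema and injects it into the prompt via the model's chat template,
# so the model can reply with a structured tool call.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```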
Licensing and Distribution
Gemma 4 is released under Apache 2.0, a commercially permissive open-source license. The models are available immediately via Hugging Face, Kaggle, and Ollama. Google DeepMind claims developers have downloaded previous Gemma versions over 400 million times, with more than 100,000 community variants created.
Integration and Tools
Day-one support includes compatibility with Hugging Face Transformers, llama.cpp, Ollama, vLLM, NVIDIA NIM, LiteRT-LM, MLX, LM Studio, Unsloth, and SGLang. For Android development, models are available through Android Studio's Agent Mode and the ML Kit GenAI Prompt API. Cloud deployment options include Google Cloud's Vertex AI, Cloud Run, GKE, and TPU-accelerated serving.
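As a starting point, a hedged sketch of loading one of the checkpoints through the Transformers pipeline API; the model id below is a placeholder until Google publishes the official Hub names.

```python
# Minimal sketch of running a Gemma 4 checkpoint through the Transformers
# pipeline API. The Hub id is hypothetical; substitute whatever id Google
# publishes on Hugging Face or Kaggle.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-4-e4b-it",  # hypothetical id for the 4B-class edge model
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Gemma 4 release in two sentences."}]
reply = generator(messages, max_new_tokens=128)[0]["generated_text"][-1]["content"]
print(reply)
```

In principle the same checkpoint id should also drop into `vllm serve` or an Ollama tag once published, though the announcement does not spell out those identifiers.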
Development Collaboration
Google DeepMind collaborated with Qualcomm Technologies and MediaTek on the edge models. Previous Gemma fine-tuning efforts cited include BgGPT (Bulgarian language model by INSAIT) and Cell2Sentence-Scale (Yale University cancer research application).
What This Means
Gemma 4 represents a significant efficiency milestone: achieving near-frontier reasoning performance at smaller parameter counts reduces the hardware barrier for researchers and developers building production AI systems. The Apache 2.0 licensing removes commercial restrictions that hampered earlier open models, and multimodal edge capabilities (E2B/E4B) enable on-device AI without cloud dependency. The models' Arena AI rankings suggest measurable performance gains over comparable-sized open models, though competitive positioning against Meta's Llama and other recent releases remains to be independently verified. For enterprises prioritizing data sovereignty and offline inference, Gemma 4 addresses a concrete operational requirement.
Related Articles
Google DeepMind releases Gemma 4 with 4 model sizes, 256K context, and multimodal reasoning
Google DeepMind released Gemma 4, a family of open-weights multimodal models in four sizes: E2B (2.3B effective), E4B (4.5B effective), 26B A4B (3.8B active), and 31B (30.7B parameters). All models support text and image input with 128K-256K context windows, while E2B and E4B add native audio capabilities and reasoning modes across 140+ languages.
Google DeepMind releases Gemma 4 open models with multimodal capabilities and 256K context window
Google DeepMind released the Gemma 4 family of open-source models with multimodal capabilities (text, image, audio, video) and context windows up to 256K tokens. Four distinct model sizes—E2B (2.3B effective parameters), E4B (4.5B effective), 26B A4B (3.8B active), and 31B—are available under the Apache 2.0 license, with instruction-tuned and pre-trained variants.
Google DeepMind releases Gemma 4: multimodal models up to 31B parameters with 256K context
Google DeepMind released the Gemma 4 family of open-weights multimodal models in four sizes: E2B (2.3B effective), E4B (4.5B effective), 26B A4B (25.2B total, 3.8B active), and 31B dense. All models support text and image input with 128K-256K context windows, reasoning modes, and native function calling for agentic workflows.
Google releases Gemma 4 family under Apache 2.0 license with 2B to 31B models
Google has released Gemma 4, a family of four open models ranging from 2B to 31B parameters, now available under the Apache 2.0 license for the first time. The 31B dense model ranks 3rd on the Arena AI Text Leaderboard, while the 26B mixture-of-experts variant ranks 6th, both outperforming significantly larger competitors. All models support multimodal inputs and are available on Hugging Face, Kaggle, and Ollama.