model release

Google releases Gemma 4 family under Apache 2.0 license with 2B to 31B models

TL;DR

Google has released Gemma 4, a family of four open models ranging from 2B to 31B parameters, available under the Apache 2.0 license for the first time in the Gemma line. The 31B dense model ranks 3rd among open models on the Arena AI Text Leaderboard and the 26B mixture-of-experts variant ranks 6th, both outperforming significantly larger competitors. All models support multimodal inputs and are available on Hugging Face, Kaggle, and Ollama.


Google has released Gemma 4, its most capable open model family, marking a significant licensing shift: all four models now ship under the commercially permissive Apache 2.0 license, replacing the restrictive Google proprietary license used for earlier Gemma versions.

Model Lineup and Specifications

The Gemma 4 family consists of four variants:

  • E2B and E4B: Effective 2B and 4B parameter models optimized for edge devices (smartphones, Raspberry Pi, Jetson Orin Nano). Both support 128K token context windows and natively handle images, video, and audio input.
  • 26B MoE (Mixture-of-Experts): 3.8 billion active parameters with up to 256K token context. Designed for latency-optimized inference.
  • 31B Dense: Maximum quality variant with up to 256K token context, intended as a foundation model for fine-tuning.

All models are multimodal and accept image input; the E2B/E4B edge variants additionally handle video and audio natively. The architecture is based on the same technology powering Google's proprietary Gemini 3.
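A back-of-the-envelope way to see why the MoE variant is "latency-optimized": per-token decode cost for a transformer scales with the number of *active* parameters, and ~2 FLOPs per active parameter per generated token is a common heuristic. The parameter counts below come from the announcement; the 2N heuristic itself is an assumption, not something Google published.

```python
# Rough per-token decode cost, using the common ~2*N FLOPs/token heuristic.
# Parameter counts are from the Gemma 4 announcement; the heuristic is an assumption.
DENSE_PARAMS = 31e9        # 31B dense model
MOE_ACTIVE_PARAMS = 3.8e9  # 26B MoE activates only 3.8B parameters per token

flops_dense = 2 * DENSE_PARAMS      # ~62 GFLOPs per generated token
flops_moe = 2 * MOE_ACTIVE_PARAMS   # ~7.6 GFLOPs per generated token

print(f"dense: {flops_dense / 1e9:.1f} GFLOPs/token")
print(f"moe  : {flops_moe / 1e9:.1f} GFLOPs/token")
print(f"ratio: {flops_dense / flops_moe:.1f}x fewer FLOPs/token for the MoE")
```

By this estimate the MoE variant does roughly 8x less compute per generated token than the 31B dense model, which is the trade it makes for faster inference at some quality cost.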

Performance and Benchmarks

The 31B model currently ranks 3rd among all open models on the Arena AI Text Leaderboard with a score above 1,440 Elo, while the 26B MoE ranks 6th. According to Artificial Analysis, on the GPQA Diamond benchmark for scientific reasoning:

  • 31B model: 85.7% in reasoning mode—second-best among open models under 40B parameters (behind Qwen3.5 27B at 85.8%)
  • 26B MoE model: 79.2%—ahead of OpenAI's gpt-oss-120B (76.2%)

Google claims the 31B model outperforms models 20 times its size. Completing the evaluation required approximately 1.2 million output tokens, fewer than comparable models typically consume, which translates to lower inference compute.

Hardware Deployment and Framework Support

The unquantized bfloat16 weights of the 31B model fit on a single 80GB NVIDIA H100 GPU, with quantized versions suitable for consumer graphics cards. The 26B MoE model activates only 3.8 billion parameters during inference for efficient token generation.
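The single-H100 claim checks out with simple arithmetic: bfloat16 stores two bytes per parameter, so the weights alone land well under an 80 GB budget (KV cache and activations need headroom on top). A minimal sketch; the bytes-per-parameter figures for int8/int4 are standard, but which quantization schemes actually ship is an assumption here.

```python
# Approximate weights-only memory for the 31B model at different precisions.
PARAMS = 31e9
GIB = 1024**3

def weight_gib(bytes_per_param: float) -> float:
    """Weights-only footprint; KV cache and activations need extra headroom."""
    return PARAMS * bytes_per_param / GIB

print(f"bfloat16: {weight_gib(2):.1f} GiB")    # ~58 GiB, fits an 80 GB H100
print(f"int8    : {weight_gib(1):.1f} GiB")
print(f"int4    : {weight_gib(0.5):.1f} GiB")  # within reach of 24 GB consumer cards
```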

Gemma 4 supports deployment across:

  • Hardware: NVIDIA (Jetson Orin Nano through Blackwell), AMD (ROCm), and Google TPUs (Trillium, Ironwood)
  • Frameworks: Hugging Face Transformers, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM/NeMo, LM Studio, Unsloth, SGLang, and Keras
  • Platforms: Hugging Face, Kaggle, Ollama, Google AI Studio (31B/26B), and Google AI Edge Gallery (E4B/E2B)

Production Capabilities

All models natively support function calling, structured JSON output, and system instructions for agentic workflows. Fine-tuning is available through Google Colab, Vertex AI, or local GPUs. Production deployments scale via Google Cloud (Vertex AI, Cloud Run, GKE).
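In practice, function calling reduces to the model emitting structured JSON that application code validates and dispatches. The sketch below shows that consumer side: a tool schema in the common JSON-Schema style and a dispatcher for a model reply. The schema layout, the `get_weather` tool, and the sample reply are illustrative assumptions, not Gemma 4's documented wire format.

```python
import json

# Illustrative tool declaration in the common JSON-Schema style; the exact
# format Gemma 4 expects is not documented here, so treat this as an assumption.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(model_reply: str) -> str:
    """Parse a structured function-call reply and route it to local code."""
    call = json.loads(model_reply)
    if call["name"] == "get_weather":
        return f"weather({call['arguments']['city']})"
    raise ValueError(f"unknown tool: {call['name']}")

# A hypothetical structured reply the model might emit:
reply = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
print(dispatch(reply))  # weather(Berlin)
```

Because the model's output is plain JSON, the same dispatcher works whether the model is served via vLLM, Ollama, or a cloud endpoint.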

What This Means

The Apache 2.0 licensing removes commercial restrictions that previously limited Gemma adoption, making these models viable for proprietary applications and enterprise use without derivative-work constraints. The 31B model's 3rd-place ranking on Arena AI, achieved with roughly one-tenth the parameters of many competitors, suggests that parameter efficiency, not raw scale, increasingly determines practical performance. For edge computing, the E2B/E4B variants with multimodal support compete directly with mobile-optimized models from Mistral and others. The shift signals Google's commitment to competing in open models while maintaining architectural parity with Gemini 3.

