Google releases Gemma 4 family under Apache 2.0 license with 2B to 31B models
Google has released Gemma 4, a family of four open models ranging from 2B to 31B parameters and the first Gemma release available under the Apache 2.0 license. The 31B dense model ranks 3rd among open models on the Arena AI Text Leaderboard, and the 26B mixture-of-experts variant ranks 6th; both outperform significantly larger competitors. All models support multimodal inputs and are available on Hugging Face, Kaggle, and Ollama.
Google has released Gemma 4, its most capable open model family, with a significant licensing shift: all four models ship under the commercially permissive Apache 2.0 license, replacing the more restrictive custom Google license that governed earlier Gemma versions.
Model Lineup and Specifications
The Gemma 4 family consists of four variants:
- E2B and E4B: Models with an effective 2B and 4B parameters, optimized for edge devices such as smartphones, Raspberry Pi, and Jetson Orin Nano. Both support 128K-token context windows and natively handle image, video, and audio input.
- 26B MoE (mixture-of-experts): 26B total parameters, of which 3.8 billion are active per token, with a context window of up to 256K tokens. Designed for latency-optimized inference.
- 31B Dense: The highest-quality variant, with a context window of up to 256K tokens, intended as a base model for fine-tuning.
All models are multimodal: the larger variants accept vision input, while the E2B and E4B models also accept audio. Google says the architecture builds on the same technology that powers its proprietary Gemini 3.
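The article does not explain how the 26B MoE model selects its 3.8 billion active parameters. As a rough illustration only, the sketch below shows generic top-k expert routing, a common mixture-of-experts technique: a gate scores every expert for each token, only the k best-scoring experts run, and their outputs are blended by renormalized gate weights. The expert count, `TOP_K`, hidden size, and the toy experts are hypothetical values, not Gemma 4's actual configuration.

```python
import math
import random

# Hypothetical toy configuration: Gemma 4's real expert count, expert size,
# and routing rule are NOT given in the article.
NUM_EXPERTS = 8
TOP_K = 2
HIDDEN = 4


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(token_vec, gate_weights, k=TOP_K):
    """Score every expert for one token, keep the k best, renormalize their weights."""
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(token_vec, gate_weights, experts, k=TOP_K):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(token_vec)
    for idx, w in route(token_vec, gate_weights, k):
        expert_out = experts[idx](token_vec)
        out = [o + w * e for o, e in zip(out, expert_out)]
    return out


def make_expert(scale):
    # Each toy "expert" is just an elementwise scaling function.
    return lambda v: [scale * x for x in v]


random.seed(0)
gate = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(s) for s in range(1, NUM_EXPERTS + 1)]
token = [0.5, -0.2, 0.1, 0.9]

print(route(token, gate))             # (expert index, weight) pairs actually used
print(moe_layer(token, gate, experts))

# Active-parameter fraction using the article's figures: 3.8B of 26B total.
print(f"active fraction: {3.8 / 26:.1%}")
```

The last line uses the article's own numbers: roughly 15% of the 26B weights take part in each token's forward pass, which is where the latency benefit of the MoE design comes from.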
Performance and Benchmarks
The 31B model currently ranks 3rd among all open models on the Arena AI Text Leaderboard with a score above 1,440 Elo, while the 26B MoE ranks 6th. According to Artificial Analysis, on the GPQA Diamond benchmark for scientific reasoning:
- 31B model: 85.7% in reasoning mode, second-best among open models under 40B parameters (behind Qwen3.5 27B at 85.8%)
- 26B MoE model: 79.2%, ahead of OpenAI's gpt-oss-120B (76.2%)
Google claims the 31B model outperforms models 20 times its size. Completing the evaluation required approximately 1.2 million output tokens, fewer than comparable competitors needed, which translates into less inference compute.
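Arena leaderboard scores are reported on an Elo-style scale, so a rating gap maps to an expected head-to-head preference rate. Assuming the standard Elo convention (a logistic curve with a 400-point scale), the sketch below converts a rating difference into a win probability. The 1,440 figure is the article's lower bound for the 31B model; the 1,400 opponent rating is hypothetical and chosen only for illustration.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score: probability that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# 1,440: the article's lower bound for Gemma 4 31B.
# 1,400: a hypothetical opponent rating, for illustration only.
print(f"{expected_win_rate(1440, 1400):.3f}")
```

Under this assumption, a 40-point lead corresponds to being preferred in roughly 56% of pairwise comparisons, so small leaderboard gaps indicate modest rather than decisive quality differences.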
Hardware Deployment and Framework Support
The unquantized bfloat16 weights of the 31B model fit on a single 80GB NVIDIA H100 GPU, with quantized versions suitable for consumer graphics cards. The 26B MoE model activates only 3.8 billion parameters during inference for efficient token generation.
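The single-H100 claim follows from simple arithmetic: bfloat16 stores each parameter in 2 bytes. The sketch below estimates weight memory only. It treats "31B" and "26B" as exact parameter counts and ignores the KV cache, activations, and quantization metadata (scales, zero points), all of which add overhead in practice. The 8-bit and 4-bit rows are illustrative and do not describe specific Gemma 4 quantized releases.

```python
# Bytes per parameter for each storage format (weights only, no metadata).
BYTES_PER_PARAM = {"bfloat16": 2.0, "int8": 1.0, "4-bit": 0.5}
H100_MEMORY_GB = 80


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights, in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal sizes from the article; the MoE row counts ALL experts, since
# every expert must be resident even though only 3.8B are active per token.
for name, params in [("31B dense", 31e9), ("26B MoE (all experts)", 26e9)]:
    for dtype in BYTES_PER_PARAM:
        print(f"{name} @ {dtype}: ~{weight_memory_gb(params, dtype):.1f} GB")

# Do the unquantized 31B weights fit on one 80 GB H100?
print(weight_memory_gb(31e9, "bfloat16") < H100_MEMORY_GB)
```

About 62 GB of bfloat16 weights leaves headroom on an 80 GB card for the KV cache. The MoE rows also show that sparse activation saves compute rather than memory: assuming all experts stay loaded, the full 26B weights (about 52 GB in bfloat16) must be held in memory even though only 3.8B are used per token.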
Gemma 4 supports deployment across:
- Hardware: NVIDIA (Jetson Orin Nano through Blackwell), AMD (ROCm), and Google TPUs (Trillium, Ironwood)
- Frameworks: Hugging Face Transformers, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM/NeMo, LM Studio, Unsloth, SGLang, and Keras
- Platforms: Hugging Face, Kaggle, Ollama, Google AI Studio (31B/26B), and Google AI Edge Gallery (E4B/E2B)
Production Capabilities
All models natively support function calling, structured JSON output, and system instructions for agentic workflows. Fine-tuning is available through Google Colab, Vertex AI, or local GPUs. Production deployments scale via Google Cloud (Vertex AI, Cloud Run, GKE).
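Function calling generally works as a loop: the application describes its tools to the model as JSON schemas, the model replies with a structured call, and the application validates and executes that call. The sketch below covers only the application side of the loop. The `get_weather` tool, its schema, and the `{"name": ..., "arguments": ...}` call format are hypothetical; Gemma 4's actual tool-call syntax depends on its chat template and serving framework, which the article does not describe. The model's reply is hard-coded so the example runs offline.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    # Hypothetical tool with a stubbed result so the example runs offline.
    return {"city": city, "temperature": 21, "unit": unit}


# Hypothetical tool registry: the callable plus the JSON schema shown to the model.
TOOLS = {
    "get_weather": {
        "fn": get_weather,
        "schema": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model, run it, return a JSON result."""
    call = json.loads(model_output)
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        return json.dumps({"error": f"unknown tool {call.get('name')!r}"})
    args = call.get("arguments", {})
    # Reject calls that omit required arguments instead of raising TypeError.
    missing = [p for p in tool["schema"]["parameters"]["required"] if p not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return json.dumps(tool["fn"](**args))


# Hand-written stand-in for a model reply; not real Gemma 4 output.
fake_model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch(fake_model_output))
```

In a real deployment, the schemas in `TOOLS` would be sent along with the prompt and the JSON result fed back to the model as a tool message. Validating required arguments before executing keeps a malformed call from crashing the agent.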
What This Means
The Apache 2.0 licensing removes commercial restrictions that previously limited Gemma adoption, making these models viable for proprietary applications and enterprise use without derivative-work constraints. The 31B model's 3rd-place ranking among open models on Arena AI, achieved with one-tenth the parameters of many competitors, suggests that parameter efficiency, rather than raw scale, increasingly determines practical performance. For edge computing, the multimodal E2B and E4B variants compete directly with mobile-optimized models from Mistral and others. The release signals Google's commitment to competing in open models while building on the technology behind Gemini 3.