Google releases Gemma 4 family under Apache 2.0 license with 2B to 31B models
Google has released Gemma 4, a family of four open models ranging from 2B to 31B parameters and, for the first time, licensed under Apache 2.0. The 31B dense model ranks 3rd among open models on the Arena AI Text Leaderboard and the 26B mixture-of-experts variant ranks 6th, both ahead of significantly larger competitors. All models support multimodal inputs and are available on Hugging Face, Kaggle, and Ollama.
Gemma 4 is Google's most capable open model family to date, and it marks a significant licensing shift: all four models ship under the commercially permissive Apache 2.0 license, replacing Google's own, more restrictive license that governed earlier Gemma versions.
Model Lineup and Specifications
The Gemma 4 family consists of four variants:
- E2B and E4B: Effective 2B and 4B parameter models optimized for edge devices (smartphones, Raspberry Pi, Jetson Orin Nano). Both support 128K token context windows and natively handle images, video, and audio input.
- 26B MoE (Mixture-of-Experts): 26B total parameters, of which 3.8 billion are active per token, with up to 256K token context. Designed for low-latency inference.
- 31B Dense: Maximum quality variant with up to 256K token context, intended as a foundation model for fine-tuning.
All models are multimodal: the edge models handle images, video, and audio, while the larger variants support vision inputs. The family is built on the same technology that powers Google's proprietary Gemini 3.
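The gap between the 26B MoE model's total and active parameter counts comes from sparse expert routing: each token is sent to only a few experts. The article does not describe Gemma 4's router, so the following is a generic top-k mixture-of-experts sketch under stated assumptions. The 8 toy experts, k = 2, scalar inputs, and the names `route_token` and `moe_layer` are all hypothetical illustration, not Gemma 4 code.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k):
    """Return [(expert_index, gate_weight), ...] for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Renormalize only over the selected experts so their gate weights sum to 1.
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))

def moe_layer(x, experts, router_logits, k):
    # Only the k selected experts are evaluated; the rest are skipped for this
    # token, which is why "active" parameters are far fewer than total ones.
    return sum(gate * experts[i](x) for i, gate in route_token(router_logits, k))

if __name__ == "__main__":
    # Hypothetical toy layer: 8 "experts", each simply scales its input.
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    logits = [0.1, 2.0, -1.0, 1.5, 0.0, -0.5, 0.3, 0.2]
    print(route_token(logits, k=2))
    print(moe_layer(1.0, experts, logits, k=2))
    # Share of weights active per token for the 26B MoE (3.8B of ~26B).
    print(f"{3.8 / 26:.1%}")  # 14.6%
```

In a real MoE transformer, shared components such as attention and embeddings always run, so they count toward the active total alongside the routed experts; only the expert feed-forward blocks are skipped.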
Performance and Benchmarks
The 31B model currently ranks 3rd among all open models on the Arena AI Text Leaderboard with an Elo score above 1,440, while the 26B MoE ranks 6th. On the GPQA Diamond benchmark for scientific reasoning, Artificial Analysis reports:
- 31B model: 85.7% in reasoning mode, second-best among open models under 40B parameters (behind Qwen3.5 27B at 85.8%)
- 26B MoE model: 79.2%, ahead of OpenAI's gpt-oss-120B (76.2%)
Google claims the 31B model outperforms models 20 times its size. The 31B model also needed only about 1.2 million output tokens to complete the evaluation, which means less compute than comparable competitors required.
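To put the Arena score in context: Arena ratings are fitted from pairwise human preference votes and reported on an Elo-style scale. Assuming the conventional base-10, 400-point logistic scale, a rating gap can be converted into an expected preference rate. The 1,400-rated opponent below is hypothetical, and `expected_win_rate` is an illustrative helper, not part of any Arena tooling.

```python
def expected_win_rate(rating_a: float, rating_b: float,
                      base: float = 10.0, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B on a logistic Elo scale."""
    return 1.0 / (1.0 + base ** ((rating_b - rating_a) / scale))

if __name__ == "__main__":
    # ~1,440 is the 31B score from the article; 1,400 is a hypothetical rival.
    print(f"{expected_win_rate(1440, 1400):.3f}")  # 0.557
    print(f"{expected_win_rate(1440, 1440):.3f}")  # 0.500
```

Under this assumption, a 40-point lead translates into being preferred in roughly 56% of head-to-head votes, so modest leaderboard gaps reflect fairly small differences in win rate.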
Hardware Deployment and Framework Support
The 31B model's unquantized bfloat16 weights fit on a single 80GB NVIDIA H100 GPU, and quantized versions are suitable for consumer graphics cards. The 26B MoE model activates only 3.8 billion parameters per token, which keeps token generation efficient.
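The hardware claims follow from simple arithmetic on parameter counts. This sketch assumes the nominal sizes from the article (exact counts may differ slightly), ignores the small scale-factor overhead of real int4 formats, and uses the common rule of thumb of roughly 2 FLOPs per active parameter per generated token; the helper names are hypothetical.

```python
# Approximate storage cost per parameter; int4 ignores the small overhead of
# per-group quantization scales used by real formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    # KV cache, activations, and framework overhead come on top of this.
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

def forward_flops_per_token(active_params: float) -> float:
    """Rule of thumb: ~2 FLOPs per active parameter per generated token."""
    # Ignores attention over the context, whose cost grows with sequence length.
    return 2.0 * active_params

if __name__ == "__main__":
    # (total params, active params per token), nominal values from the article.
    models = {"31B dense": (31e9, 31e9), "26B MoE": (26e9, 3.8e9)}
    for name, (total, active) in models.items():
        sizes = ", ".join(f"{d}: {weight_memory_gb(total, d):.1f} GB"
                          for d in BYTES_PER_PARAM)
        gflops = forward_flops_per_token(active) / 1e9
        print(f"{name}: {sizes}; ~{gflops:.1f} GFLOPs/token")
```

The 31B model's bf16 weights come to about 62 GB, leaving headroom on an 80 GB card for the KV cache, while int4 brings them to about 15.5 GB, within reach of a 24 GB consumer GPU. The MoE trade-off is that all ~52 GB of bf16 weights must still be resident, but each token needs only about 7.6 GFLOPs versus about 62 GFLOPs for the dense model.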
Gemma 4 supports deployment across:
- Hardware: NVIDIA (Jetson Orin Nano through Blackwell), AMD (ROCm), and Google TPUs (Trillium, Ironwood)
- Frameworks: Hugging Face Transformers, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM/NeMo, LM Studio, Unsloth, SGLang, and Keras
- Platforms: Hugging Face, Kaggle, Ollama, Google AI Studio (31B/26B), and Google AI Edge Gallery (E4B/E2B)
Production Capabilities
All models natively support function calling, structured JSON output, and system instructions for agentic workflows. Fine-tuning is available through Google Colab, Vertex AI, or local GPUs. Production deployments scale via Google Cloud (Vertex AI, Cloud Run, GKE).
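Native function calling means the model emits a structured request that the application executes. Gemma 4's exact tool-call format is defined by its chat template and is not described in the article, so this sketch assumes the model has been instructed (for example, via a system instruction) to reply with a plain JSON object of the form `{"name": ..., "arguments": {...}}`. The `get_weather` tool, its stubbed result, and `dispatch_tool_call` are hypothetical. The code covers the application side only: validate the JSON, look up the tool, and call it.

```python
import json
from typing import Any, Callable

def get_weather(city: str, unit: str = "celsius") -> dict:
    # Hypothetical tool with a stubbed result; a real one would call a weather API.
    return {"city": city, "temp": 21, "unit": unit}

# Registry mapping tool names the model may emit to local Python functions.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def dispatch_tool_call(model_output: str) -> Any:
    """Parse an assumed {"name": ..., "arguments": {...}} tool call and run it."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")
    return TOOLS[name](**args)

if __name__ == "__main__":
    raw = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
    print(dispatch_tool_call(raw))  # {'city': 'Lisbon', 'temp': 21, 'unit': 'celsius'}
```

In an agent loop, the returned value would typically be serialized with `json.dumps` and passed back to the model as the tool's result. Serving frameworks such as vLLM provide their own tool-call parsing, which would normally replace a hand-rolled parser like this one.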
What This Means
The Apache 2.0 license removes the commercial restrictions that previously limited Gemma adoption, making these models viable for proprietary applications and enterprise use without constraints on derivative works. The 31B model's 3rd-place ranking among open models on Arena AI, achieved with one-tenth the parameters of many competitors, suggests that parameter efficiency, rather than raw scale, increasingly determines practical performance. For edge computing, the multimodal E2B and E4B variants compete directly with mobile-optimized models from Mistral and others. The release signals Google's commitment to competing in open models while building them on the same technology as Gemini 3.