Google launches Gemma 4 open-weights models with Apache 2.0 license to compete with Chinese LLMs
Google released Gemma 4, a new line of open-weights models available in sizes from 2 billion to 31 billion parameters, under a permissive Apache 2.0 license. The release includes multimodal capabilities, support for 140+ languages, native function calling, and a 256,000-token context window for the larger variants.
Google released Gemma 4, a new family of open-weights large language models designed to compete directly with Chinese open-source models from Moonshot AI, Alibaba, and Z.AI that increasingly rival proprietary alternatives. The shift to a permissive Apache 2.0 license marks Google's most significant licensing change for the Gemma family, removing previous restrictions that gave Google the right to terminate access.
Model Lineup and Specifications
Gemma 4 comes in multiple sizes across three categories:
High-performance dense model: A 31-billion-parameter model tuned for output quality, featuring a 256,000-token context window. Google claims it runs unquantized at 16-bit precision on a single 80 GB H100 GPU and at 4-bit precision on consumer GPUs like the Nvidia RTX 4090 or AMD RX 7900 XTX using frameworks such as Llama.cpp or Ollama.
Mixture of Experts variant: A 26-billion-parameter model using a mixture of experts (MoE) architecture with 3.8 billion active parameters per token. The model prioritizes inference speed over output quality and also features a 256,000-token context window.
Edge models: Two smaller models optimized for smartphones and single-board computers like Raspberry Pi, with 2-billion and 4-billion effective parameters (5.1 and 8 billion actual parameters, respectively, using per-layer embeddings). These retain 128,000-token context windows and multimodal capabilities.
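The hardware claims for the 31-billion-parameter dense model can be sanity-checked with back-of-envelope arithmetic: weight memory alone is parameter count times bytes per parameter. A rough sketch (weights only; real usage adds KV cache, activations, and framework overhead):

```python
# Back-of-envelope VRAM estimate for the claimed deployment targets.
# Counts weight memory only -- KV cache and activations add more.

PARAMS = 31e9  # 31B-parameter dense model


def weight_gib(params: float, bits: int) -> float:
    """Approximate weight memory in GiB at the given precision."""
    return params * bits / 8 / 2**30


fp16 = weight_gib(PARAMS, 16)  # ~58 GiB: fits a single 80 GB H100
q4 = weight_gib(PARAMS, 4)     # ~14 GiB: fits a 24 GB RTX 4090 / RX 7900 XTX

print(f"16-bit weights: {fp16:.1f} GiB")
print(f"4-bit weights:  {q4:.1f} GiB")
```

The numbers are consistent with Google's claims: 16-bit weights land under 80 GB, and 4-bit quantization brings them within consumer-GPU memory budgets.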
Key Capabilities
All Gemma 4 variants support:
- Multimodality: Video, audio, and image inputs alongside text
- Multilingual support: Over 140 languages
- Native function calling: Structured output generation
- Advanced reasoning: Improvements in mathematical and instruction-following tasks
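Native function calling means the model emits structured output, typically a JSON object naming a declared tool and its arguments, which the application then executes. A minimal sketch of the application-side round-trip; the schema shape and the `get_weather` tool are illustrative assumptions, not Gemma 4's documented format:

```python
import json

# Hypothetical tool declaration. The exact schema Gemma 4 expects is set
# by the serving framework; this mirrors the common JSON-Schema style.
weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# A structured function call as the model might emit it.
raw_response = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'

call = json.loads(raw_response)
# Validate the call against the declared tool before dispatching it.
assert call["name"] == weather_tool["name"]
assert set(call["arguments"]) >= set(weather_tool["parameters"]["required"])
print(call["arguments"]["city"])
```

The key property of "native" support is that the model is trained to produce output that parses cleanly against such a declaration, rather than relying on prompt-level coaxing.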
Google provides benchmark comparisons against Gemma 3 showing "significant performance improvements across a variety of AI benchmarks," though specific scores were not disclosed in the announcement.
Licensing and Deployment Strategy
The shift from Google's previous custom license to Apache 2.0 removes restrictions on deployment scenarios and eliminates Google's ability to revoke access. This addresses enterprise concerns about vendor control and data sovereignty, a key differentiator against proprietary API-only models, where customers have little visibility into or control over how their data is handled.
Gemma 4 models are immediately available through:
- Google AI Studio
- Google AI Edge Gallery
- Hugging Face
- Kaggle
- Ollama
Google claims day-one support across 12+ inference frameworks including vLLM, SGLang, Llama.cpp, and MLX.
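Day-one vLLM support matters in practice because vLLM serves models behind an OpenAI-compatible HTTP API, so existing client code needs only a model-name change. A sketch of the request payload such a deployment would accept; the model tag `google/gemma-4-31b` and the localhost endpoint are assumptions for illustration, and the request is constructed but not sent:

```python
import json

# Hypothetical chat request for a locally served model. vLLM exposes an
# OpenAI-compatible /v1/chat/completions endpoint; the model tag below
# is an assumed placeholder, not a confirmed Hugging Face identifier.
payload = {
    "model": "google/gemma-4-31b",
    "messages": [
        {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/chat/completions with any HTTP client.
print(body)
```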
Market Context
The release directly responds to the emergence of competitive open-weights Chinese models. Models from Moonshot AI and Alibaba now reportedly match or exceed the capabilities of OpenAI's GPT-5 and Anthropic's Claude on certain benchmarks. By offering a domestic alternative with clear licensing terms, Google aims to secure enterprise adoption where data residency, cost sensitivity, and licensing flexibility drive decision-making.
The 31-billion-parameter ceiling positions Gemma 4 below Google's proprietary Gemini models, limiting the risk of cannibalizing that business while keeping the family accessible to enterprises that cannot afford the infrastructure costs of larger models.
What This Means
Gemma 4 represents Google's strategic shift toward open-weights licensing as a competitive moat against both proprietary competitors and Chinese open-source alternatives. The Apache 2.0 license removes the licensing friction that previously made enterprises cautious about adopting Google's models. For developers and enterprises, the multimodal support and 256K context window address two critical use cases: local code assistants and agentic AI. However, Google has not disclosed specific performance benchmarks, making it difficult to assess quality claims against established competitors.