Google launches Gemma 4 open-weights models with Apache 2.0 license to compete with Chinese LLMs
Google released Gemma 4, a new line of open-weights models available in sizes from 2 billion to 31 billion parameters, under a permissive Apache 2.0 license. The release includes multimodal capabilities, support for 140+ languages, native function calling, and a 256,000-token context window for the larger variants.
Google Launches Gemma 4 Open-Weights Models with Apache 2.0 License
Google released Gemma 4, a new family of open-weights large language models designed to compete directly with Chinese open-source models from Moonshot AI, Alibaba, and Z.AI that increasingly rival proprietary alternatives. The shift to a permissive Apache 2.0 license marks Google's most significant licensing change for the Gemma family, removing previous restrictions that gave Google the right to terminate access.
Model Lineup and Specifications
Gemma 4 comes in multiple sizes across three categories:
High-performance dense model: A 31-billion-parameter model tuned for output quality, featuring a 256,000-token context window. Google claims it runs unquantized at 16-bit precision on a single 80 GB H100 GPU and at 4-bit precision on consumer GPUs like the Nvidia RTX 4090 or AMD RX 7900 XTX using frameworks such as Llama.cpp or Ollama.
Mixture of Experts variant: A 26-billion-parameter model using a mixture of experts (MoE) architecture with 3.8 billion active parameters per token. The model prioritizes inference speed over output quality and also features a 256,000-token context window.
Edge models: Two smaller models optimized for smartphones and single-board computers like Raspberry Pi, with 2-billion and 4-billion effective parameters (5.1 and 8 billion actual parameters, respectively, using per-layer embeddings). These retain 128,000-token context windows and multimodal capabilities.
Key Capabilities
All Gemma 4 variants support:
- Multimodality: Video, audio, and image inputs alongside text
- Multilingual support: Over 140 languages
- Native function calling: Structured output generation
- Advanced reasoning: Improvements in mathematical and instruction-following tasks
Google provides benchmark comparisons against Gemma 3 showing "significant performance improvements across a variety of AI benchmarks," though specific scores were not disclosed in the announcement.
Licensing and Deployment Strategy
The shift from Google's previous custom license to Apache 2.0 removes restrictions on deployment scenarios and eliminates Google's ability to revoke access. This addresses enterprise concerns about vendor control and data sovereignty—a critical differentiator against proprietary models where training data usage remains opaque.
Gemma 4 models are immediately available through:
- Google AI Studio
- Google AI Edge Gallery
- Hugging Face
- Kaggle
- Ollama
Google claims day-one support across 12+ inference frameworks including vLLM, SGLang, Llama.cpp, and MLX.
Market Context
The release directly responds to the emergence of competitive open-weights Chinese models. Models like Moonshot AI's offerings and Alibaba's implementations now reportedly match or exceed the capabilities of OpenAI's GPT-5 and Anthropic's Claude on certain benchmarks. By offering a domestic alternative with clear licensing terms, Google aims to secure enterprise adoption where data residency, cost sensitivity, and licensing flexibility drive decision-making.
The 31-billion-parameter ceiling positions Gemma 4 below Google's proprietary Gemini models, eliminating cannibalization risk while remaining accessible to enterprises that cannot afford the infrastructure costs of larger models.
What This Means
Gemma 4 represents Google's strategic shift toward open-weights licensing as a competitive moat against both proprietary competitors and Chinese open-source alternatives. The Apache 2.0 license removes the licensing friction that previously made enterprises cautious about adopting Google's models. For developers and enterprises, the multimodal support, code-optimized variants, and 256K context window address two critical use cases: local code assistants and agentic AI. However, Google has not disclosed specific performance benchmarks, making it difficult to assess quality claims against established competitors.
Related Articles
Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis
Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.
Google releases Gemini 3.5 Flash with 4x faster output and agentic capabilities, 3.5 Pro coming June
Google released Gemini 3.5 Flash today with 4x faster output token generation than competing frontier models while surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. The company announced Gemini 3.5 Pro will launch next month and introduced Gemini Omni, a new multimodal series that outputs video.
Stability AI Releases Stable Audio 3.0 Model Family Trained on Licensed Data
Stability AI has released Stable Audio 3.0, a model family for audio generation trained on fully licensed data. The company positions the release as a foundation for commercial audio applications, though specific technical specifications have not yet been disclosed.
Google launches Gemini 3.5 Flash and new Omni multimodal AI family at I/O 2026
Google launched Gemini 3.5 Flash today as the default model for its Gemini app and AI Mode in Search, with Gemini 3.5 Pro following next month. The company also introduced Gemini Omni, a new multimodal AI family capable of generating video from text, photos, video, and audio inputs.
Comments
Loading...