Google launches Gemma 4 open-weights models with Apache 2.0 license to compete with Chinese LLMs

TL;DR

Google released Gemma 4, a new line of open-weights models available in sizes from 2 billion to 31 billion parameters, under a permissive Apache 2.0 license. The release includes multimodal capabilities, support for 140+ languages, native function calling, and a 256,000-token context window for the larger variants.

Google released Gemma 4, a new family of open-weights large language models designed to compete directly with Chinese open-source models from Moonshot AI, Alibaba, and Z.AI that increasingly rival proprietary alternatives. The shift to a permissive Apache 2.0 license marks Google's most significant licensing change for the Gemma family, removing previous restrictions that gave Google the right to terminate access.

Model Lineup and Specifications

Gemma 4 comes in multiple sizes across three categories:

High-performance dense model: A 31-billion-parameter model tuned for output quality, with a 256,000-token context window. Google claims it runs unquantized at 16-bit precision on a single 80 GB H100 GPU, and at 4-bit precision on consumer GPUs such as the Nvidia RTX 4090 or AMD RX 7900 XTX using frameworks such as llama.cpp or Ollama.
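
Those hardware claims can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight storage only, assuming exactly 31 billion parameters and ignoring the KV cache, activations, and quantization overhead, all of which add to real memory use. The function name is illustrative, and the 24 GB figure is the VRAM on both consumer cards named above.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS_31B = 31e9  # assumed exact parameter count for the dense model

fp16_gb = weight_memory_gb(PARAMS_31B, 16)  # 62.0 GB, under the H100's 80 GB
int4_gb = weight_memory_gb(PARAMS_31B, 4)   # 15.5 GB, under a 24 GB consumer card

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
```

The remaining headroom (roughly 18 GB on the H100 and 8.5 GB on a 24 GB card) is what is left for the KV cache and activations, so very long contexts may still not fit in practice.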

Mixture of Experts variant: A 26-billion-parameter model using a mixture of experts (MoE) architecture with 3.8 billion active parameters per token. The model prioritizes inference speed over output quality and also features a 256,000-token context window.
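
To put the MoE figures in perspective, the short calculation below works out what share of the model is active for each token and how that compares with the dense model. It relies on a rule-of-thumb assumption that per-token compute scales with the number of active parameters; actual speed depends on hardware, memory bandwidth, and the inference framework.

```python
TOTAL_PARAMS = 26e9     # total parameters reported for the MoE model
ACTIVE_PARAMS = 3.8e9   # parameters active per token, as reported

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule-of-thumb assumption: per-token compute scales with active parameters,
# so compare against the 31B dense model, where every parameter is active.
DENSE_PARAMS = 31e9
compute_ratio = DENSE_PARAMS / ACTIVE_PARAMS

print(f"Active per token: {active_fraction:.1%}")
print(f"Dense 31B vs MoE active parameters: {compute_ratio:.1f}x")
```

Only about 15% of the weights do work on any given token, but all 26 billion parameters still have to be held in memory, so the MoE design saves compute rather than storage.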

Edge models: Two smaller models optimized for smartphones and single-board computers like Raspberry Pi, with 2 billion and 4 billion effective parameters (5.1 billion and 8 billion actual parameters, respectively, using per-layer embeddings). Both have 128,000-token context windows and multimodal capabilities.

Key Capabilities

All Gemma 4 variants support:

  • Multimodality: Video, audio, and image inputs alongside text
  • Multilingual support: Over 140 languages
  • Native function calling: Structured output generation
  • Advanced reasoning: Improvements in mathematical and instruction-following tasks
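
Native function calling generally means the model emits a structured request that the application parses and executes. The sketch below shows that application-side step in generic form. The JSON shape (`name` and `arguments` keys), the `get_weather` tool, and the `dispatch` helper are all hypothetical and are not taken from Gemma 4's documentation; the real output format depends on the model's chat template and the inference framework used.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"

# Registry mapping tool names the model may request to Python callables.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured function call and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Stand-in for text a model might emit (assumed format, not Gemma's).
example_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
result = dispatch(example_output)
print(result)
```

Checking the requested name against an explicit registry before calling anything keeps the model from invoking functions the application never meant to expose.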

Google says Gemma 4 delivers "significant performance improvements across a variety of AI benchmarks" compared with Gemma 3, but the announcement did not disclose specific scores.

Licensing and Deployment Strategy

The shift from Google's previous custom license to Apache 2.0 removes restrictions on deployment scenarios and eliminates Google's ability to revoke access. This addresses enterprise concerns about vendor control and data sovereignty, a critical differentiator against proprietary models where training data usage remains opaque.

Gemma 4 models are immediately available through:

  • Google AI Studio
  • Google AI Edge Gallery
  • Hugging Face
  • Kaggle
  • Ollama

Google claims day-one support across 12+ inference frameworks, including vLLM, SGLang, llama.cpp, and MLX.

Market Context

The release responds directly to the emergence of competitive open-weights models from Chinese labs. Models from Moonshot AI and Alibaba now reportedly match or exceed OpenAI's GPT-5 and Anthropic's Claude on certain benchmarks. By offering a U.S.-developed alternative with clear licensing terms, Google aims to secure enterprise adoption where data residency, cost sensitivity, and licensing flexibility drive decision-making.

The 31-billion-parameter ceiling positions Gemma 4 below Google's proprietary Gemini models, eliminating cannibalization risk while remaining accessible to enterprises that cannot afford the infrastructure costs of larger models.

What This Means

Gemma 4 represents Google's strategic shift toward open-weights licensing as a competitive moat against both proprietary competitors and Chinese open-source alternatives. The Apache 2.0 license removes the licensing friction that previously made enterprises cautious about adopting Google's models. For developers and enterprises, the multimodal support, native function calling, and context windows of up to 256,000 tokens address two critical use cases: local code assistants and agentic AI. However, Google has not disclosed specific benchmark scores, making it difficult to assess its quality claims against established competitors.
