Google DeepMind releases Gemma 4: open models ranking #3 and #6 on Arena AI leaderboard

TL;DR

Google DeepMind released Gemma 4, a family of four open models ranging from 2B to 31B parameters, all licensed under Apache 2.0. The 31B dense model ranks #3 on Arena AI's text leaderboard and the 26B mixture-of-experts variant ranks #6, outperforming closed models significantly larger in size.

Google DeepMind Releases Gemma 4 Open Model Family

Google DeepMind today announced Gemma 4, a family of open-source models designed for advanced reasoning and agentic workflows. The release includes four variants: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense.

Performance and Benchmarks

The 31B dense model currently ranks #3 on Arena AI's text leaderboard, with the 26B MoE variant at #6. According to Google DeepMind, the 26B model outcompetes models 20x its size. Both models were built using the same underlying research and technology as Gemini 3.

Model Specifications

Large Models:

  • 31B Dense: Optimized for maximum quality and fine-tuning; runs on a single 80GB NVIDIA H100 GPU in bfloat16
  • 26B Mixture of Experts: Activates only 3.8 billion parameters per token during inference, enabling low-latency token generation
  • Context window: Up to 256K tokens
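The hardware claims above follow from simple arithmetic: bfloat16 stores each parameter in 2 bytes, and a mixture-of-experts model's per-token compute scales with its active parameters rather than its total. The Python sketch below works through the numbers using the nominal parameter counts from the announcement. It counts weights only and ignores the KV cache, activations, and runtime overhead, so treat the results as a lower bound, not a sizing guide.

```python
# Back-of-envelope memory and compute figures for the two large Gemma 4 models.
# Parameter counts are the nominal figures from the announcement; real checkpoints
# differ slightly, and KV cache, activations, and framework overhead are ignored.

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Memory needed just to hold the weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of an MoE model's parameters used to process each token."""
    return active_params / total_params


dense_31b = weight_memory_gb(31e9)        # 62.0 GB, which fits under an H100's 80 GB
moe_26b = weight_memory_gb(26e9)          # 52.0 GB: all experts are normally kept in memory
moe_share = active_fraction(3.8e9, 26e9)  # roughly 15% of the weights touched per token

print(f"31B dense, bf16 weights: {dense_31b:.1f} GB")
print(f"26B MoE, bf16 weights:   {moe_26b:.1f} GB")
print(f"26B MoE active share:    {moe_share:.1%}")
```

The MoE figures show the trade-off: the 26B model does roughly the per-token compute of a ~4B dense model, but it still needs memory for all 26B parameters.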

Edge Models:

  • E4B and E2B: Engineered for mobile and IoT devices with native audio input and multimodal support
  • Context window: 128K tokens
  • Designed to run completely offline on Android devices, Raspberry Pi, NVIDIA Jetson Orin Nano, and other edge hardware

Capabilities

All Gemma 4 models include:

  • Advanced multi-step reasoning and planning
  • Native function-calling and structured JSON output for agentic workflows
  • High-quality code generation with offline capability
  • Native vision processing (video and images at variable resolutions, OCR, chart understanding)
  • Training on 140+ languages
  • Speech recognition via native audio input (E2B/E4B only)
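Function calling generally works by having the model emit a structured JSON object that names a tool and its arguments; the host application then parses, validates, and executes that call. The Python sketch below shows the host-side half of the loop under stated assumptions: the `get_weather` tool, the `TOOLS` registry, and the `{"name": ..., "arguments": ...}` call shape are all hypothetical, and Gemma 4's actual prompt template and tool-call format should be taken from its model card.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: the tool names and the {"name", "arguments"} call
# shape are illustrative assumptions, not Gemma 4's documented format.


def get_weather(city: str, unit: str = "celsius") -> dict[str, Any]:
    """Stand-in tool; a real agent would call a weather service here."""
    return {"city": city, "unit": unit, "forecast": "sunny"}


TOOLS: dict[str, tuple[Callable[..., Any], set[str]]] = {
    # tool name -> (implementation, required argument names)
    "get_weather": (get_weather, {"city"}),
}


def dispatch(model_output: str) -> Any:
    """Parse a JSON tool call emitted by the model, validate it, and run the tool."""
    call = json.loads(model_output)  # raises json.JSONDecodeError on malformed output
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    func, required = TOOLS[name]
    missing = required - set(args)
    if missing:
        raise ValueError(f"missing arguments for {name}: {sorted(missing)}")
    return func(**args)


# A hand-written string in the shape a model might emit.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Lisbon"}}')
print(result)  # {'city': 'Lisbon', 'unit': 'celsius', 'forecast': 'sunny'}
```

Validating the call before executing it matters because even a model trained for structured output can occasionally emit an unknown tool name or omit a required argument.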

Licensing and Distribution

Gemma 4 is released under Apache 2.0, a commercially permissive open-source license. The models are available immediately via Hugging Face, Kaggle, and Ollama. According to Google DeepMind, developers have downloaded previous Gemma versions more than 400 million times and created more than 100,000 community variants.

Integration and Tools

Day-one support includes compatibility with Hugging Face Transformers, llama.cpp, Ollama, vLLM, NVIDIA NIM, LiteRT-LM, MLX, LM Studio, Unsloth, and SGLang. For Android development, models are available through Android Studio's Agent Mode and the ML Kit GenAI Prompt API. Cloud deployment options include Google Cloud's Vertex AI, Cloud Run, GKE, and TPU-accelerated serving.
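Of the tools listed, Ollama exposes a plain HTTP API on localhost, which makes for a compact local-inference example. The Python sketch below uses only the standard library to call Ollama's `/api/generate` endpoint with streaming disabled. The model tag `gemma4` is an assumption not confirmed by the announcement; substitute whatever tag `ollama list` shows after pulling the model.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4"  # hypothetical tag; check the Ollama library for the published name


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        # With "stream": False, Ollama returns one JSON object whose "response" field holds the text.
        return json.loads(resp.read())["response"]
```

With `ollama serve` running and the model pulled, calling `generate("Summarize the Apache 2.0 license in one sentence.")` returns the completion as a string, with no data leaving the machine.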

Development Collaboration

Google DeepMind collaborated with Qualcomm Technologies and MediaTek on the edge models. The announcement also cited earlier fine-tunes of Gemma, including BgGPT (a Bulgarian language model from INSAIT) and Cell2Sentence-Scale (a Yale University cancer research application).

What This Means

Gemma 4 represents a significant efficiency milestone: achieving near-frontier reasoning performance at smaller parameter counts lowers the hardware barrier for researchers and developers building production AI systems. The Apache 2.0 license replaces the custom Gemma Terms of Use that governed earlier Gemma releases, removing their additional usage restrictions, and the multimodal edge models (E2B/E4B) enable on-device AI without cloud dependency. The Arena AI rankings suggest measurable gains over comparably sized open models, though how Gemma 4 compares with Meta's Llama and other recent releases has yet to be independently verified. For enterprises that prioritize data sovereignty and offline inference, Gemma 4 addresses a concrete operational requirement.
