model release

Google releases Gemma 4 26B with 256K context and multimodal support, free to use

TL;DR

Google DeepMind has released Gemma 4 26B A4B, a free instruction-tuned Mixture-of-Experts model with 262,144 token context window and multimodal capabilities including text, images, and video input. Despite 25.2B total parameters, only 3.8B activate per token, delivering performance comparable to larger 31B models at reduced compute cost.

2 min read
0

Google Releases Gemma 4 26B with 256K Context and Multimodal Support, Free

Google DeepMind has released Gemma 4 26B A4B, a free Mixture-of-Experts model available immediately. The model features a 262,144 token context window, native support for text, images, and video input (up to 60 seconds at 1fps), and is released under Apache 2.0 license.

Model Architecture and Performance

Gemma 4 26B employs a sparse Mixture-of-Experts architecture with 25.2B total parameters but only 3.8B active parameters per token during inference. Google claims this configuration delivers performance comparable to larger 31B dense models while requiring substantially less compute. The model is instruction-tuned and includes native function calling, structured output support, and configurable thinking/reasoning mode for step-by-step problem solving.

Multimodal Capabilities

Unlike earlier Gemma variants, Gemma 4 26B supports multimodal input across text, images, and video. Video support handles sequences up to 60 seconds sampled at 1 frame per second, enabling analysis of temporal content without requiring separate video understanding components.

Pricing and Availability

The model is available for free with zero cost per million input tokens and zero cost per million output tokens. It is accessible via OpenRouter, which routes requests across multiple providers and manages fallback routing to maximize uptime. Model weights are available for local deployment under Apache 2.0 license.

What This Means

Gemma 4 26B represents a significant shift in Google's open model strategy—pairing genuine multimodal capabilities with a sparse architecture that reduces inference costs. The 256K context window matches or exceeds most competitive models, and free pricing removes adoption barriers. For developers, this addresses a clear gap: capable open models with video understanding have been limited. The sparse MoE design is particularly relevant for cost-sensitive deployments where inference happens at scale. The reasoning mode addition suggests Google is matching OpenAI's o1-style thinking patterns in its open offerings.

Related Articles

model release

Cohere Releases Command A+ Open Source Model with 25B Active Parameters, 128K Context

Cohere has released Command A+ as an open source model under Apache 2.0 license. The sparse mixture-of-experts architecture features 25 billion active parameters out of 218B total parameters, supports 128K input context length, and includes vision capabilities alongside tool use and reasoning features.

model release

Cohere Releases Command A+: 218B-Parameter MoE Model With 4-Bit Quantization Runs on Single B200 GPU

Cohere has released Command A+, an open-source sparse mixture-of-experts model with 218 billion total parameters and 25 billion active parameters. The model features W4A4 quantization allowing deployment on a single Nvidia B200 GPU, supports 128K input context, and includes built-in chain-of-thought reasoning with vision capabilities.

model release

Tencent Releases Hy-MT2: 1.8B Translation Model Compressed to 440MB With 1.25-Bit Quantization

Tencent has open-sourced Hy-MT2, a family of multilingual translation models available in 1.8B, 7B, and 30B-A3B parameter sizes. The models support translation across 33 languages and include extreme quantization down to 1.25-bit, reducing the 1.8B model to 440MB storage while increasing inference speed by 1.5x.

model release

Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis

Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.

Comments

Loading...