Google releases Gemma 4 26B with 256K context and multimodal support, free to use
Google DeepMind has released Gemma 4 26B A4B, a free instruction-tuned Mixture-of-Experts model with 262,144 token context window and multimodal capabilities including text, images, and video input. Despite 25.2B total parameters, only 3.8B activate per token, delivering performance comparable to larger 31B models at reduced compute cost.
Gemma 4 26B A4B IT — Quick Specs
Google Releases Gemma 4 26B with 256K Context and Multimodal Support, Free
Google DeepMind has released Gemma 4 26B A4B, a free Mixture-of-Experts model available immediately. The model features a 262,144 token context window, native support for text, images, and video input (up to 60 seconds at 1fps), and is released under Apache 2.0 license.
Model Architecture and Performance
Gemma 4 26B employs a sparse Mixture-of-Experts architecture with 25.2B total parameters but only 3.8B active parameters per token during inference. Google claims this configuration delivers performance comparable to larger 31B dense models while requiring substantially less compute. The model is instruction-tuned and includes native function calling, structured output support, and configurable thinking/reasoning mode for step-by-step problem solving.
Multimodal Capabilities
Unlike earlier Gemma variants, Gemma 4 26B supports multimodal input across text, images, and video. Video support handles sequences up to 60 seconds sampled at 1 frame per second, enabling analysis of temporal content without requiring separate video understanding components.
Pricing and Availability
The model is available for free with zero cost per million input tokens and zero cost per million output tokens. It is accessible via OpenRouter, which routes requests across multiple providers and manages fallback routing to maximize uptime. Model weights are available for local deployment under Apache 2.0 license.
What This Means
Gemma 4 26B represents a significant shift in Google's open model strategy—pairing genuine multimodal capabilities with a sparse architecture that reduces inference costs. The 256K context window matches or exceeds most competitive models, and free pricing removes adoption barriers. For developers, this addresses a clear gap: capable open models with video understanding have been limited. The sparse MoE design is particularly relevant for cost-sensitive deployments where inference happens at scale. The reasoning mode addition suggests Google is matching OpenAI's o1-style thinking patterns in its open offerings.
Related Articles
Cohere Releases Command A+ Open Source Model with 25B Active Parameters, 128K Context
Cohere has released Command A+ as an open source model under Apache 2.0 license. The sparse mixture-of-experts architecture features 25 billion active parameters out of 218B total parameters, supports 128K input context length, and includes vision capabilities alongside tool use and reasoning features.
Cohere Releases Command A+: 218B-Parameter MoE Model With 4-Bit Quantization Runs on Single B200 GPU
Cohere has released Command A+, an open-source sparse mixture-of-experts model with 218 billion total parameters and 25 billion active parameters. The model features W4A4 quantization allowing deployment on a single Nvidia B200 GPU, supports 128K input context, and includes built-in chain-of-thought reasoning with vision capabilities.
Tencent Releases Hy-MT2: 1.8B Translation Model Compressed to 440MB With 1.25-Bit Quantization
Tencent has open-sourced Hy-MT2, a family of multilingual translation models available in 1.8B, 7B, and 30B-A3B parameter sizes. The models support translation across 33 languages and include extreme quantization down to 1.25-bit, reducing the 1.8B model to 440MB storage while increasing inference speed by 1.5x.
Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis
Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.
Comments
Loading...