model release

Google releases Gemma 4 26B with 256K context and multimodal support, free to use

TL;DR

Google DeepMind has released Gemma 4 26B A4B, a free instruction-tuned Mixture-of-Experts model with 262,144 token context window and multimodal capabilities including text, images, and video input. Despite 25.2B total parameters, only 3.8B activate per token, delivering performance comparable to larger 31B models at reduced compute cost.

April 7, 2026 · 7:50 PM2 min read

Gemma 4 26B A4B IT — Quick Specs

Context window262K tokens

Compare Gemma 4 26B A4B IT with other models →

Google Releases Gemma 4 26B with 256K Context and Multimodal Support, Free

Google DeepMind has released Gemma 4 26B A4B, a free Mixture-of-Experts model available immediately. The model features a 262,144 token context window, native support for text, images, and video input (up to 60 seconds at 1fps), and is released under Apache 2.0 license.

Model Architecture and Performance

Gemma 4 26B employs a sparse Mixture-of-Experts architecture with 25.2B total parameters but only 3.8B active parameters per token during inference. Google claims this configuration delivers performance comparable to larger 31B dense models while requiring substantially less compute. The model is instruction-tuned and includes native function calling, structured output support, and configurable thinking/reasoning mode for step-by-step problem solving.

Multimodal Capabilities

Unlike earlier Gemma variants, Gemma 4 26B supports multimodal input across text, images, and video. Video support handles sequences up to 60 seconds sampled at 1 frame per second, enabling analysis of temporal content without requiring separate video understanding components.

Pricing and Availability

The model is available for free with zero cost per million input tokens and zero cost per million output tokens. It is accessible via OpenRouter, which routes requests across multiple providers and manages fallback routing to maximize uptime. Model weights are available for local deployment under Apache 2.0 license.

What This Means

Gemma 4 26B represents a significant shift in Google's open model strategy—pairing genuine multimodal capabilities with a sparse architecture that reduces inference costs. The 256K context window matches or exceeds most competitive models, and free pricing removes adoption barriers. For developers, this addresses a clear gap: capable open models with video understanding have been limited. The sparse MoE design is particularly relevant for cost-sensitive deployments where inference happens at scale. The reasoning mode addition suggests Google is matching OpenAI's o1-style thinking patterns in its open offerings.

Source: openrouter.ai ↗

gemma google-deepmind moe mixture-of-experts multimodal free-model open-source 256k-context

model releaseJuly 6, 2026

Nex AGI releases Nex-N2-Mini: open-source agentic MoE model with 262K context window

Nex AGI has released Nex-N2-Mini, an open-source agentic mixture-of-experts model with a 262K-token context window. The model accepts text and image inputs and is priced at $0.025 per 1M input tokens and $0.10 per 1M output tokens.

model releaseJuly 6, 2026

Tencent Releases Hy3: 295B MoE Model with 256K Context and Configurable Reasoning Modes

Tencent has released Hy3, a 295-billion parameter Mixture-of-Experts model with 21 billion active parameters and a 256,000-token context window. The model features configurable reasoning modes and is available free through OpenRouter, with deployment ending July 21, 2026.

model releaseJuly 4, 2026

Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance

Mistral AI has released Leanstral 1.5, an open-source 119B parameter mixture-of-experts model designed specifically for Lean 4 proof assistance. The model features 128 experts with 4 active per token (6.5B activated parameters), a 256k token context window, and multimodal input capabilities.

model releaseJune 29, 2026

DeepReinforce Releases Ornith-1.0, Open-Source Agentic Coding Model in 9B to 397B Sizes

DeepReinforce has released Ornith-1.0, an MIT-licensed model designed for agentic coding tasks with variants ranging from 9B to 397B parameters. Built on top of Apache 2.0-licensed Gemma 4 and Qwen 3.5 base models, the company claims it achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks.