model release

Google releases DiffusionGemma 26B, open-weight model generates 500+ tokens/second

TL;DR

Google has released DiffusionGemma 26B, an open-weight text generation model under Apache 2 license. The model generates over 500 tokens/second according to testing on NVIDIA's free NIM API, where it produced 2,409 tokens in 4.4 seconds.

June 10, 2026 · 8:20 PM1 min read

DiffusionGemma 26B A4B IT — Quick Specs

Compare DiffusionGemma 26B A4B IT with other models →

Google releases DiffusionGemma 26B, open-weight model generates 500+ tokens/second

Google has released DiffusionGemma 26B, an open-weight text generation model licensed under Apache 2. The model is based on Google's previously experimental Gemini Diffusion architecture from May 2025, which briefly appeared in preview before being withdrawn.

Performance metrics

The model demonstrates generation speeds exceeding 500 tokens per second. In testing on NVIDIA's NIM cloud API, DiffusionGemma 26B generated 2,409 tokens in 4.4 seconds when creating an image description. This represents a significant speed improvement over standard autoregressive language models.

Google's earlier Gemini Diffusion preview in May 2025 reportedly achieved 857 tokens per second, suggesting the architecture maintains high-speed generation capabilities.

Availability and access

The model is available as google/diffusiongemma-26B-A4B-it on Hugging Face. NVIDIA is currently hosting the model free of charge on their NIM cloud API platform, providing immediate access without local deployment requirements.

The 26B parameter model uses a diffusion-based approach to text generation rather than traditional autoregressive decoding, which enables parallel token generation and faster inference speeds.

Technical details

DiffusionGemma represents a departure from standard transformer architectures that generate tokens sequentially. Instead, the diffusion approach allows multiple tokens to be refined simultaneously during generation, similar to image diffusion models adapted for discrete text.

The "A4B" designation in the model name likely indicates architecture-specific configuration details, though Google has not released full technical specifications.

What this means

DiffusionGemma 26B validates diffusion architectures as a viable alternative to autoregressive generation for language models. The 500+ tokens/second speed, combined with Apache 2 licensing, makes this the fastest openly available language model by generation speed. This could shift inference economics for applications requiring high-throughput text generation, though quality comparisons with standard models like Llama or Gemma remain to be established through independent benchmarking.

Source: simonwillison.net ↗

google gemma diffusion-models open-weights nvidia inference-speed apache-2

model releaseJuly 25, 2026

Microsoft Releases Fara1.5-27B, a 27B Vision-Only Web Browsing Agent with 262K Context

Microsoft Research AI Frontiers has released Fara1.5-27B, a 27-billion-parameter multimodal agent that completes web tasks by reading screenshots and emitting click/type/scroll commands. The model, fine-tuned from Qwen3.5-27B, ships under MIT license with a 262K-token context window and is designed to run alongside Microsoft's MagenticLite sandbox.

model releaseJuly 25, 2026

Anthropic's Claude Opus 5 Hits 0% Prompt Injection Success Rate in Browser Agent Tests, With Defenses Enabled

Anthropic's system card for Claude Opus 5 reports a 0% prompt injection success rate across 129 browser agent test scenarios when Auto Mode is enabled. On Gray Swan's broader indirect prompt injection benchmark, Opus 5 posted a 2.0% attacker success rate after 15 attempts, the lowest among tested frontier models.

model releaseJuly 25, 2026

Anthropic Ships Claude Opus 5, Claims Near-Fable Performance at Half the Price

Anthropic released Claude Opus 5 on July 24, 2026, positioning it as a lower-cost alternative to its more expensive Claude Fable 5 model. Independent evaluators Epoch AI and Artificial Analysis report mixed but largely favorable results, with Opus 5 nearly matching Fable 5 on coding benchmarks while cutting cost-per-task by roughly 20%.

model releaseJuly 24, 2026

Anthropic Ships Claude Opus 5, Claims It Matches Flagship Fable 5 on Coding at Half the Cost

Anthropic released Claude Opus 5 on July 24, its fourth model launch in under two months, priced at $5 per million input tokens and $25 per million output tokens. The company claims the model matches or beats its flagship Fable 5 on most coding and knowledge-work benchmarks while posting the lowest deception rate of any model it has shipped.

Google releases DiffusionGemma 26B, open-weight model generates 500+ tokens/second

DiffusionGemma 26B A4B IT — Quick Specs

Google releases DiffusionGemma 26B, open-weight model generates 500+ tokens/second

Performance metrics

Availability and access

Technical details

What this means

Related Articles

Microsoft Releases Fara1.5-27B, a 27B Vision-Only Web Browsing Agent with 262K Context

Anthropic's Claude Opus 5 Hits 0% Prompt Injection Success Rate in Browser Agent Tests, With Defenses Enabled

Anthropic Ships Claude Opus 5, Claims Near-Fable Performance at Half the Price

Anthropic Ships Claude Opus 5, Claims It Matches Flagship Fable 5 on Coding at Half the Cost

Comments