Google DeepMind

6 articles tagged with Google DeepMind

June 17, 2026

NVIDIA Releases Quantized DiffusionGemma 26B: 1,100+ Tokens/Second with 256K Context Window

NVIDIA released a quantized version of Google DeepMind's DiffusionGemma 26B A4B IT, a multimodal model with 25.2B total parameters (3.8B active) that processes text, image, and video inputs. The NVFP4-quantized model achieves generation speeds exceeding 1,100 tokens per second on NVIDIA H100 GPUs while supporting a 256K token context window.

June 17, 2026 · 12:06 PM

June 15, 2026

model release · Google DeepMind

Amazon Bedrock adds Gemma 4 models with 256K context and built-in reasoning mode

Amazon Web Services announced the availability of Google DeepMind's Gemma 4 family on Amazon Bedrock. The open-weight models include three instruction-tuned variants spanning 2.3B to 30.7B parameters, with 256K context windows, multimodal input support, and a built-in reasoning mode.

June 15, 2026 · 8:35 PM

June 10, 2026

model release · Google DeepMind

Google DeepMind releases DiffusionGemma, a 26B-parameter model generating 15-20 tokens per forward pass via discrete diffusion

Google DeepMind released DiffusionGemma, a 26B-parameter mixture-of-experts model that generates text using discrete diffusion instead of autoregression. The model processes blocks of 256 tokens in parallel, achieving generation speeds exceeding 1,100 tokens per second on H100 GPUs in low-batch settings.

June 10, 2026 · 6:06 PM

June 9, 2026

model release

Google DeepMind Releases Gemini 3.5 Live Translate for Real-Time Speech Translation Across 70+ Languages

Google DeepMind released Gemini 3.5 Live Translate, an audio model that provides near real-time speech-to-speech translation across 70+ languages. The model automatically detects languages, preserves speaker intonation and pacing, and keeps latency to a few seconds while generating continuous speech output.

June 9, 2026 · 3:35 PM

changelog · Google DeepMind

Google DeepMind Releases Quantization-Aware Training Versions of Gemma 4 Models in GGUF Format

Google DeepMind has released versions of its Gemma 4 model family optimized with quantization-aware training (QAT) in GGUF Q4_0 format. The QAT versions preserve quality close to that of the bfloat16 originals while dramatically reducing memory requirements, and they are available across the entire Gemma 4 lineup: E2B, E4B, 12B, 26B A4B, and 31B.

June 9, 2026 · 12:36 AM

May 10, 2026

model release · Google DeepMind

Google DeepMind Releases Gemma 4 E4B with Multi-Token Prediction for 2x Faster Inference

Google DeepMind released the Gemma 4 E4B assistant model, which uses a Multi-Token Prediction (MTP) architecture to accelerate inference by up to 2x through speculative decoding. The model has 4.5B effective parameters, supports 128K context windows, and handles text, image, and audio input. Pricing has not yet been disclosed.

May 10, 2026 · 5:06 AM
