Google DeepMind
6 articles tagged with Google DeepMind
NVIDIA Releases Quantized DiffusionGemma 26B: 1,100+ Tokens/Second with 256K Context Window
NVIDIA released a quantized version of Google DeepMind's DiffusionGemma 26B A4B IT, a multimodal model with 25.2B total parameters (3.8B active) that processes text, image, and video inputs. The NVFP4-quantized model achieves generation speeds exceeding 1,100 tokens per second on NVIDIA H100 GPUs while supporting a 256K token context window.
Amazon Bedrock adds Gemma 4 models with 256K context and built-in reasoning mode
Amazon Web Services today announced the availability of Google DeepMind's Gemma 4 family on Amazon Bedrock. The open-weight models include three instruction-tuned variants spanning 2.3B to 30.7B parameters, with 256K context windows, multimodal input support, and a built-in reasoning mode.
Google DeepMind releases DiffusionGemma, a 26B parameter model generating 15-20 tokens per forward pass via discrete dif
Google DeepMind released DiffusionGemma, a 26B parameter mixture-of-experts model that generates text using discrete diffusion instead of autoregression. The model processes blocks of 256 tokens in parallel, achieving generation speeds exceeding 1,100 tokens per second on H100 GPUs in low-batch settings.
Google DeepMind Releases Gemini 3.5 Live Translate for Real-Time Speech Translation Across 70+ Languages
Google DeepMind released Gemini 3.5 Live Translate, an audio model that provides near real-time speech-to-speech translation across 70+ languages. The model automatically detects languages, preserves speaker intonation and pacing, and generates continuous speech output with a few seconds of latency.
Google DeepMind Releases Quantization-Aware Training Versions of Gemma 4 Models in GGUF Format
Google DeepMind has released quantization-aware training (QAT) versions of its Gemma 4 model family in GGUF Q4_0 format. The QAT versions preserve quality similar to bfloat16 while dramatically reducing memory requirements, and are available across the entire Gemma 4 lineup: E2B, E4B, 12B, 26B A4B, and 31B.
Google DeepMind Releases Gemma 4 E4B with Multi-Token Prediction for 2x Faster Inference
Google DeepMind released the Gemma 4 E4B assistant model, which uses a Multi-Token Prediction (MTP) architecture to accelerate inference by up to 2x through speculative decoding. The 4.5B effective parameter model supports 128K context windows and handles text, image, and audio input. Pricing has not yet been disclosed.