Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI

TL;DR

Google has released Gemini 3.1 Flash Live, its highest-quality audio and voice model designed for real-time dialogue. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with reasoning enabled, with improved tonal understanding and lower latency compared to previous versions.

March 26, 2026 · 3:35 PM · 2 min read

Google has launched Gemini 3.1 Flash Live, a real-time audio and voice model designed to deliver more natural and reliable voice interactions. The model is now available to developers via the Gemini Live API in Google AI Studio, to enterprises through Gemini Enterprise for Customer Experience, and to all users via Gemini Live and Search Live.

Performance Benchmarks

On ComplexFuncBench Audio, which measures multi-step function calling under various constraints, Gemini 3.1 Flash Live scores 90.8%, which Google says is higher than the previous model. On Scale AI's Audio MultiChallenge, which tests complex instruction following under real-world audio conditions such as interruptions and hesitations, the model scores 36.1% with "thinking" mode enabled.

Google claims the model delivers improved latency compared to its predecessor, enabling faster response times for voice-first applications. The company also reports enhanced tonal understanding, allowing the model to recognize acoustic nuances like pitch and pace, and to dynamically adjust responses based on user expressions of frustration or confusion.

Developer Features

For developers, Gemini 3.1 Flash Live enables building voice agents capable of executing complex, multi-step tasks in noisy environments. The model supports function calling with improved reliability at scale. In Gemini Live, users can maintain conversation context for twice as long as with the previous model, preserving continuity during extended brainstorming sessions.
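To make "multi-step function calling" concrete, here is a minimal sketch of the application-side half of that loop: a registry of tools and a dispatcher that runs the calls a model requests. It does not use the Gemini Live API or any Google SDK. The `ToolCall` shape, the `find_store` and `check_stock` tools, and their return values are all hypothetical and exist only for illustration. In a real agent the model would request the second call after seeing the first result, whereas this sketch takes the calls as a prepared list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """A function call requested by the model (hypothetical shape)."""
    name: str
    args: Dict[str, Any]


# Hypothetical tools a voice agent might expose; not part of any Google SDK.
def find_store(zip_code: str) -> Dict[str, Any]:
    return {"store_id": "store-" + zip_code}


def check_stock(store_id: str, sku: str) -> Dict[str, Any]:
    return {"store_id": store_id, "sku": sku, "in_stock": True}


TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "find_store": find_store,
    "check_stock": check_stock,
}


def dispatch(call: ToolCall) -> Dict[str, Any]:
    """Run one requested call, returning an error payload instead of raising."""
    tool = TOOLS.get(call.name)
    if tool is None:
        return {"error": f"unknown tool: {call.name}"}
    try:
        return tool(**call.args)
    except TypeError as exc:  # missing or unexpected arguments
        return {"error": str(exc)}


def run_steps(calls: List[ToolCall]) -> List[Dict[str, Any]]:
    """Execute calls in order, stopping at the first error."""
    results: List[Dict[str, Any]] = []
    for call in calls:
        result = dispatch(call)
        results.append(result)
        if "error" in result:
            break
    return results
```

Returning an error payload rather than raising lets the agent report the failure back to the model, which can then retry or ask the user for the missing detail.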

Companies including Verizon, LiveKit, and The Home Depot have provided positive feedback on the model's performance in production workflows, highlighting natural conversation quality.

Multilingual and Global Rollout

Gemini 3.1 Flash Live is inherently multilingual, enabling this week's global expansion of Search Live to over 200 countries and territories. Users can now conduct real-time, multimodal conversations with Google Search in their preferred language.

Safety and Watermarking

All audio generated by Gemini 3.1 Flash Live is watermarked using Google's SynthID technology. According to Google, this imperceptible watermark is embedded directly into audio output, enabling reliable detection of AI-generated content to help prevent misinformation.

What This Means

Gemini 3.1 Flash Live advances Google's real-time voice AI, with reported scores of 90.8% on ComplexFuncBench Audio for function calling and 36.1% on Audio MultiChallenge for instruction following. The expansion of Search Live to more than 200 countries and territories positions Google to compete more aggressively in voice-first AI interfaces. SynthID watermarking addresses growing regulatory and safety concerns around detecting synthetic audio. For enterprises and developers, the improved tonal understanding and lower latency reduce friction in deploying voice agents for customer service and complex task automation.

Source: blog.google
