model release

Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI

TL;DR

Google has released Gemini 3.1 Flash Live, its highest-quality audio model designed for natural and reliable real-time voice interactions. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with thinking enabled. It is now available to developers via the Gemini Live API, to enterprises through Gemini Enterprise for Customer Experience, and to consumers in Gemini Live and Search Live, with Search Live expanding to more than 200 countries and territories.

March 26, 2026 · 3:36 PM · 2 min read

Google has launched Gemini 3.1 Flash Live, a new audio and voice model designed to deliver faster, more natural real-time dialogue capabilities. The model is now available across multiple platforms including developer APIs, enterprise customer experience tools, and consumer products.

Performance and Capabilities

Gemini 3.1 Flash Live posts strong results in reasoning and task execution. On ComplexFuncBench Audio, a benchmark measuring multi-step function calling with various constraints, the model achieves 90.8%, which the announcement describes as ahead of competing offerings. On Scale AI's Audio MultiChallenge, which tests complex instruction following and long-horizon reasoning amid real-world interruptions and hesitations, the model scores 36.1% with thinking enabled.

The model shows improved tonal understanding compared to its predecessor, Gemini 2.5 Flash Native Audio. It better recognizes acoustic nuances like pitch and pace and dynamically adjusts responses to users' expressions of frustration or confusion.

Developer Access and Enterprise Use

Developers can access Gemini 3.1 Flash Live in preview through the Gemini Live API in Google AI Studio. The model enables builders to construct voice-first agents capable of handling complex tasks in noisy environments. Companies including Verizon, LiveKit, and The Home Depot have provided positive feedback during testing, highlighting the model's improved natural conversation quality.

Enterprises can deploy the model through Gemini Enterprise for Customer Experience, where it delivers enhanced acoustic nuance recognition and better frustration-detection capabilities.

Consumer Features

Gemini Live and Search Live now use Gemini 3.1 Flash Live to deliver faster responses and extended conversation context: the model can maintain conversation threads for twice as long as the previous version.

With this launch, Search Live expands to over 200 countries and territories with multilingual support. Gemini 3.1 Flash Live is inherently multilingual, enabling real-time multimodal conversations in users' preferred languages.

Safety and Watermarking

All audio generated by Gemini 3.1 Flash Live is watermarked using Google's SynthID technology. The imperceptible watermark is embedded directly into the audio output to enable reliable detection of AI-generated content and help prevent the spread of misinformation.

What This Means

Google is positioning Gemini 3.1 Flash Live as a foundational model for voice-first AI applications. The benchmark results, particularly the 36.1% score on Scale AI's Audio MultiChallenge with thinking enabled, suggest meaningful progress in real-world audio reasoning. The expansion of Search Live to more than 200 countries and territories indicates Google is betting heavily on voice as a primary interface for search and AI assistance. For developers, preview availability in Google AI Studio lowers barriers to building voice agents, though enterprise pricing and specific latency metrics remain undisclosed.

Source: deepmind.google
