Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI
Google has released Gemini 3.1 Flash Live, its highest-quality audio model designed for natural and reliable real-time voice interactions. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with thinking enabled. It's now available to developers via the Gemini Live API, enterprises through Gemini Enterprise for Customer Experience, and consumers in Search Live and Gemini Live across 200+ countries.
Google Releases Gemini 3.1 Flash Live, Its Highest-Quality Audio Model
Google has launched Gemini 3.1 Flash Live, a new audio and voice model designed to deliver faster, more natural real-time dialogue capabilities. The model is now available across multiple platforms including developer APIs, enterprise customer experience tools, and consumer products.
Performance and Capabilities
Gemini 3.1 Flash Live demonstrates significant improvements in reasoning and task execution. On ComplexFuncBench Audio—a benchmark measuring multi-step function calling with various constraints—the model achieves 90.8%, leading competing offerings. On Scale AI's Audio MultiChallenge, which tests complex instruction following and long-horizon reasoning amid real-world interruptions and hesitations, the model scores 36.1% with thinking enabled.
The model shows improved tonal understanding compared to its predecessor, Gemini 2.5 Flash Native Audio. It better recognizes acoustic nuances like pitch and pace and dynamically adjusts responses to users' expressions of frustration or confusion.
Developer Access and Enterprise Use
Developers can access Gemini 3.1 Flash Live in preview through the Gemini Live API in Google AI Studio. The model enables builders to construct voice-first agents capable of handling complex tasks in noisy environments. Companies including Verizon, LiveKit, and The Home Depot have provided positive feedback during testing, highlighting the model's improved natural conversation quality.
Enterprises can deploy the model through Gemini Enterprise for Customer Experience, where it delivers enhanced acoustic nuance recognition and better frustration-detection capabilities.
Consumer Features
Gemini Live and Search Live now leverage Gemini 3.1 Flash Live to deliver faster responses and extended conversation context—the model can maintain conversation threads for twice as long as the previous version.
With this launch, Search Live expands to over 200 countries and territories with multilingual support. Gemini 3.1 Flash Live is inherently multilingual, enabling real-time multimodal conversations in users' preferred languages.
Safety and Watermarking
All audio generated by Gemini 3.1 Flash Live is watermarked using Google's SynthID technology. The imperceptible watermark is embedded directly into audio output to enable reliable detection of AI-generated content and help prevent misinformation spread.
What This Means
Google is positioning Gemini 3.1 Flash Live as a foundational model for voice-first AI applications. The benchmark gains—particularly the 36.1% score on Scale AI's challenging multimodal benchmark with thinking enabled—suggest meaningful progress in real-world audio reasoning. The 200+ country expansion of Search Live indicates Google is betting heavily on voice as a primary interface for search and AI assistance. For developers, availability in Google AI Studio lowers barriers to building voice agents, though enterprise pricing and specific latency metrics remain undisclosed.
Related Articles
Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI
Google has released Gemini 3.1 Flash Live, its highest-quality audio and voice model designed for real-time dialogue. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with reasoning enabled, with improved tonal understanding and lower latency compared to previous versions.
Mistral releases Voxtral TTS, open-source speech model for enterprise voice agents
Mistral AI released Voxtral TTS, an open-source text-to-speech model designed for enterprise voice agents and edge devices. The model supports nine languages, adapts custom voices from samples under five seconds, and achieves 90ms time-to-first-audio latency with a 6x real-time factor.
Gemini 3.1 Flash Live scores 95.9% on Big Bench Audio, Google's fastest voice model
Google has released Gemini 3.1 Flash Live, its new voice and audio AI model, scoring 95.9% on the Big Bench Audio Benchmark at high thinking levels—second only to Step-Audio R1.1 Realtime at 97.0%. Response times range from 0.96 seconds at minimal thinking to 2.98 seconds at high thinking, with pricing held at $0.35 per hour of audio input and $1.40 per hour of audio output.
Google releases Gemini 3.1 Flash Live, claims improved audio recognition and lower latency for voice conversations
Google announced Gemini 3.1 Flash Live as its updated audio and voice model for Gemini Live and Search Live. The model claims improved acoustic recognition, better background noise filtering, support for over 90 languages, and lower latency compared to 2.5 Flash Native Audio.
Comments
Loading...