Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI
Google has released Gemini 3.1 Flash Live, its highest-quality audio model designed for natural and reliable real-time voice interactions. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with thinking enabled. It's now available to developers via the Gemini Live API, enterprises through Gemini Enterprise for Customer Experience, and consumers in Search Live and Gemini Live across 200+ countries.
Google Releases Gemini 3.1 Flash Live, Its Highest-Quality Audio Model
Google has launched Gemini 3.1 Flash Live, a new audio and voice model designed to deliver faster, more natural real-time dialogue capabilities. The model is now available across multiple platforms including developer APIs, enterprise customer experience tools, and consumer products.
Performance and Capabilities
Gemini 3.1 Flash Live demonstrates significant improvements in reasoning and task execution. On ComplexFuncBench Audio—a benchmark measuring multi-step function calling with various constraints—the model achieves 90.8%, leading competing offerings. On Scale AI's Audio MultiChallenge, which tests complex instruction following and long-horizon reasoning amid real-world interruptions and hesitations, the model scores 36.1% with thinking enabled.
The model shows improved tonal understanding compared to its predecessor, Gemini 2.5 Flash Native Audio. It better recognizes acoustic nuances like pitch and pace and dynamically adjusts responses to users' expressions of frustration or confusion.
Developer Access and Enterprise Use
Developers can access Gemini 3.1 Flash Live in preview through the Gemini Live API in Google AI Studio. The model enables builders to construct voice-first agents capable of handling complex tasks in noisy environments. Companies including Verizon, LiveKit, and The Home Depot have provided positive feedback during testing, highlighting the model's improved natural conversation quality.
Enterprises can deploy the model through Gemini Enterprise for Customer Experience, where it delivers enhanced acoustic nuance recognition and better frustration-detection capabilities.
Consumer Features
Gemini Live and Search Live now leverage Gemini 3.1 Flash Live to deliver faster responses and extended conversation context—the model can maintain conversation threads for twice as long as the previous version.
With this launch, Search Live expands to over 200 countries and territories with multilingual support. Gemini 3.1 Flash Live is inherently multilingual, enabling real-time multimodal conversations in users' preferred languages.
Safety and Watermarking
All audio generated by Gemini 3.1 Flash Live is watermarked using Google's SynthID technology. The imperceptible watermark is embedded directly into audio output to enable reliable detection of AI-generated content and help prevent misinformation spread.
What This Means
Google is positioning Gemini 3.1 Flash Live as a foundational model for voice-first AI applications. The benchmark gains—particularly the 36.1% score on Scale AI's challenging multimodal benchmark with thinking enabled—suggest meaningful progress in real-world audio reasoning. The 200+ country expansion of Search Live indicates Google is betting heavily on voice as a primary interface for search and AI assistance. For developers, availability in Google AI Studio lowers barriers to building voice agents, though enterprise pricing and specific latency metrics remain undisclosed.
Related Articles
Google releases Gemini 3.1 Flash Image, claims Pro-level quality at $0.50 per 1M tokens
Google has released Gemini 3.1 Flash Image, internally codenamed "Nano Banana 2," an image generation and editing model with a 131K context window. The model is priced at $0.50 per 1M input tokens and $3 per 1M output tokens.
Mistral OCR 4 Launches With Bounding Boxes, 170 Language Support at $2-4 Per 1,000 Pages
Mistral AI released OCR 4, a compact document extraction model that returns bounding boxes, block classification, and inline confidence scores alongside text. The model supports 170 languages, scores 85.20 on OlmOCRBench, and is priced at $4 per 1,000 pages via API ($2 with batch discount) or $5 per 1,000 pages through Document AI.
Sakana AI releases Fugu orchestration model to route tasks across multiple AI vendors
Sakana AI released Fugu, an orchestration language model that routes tasks across multiple AI providers to reduce vendor lock-in risks. The Japanese AI firm positions Fugu as a solution to enterprise dependency on single monolithic AI APIs.
Mistral OCR 3 launches at $2 per 1,000 pages with 74% win rate over previous version
Mistral AI released Mistral OCR 3, a document extraction model priced at $2 per 1,000 pages ($1 with Batch API discount). The model achieves a 74% overall win rate over its predecessor on forms, scanned documents, complex tables, and handwriting according to internal benchmarks.
Comments
Loading...