Gemini 3.1 Flash Live, Google's fastest voice model, scores 95.9% on Big Bench Audio
Google has released Gemini 3.1 Flash Live, its new voice and audio AI model. It scores 95.9% on the Big Bench Audio Benchmark at its high thinking level, second only to Step-Audio R1.1 Realtime at 97.0%. Response times range from 0.96 seconds at minimal thinking to 2.98 seconds at high thinking, with pricing held at $0.35 per hour of audio input and $1.40 per hour of audio output.
Google Releases Gemini 3.1 Flash Live Voice Model
Google has unveiled Gemini 3.1 Flash Live, a new voice and audio AI model positioned as the company's best-performing audio offering to date. The model is now available through the Gemini Live API, Google AI Studio, Gemini Live, and Search Live across over 200 countries.
Performance and Capabilities
According to Artificial Analysis benchmarking, Gemini 3.1 Flash Live achieves 95.9% on the Big Bench Audio Benchmark when configured to its "High" thinking level, placing it second only to Step-Audio R1.1 Realtime, which scores 97.0%. At the "Minimal" thinking level, the model's score drops to 70.5%, but responses arrive roughly three times faster.
Response latency varies by configuration:
- High thinking: 2.98-second response time
- Minimal thinking: 0.96-second response time
Google claims the model delivers improved pitch and emotion detection compared to its predecessor, with enhanced reliability in noisy environments. Developers can now configure thinking levels directly, allowing trade-offs between output quality and latency.
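The quality-versus-latency trade-off can be sketched as a simple lookup: given a latency budget, pick the highest-scoring thinking level that fits. The snippet below is an illustrative sketch only. It uses the two data points reported above, and the names `ThinkingLevel`, `LEVELS`, and `pick_level` are hypothetical helpers, not part of any Google API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThinkingLevel:
    name: str
    score_pct: float   # Big Bench Audio score reported by Artificial Analysis
    latency_s: float   # reported response time in seconds


# Only the two configurations quoted in the article are listed here.
LEVELS = [
    ThinkingLevel("high", 95.9, 2.98),
    ThinkingLevel("minimal", 70.5, 0.96),
]


def pick_level(latency_budget_s: float) -> Optional[ThinkingLevel]:
    """Return the best-scoring level whose latency fits the budget, or None."""
    candidates = [lvl for lvl in LEVELS if lvl.latency_s <= latency_budget_s]
    if not candidates:
        return None
    return max(candidates, key=lambda lvl: lvl.score_pct)


if __name__ == "__main__":
    for budget in (0.5, 1.0, 3.0):
        choice = pick_level(budget)
        print(budget, choice.name if choice else "no level fits")
```

With a one-second budget only the minimal level qualifies. A three-second budget admits both, so the higher-scoring level is chosen.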
Pricing
Gemini 3.1 Flash Live maintains identical pricing to its Gemini 2.5 predecessor:
- Audio input: $0.35 per hour
- Audio output: $1.40 per hour
Google positions this as among the cheapest audio AI models available. For comparison, Step-Audio R1.1 Realtime offers lower input pricing but charges more for audio output.
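As a rough illustration of what these rates mean in practice, the sketch below estimates the audio cost of a session from its input and output durations. It assumes billing scales linearly with audio duration at the quoted hourly rates, which the article does not confirm, and `estimate_cost_usd` is a hypothetical helper.

```python
# Hourly rates quoted in the article, in US dollars.
INPUT_USD_PER_HOUR = 0.35
OUTPUT_USD_PER_HOUR = 1.40


def estimate_cost_usd(input_minutes: float, output_minutes: float) -> float:
    """Estimate audio cost, assuming linear billing by duration (an assumption)."""
    if input_minutes < 0 or output_minutes < 0:
        raise ValueError("durations must be non-negative")
    input_cost = (input_minutes / 60) * INPUT_USD_PER_HOUR
    output_cost = (output_minutes / 60) * OUTPUT_USD_PER_HOUR
    return input_cost + output_cost


if __name__ == "__main__":
    # One hour of user audio and 30 minutes of model speech.
    print(f"${estimate_cost_usd(60, 30):.2f}")
```

Under that assumption, an hour of audio input plus 30 minutes of audio output comes to about $1.05, with output accounting for two thirds of the total.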
Deployment and Integration
The model now powers live mode functionality within the Gemini app, enabling real-time voice conversations. Developers building voice applications can reach it through the Gemini Live API and Google AI Studio.
What this means
Gemini 3.1 Flash Live competes directly with Step-Audio R1.1 Realtime in the high-performance voice AI space, with a nearly matching benchmark score (95.9% versus 97.0%) and cheaper audio output, though Step-Audio undercuts it on input pricing. The configurable thinking levels give developers real flexibility for latency-sensitive applications, an improvement over fixed-performance models. At 0.96 seconds for minimal thinking, the model targets real-time conversational use cases where sub-second response times matter. Availability across 200+ countries and multiple access methods signals Google's commitment to voice as a core interaction paradigm for Gemini products.