Gemini 3.1 Flash Live, Google's best-performing voice model, scores 95.9% on Big Bench Audio
Google has released Gemini 3.1 Flash Live, its new voice and audio AI model, scoring 95.9% on the Big Bench Audio Benchmark at high thinking levels—second only to Step-Audio R1.1 Realtime at 97.0%. Response times range from 0.96 seconds at minimal thinking to 2.98 seconds at high thinking, with pricing held at $0.35 per hour of audio input and $1.40 per hour of audio output.
Google Releases Gemini 3.1 Flash Live Voice Model
Google has unveiled Gemini 3.1 Flash Live, a new voice and audio AI model positioned as the company's best-performing audio offering to date. The model is now available through the Gemini Live API, Google AI Studio, Gemini Live, and Search Live across over 200 countries.
Performance and Capabilities
According to Artificial Analysis benchmarking, Gemini 3.1 Flash Live achieves 95.9% on the Big Bench Audio Benchmark when configured to its "High" thinking level, placing it second only to Step-Audio R1.1 Realtime, which scores 97.0%. At the "Minimal" thinking level, the model's score drops to 70.5% but response time improves significantly.
Response latency varies by configuration:
- High thinking: 2.98-second response time
- Minimal thinking: 0.96-second response time
Google claims the model delivers improved pitch and emotion detection compared to its predecessor, with enhanced reliability in noisy environments. Developers can now configure thinking levels directly, allowing trade-offs between output quality and latency.
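The quality/latency trade-off above can be sketched as a simple selection rule. The scores and latencies below come from the article; the helper function and its name are illustrative, not part of any Google API.

```python
# Figures reported in the article for Gemini 3.1 Flash Live on the
# Big Bench Audio Benchmark, per configurable thinking level.
LEVELS = {
    "high":    {"score": 95.9, "latency_s": 2.98},
    "minimal": {"score": 70.5, "latency_s": 0.96},
}

def pick_thinking_level(latency_budget_s: float) -> str:
    """Choose the highest-scoring thinking level that fits a latency budget.

    Falls back to the fastest configuration when nothing fits.
    """
    fitting = {k: v for k, v in LEVELS.items() if v["latency_s"] <= latency_budget_s}
    if not fitting:
        return "minimal"
    return max(fitting, key=lambda k: fitting[k]["score"])
```

For a voice assistant that must answer within one second, this rule selects "minimal" thinking; an offline transcription-analysis job with a relaxed budget would get "high".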
Pricing
Gemini 3.1 Flash Live maintains identical pricing to its Gemini 2.5 predecessor:
- Audio input: $0.35 per hour
- Audio output: $1.40 per hour
Google positions this as among the cheapest audio AI models available. For comparison, Step-Audio R1.1 Realtime offers lower input pricing but charges more for audio output.
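Because pricing is metered per hour of audio, session costs are easy to estimate from the published rates. A minimal sketch (the function and the example session mix are illustrative):

```python
# Published per-hour rates for Gemini 3.1 Flash Live
# (unchanged from the Gemini 2.5 predecessor).
INPUT_RATE_PER_HOUR = 0.35   # USD per hour of audio input
OUTPUT_RATE_PER_HOUR = 1.40  # USD per hour of audio output

def session_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimated USD cost of a voice session, rounded to 4 decimals."""
    return round(
        input_minutes / 60 * INPUT_RATE_PER_HOUR
        + output_minutes / 60 * OUTPUT_RATE_PER_HOUR,
        4,
    )

# Example: a 30-minute conversation with roughly 20 minutes of user
# audio in and 10 minutes of model audio out:
# 20/60 * 0.35 + 10/60 * 1.40 = 0.35 USD
```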
Deployment and Integration
The model now powers live mode functionality within the Gemini app, enabling real-time voice conversations. Integration is available through multiple access points, supporting developers building voice applications across the Google ecosystem.
What this means
Gemini 3.1 Flash Live competes directly with Step-Audio R1.1 Realtime in high-performance voice AI, trailing its benchmark score by just 1.1 points while undercutting it on output pricing. The configurable thinking levels give developers genuine flexibility for latency-sensitive applications, a meaningful improvement over fixed-performance models. At 0.96 seconds with minimal thinking, the model targets real-time conversational use cases where fast response times matter. The availability across 200+ countries and multiple access methods signals Google's commitment to voice as a core interaction paradigm for Gemini products.
Related Articles
Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning
Tencent has released Hy3 preview, a Mixture-of-Experts model with a 262,144 token context window priced at $0.066 per million input tokens and $0.26 per million output tokens. The model features three configurable reasoning modes—disabled, low, and high—designed for agentic workflows and production environments.
Google releases Gemini 3.1 Flash Lite with 1M context at $0.25 per million input tokens
Google has released Gemini 3.1 Flash Lite, a high-efficiency multimodal model with a 1,048,576 token context window priced at $0.25 per million input tokens and $1.50 per million output tokens. The model supports text, image, video, audio, and PDF inputs with four thinking levels for cost-performance optimization.
Google DeepMind Releases Gemma 4 26B A4B Assistant Model for 2x Faster Inference via Multi-Token Prediction
Google DeepMind has released a Multi-Token Prediction assistant model for Gemma 4 26B A4B that achieves up to 2x decoding speedup through speculative decoding. The model uses 3.8B active parameters from a 25.2B total parameter MoE architecture with 128 experts and a 256K token context window.
Google DeepMind releases Gemma 4 with 31B dense model, 256K context window, and speculative decoding drafters
Google DeepMind has released Gemma 4, a family of open-weight multimodal models including a 31B dense model with 256K context window and four size variants ranging from 2.3B to 30.7B effective parameters. The release includes Multi-Token Prediction (MTP) draft models that achieve up to 2x decoding speedup through speculative decoding while maintaining identical output quality.