
Gemini 3.1 Flash Live scores 95.9% on Big Bench Audio, Google's fastest voice model

TL;DR

Google has released Gemini 3.1 Flash Live, its new voice and audio AI model. It scores 95.9% on the Big Bench Audio Benchmark at its High thinking level, second only to Step-Audio R1.1 Realtime at 97.0%. Response times range from 0.96 seconds at the Minimal thinking level to 2.98 seconds at High, and pricing is unchanged from its predecessor at $0.35 per hour of audio input and $1.40 per hour of audio output.


Google Releases Gemini 3.1 Flash Live Voice Model

Google has unveiled Gemini 3.1 Flash Live, a new voice and audio AI model positioned as the company's best-performing audio offering to date. The model is now available through the Gemini Live API, Google AI Studio, Gemini Live, and Search Live across over 200 countries.

Performance and Capabilities

According to Artificial Analysis benchmarking, Gemini 3.1 Flash Live achieves 95.9% on the Big Bench Audio Benchmark when configured to its "High" thinking level, placing it second only to Step-Audio R1.1 Realtime, which scores 97.0%. At the "Minimal" thinking level, the score drops to 70.5%, but responses arrive roughly three times faster.

Response latency varies by configuration:

  • High thinking: 2.98-second response time
  • Minimal thinking: 0.96-second response time

Google claims the model delivers improved pitch and emotion detection compared to its predecessor, with enhanced reliability in noisy environments. Developers can now configure thinking levels directly, allowing trade-offs between output quality and latency.
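The quality/latency trade-off can be made concrete with the two configurations Artificial Analysis reported. The sketch below picks the highest-scoring thinking level that fits a caller's latency budget. It holds only the article's two data points (any other levels are not modeled), and `pick_thinking_level` is a hypothetical helper for illustration, not part of any Google SDK.

```python
from typing import Optional

# Figures from Artificial Analysis as reported in this article:
# level -> (Big Bench Audio score in %, response time in seconds).
# Only the two reported levels are included; other levels are not modeled.
MEASURED_LEVELS = {
    "high": (95.9, 2.98),
    "minimal": (70.5, 0.96),
}


def pick_thinking_level(latency_budget_s: float) -> Optional[str]:
    """Return the highest-scoring measured level whose response time fits the budget.

    Hypothetical helper for illustration; returns None if no level fits.
    """
    candidates = [
        (score, level)
        for level, (score, latency_s) in MEASURED_LEVELS.items()
        if latency_s <= latency_budget_s
    ]
    if not candidates:
        return None
    # Tuples compare by score first, so max() picks the best-scoring level.
    return max(candidates)[1]


if __name__ == "__main__":
    for budget in (0.5, 1.0, 3.0):
        print(budget, pick_thinking_level(budget))
```

With these numbers, a 1.0-second budget selects `minimal`, a 3.0-second budget selects `high`, and anything under 0.96 seconds returns `None`. Because each response time is a single benchmark figure, a real deployment would want to measure latency on its own traffic before fixing a budget.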

Pricing

Gemini 3.1 Flash Live keeps the same pricing as its Gemini 2.5 predecessor:

  • Audio input: $0.35 per hour
  • Audio output: $1.40 per hour

Google positions this as among the cheapest audio AI models available. For comparison, Step-Audio R1.1 Realtime offers lower input pricing but charges more for audio output.
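Because both rates are quoted per hour of audio, the cost of a session is a weighted sum of its input and output duration. The following is a minimal sketch, assuming billing scales linearly with audio duration at the quoted rates (actual invoices may be metered differently); `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-hour rates quoted in this article (USD).
# Assumption: cost scales linearly with audio duration.
INPUT_RATE_PER_HOUR = Decimal("0.35")
OUTPUT_RATE_PER_HOUR = Decimal("1.40")


def estimate_cost(input_minutes: float, output_minutes: float) -> Decimal:
    """Estimate the USD cost of a session from minutes of audio in and out.

    Hypothetical helper; uses Decimal to avoid float rounding in currency math.
    """
    hours_in = Decimal(str(input_minutes)) / 60
    hours_out = Decimal(str(output_minutes)) / 60
    total = hours_in * INPUT_RATE_PER_HOUR + hours_out * OUTPUT_RATE_PER_HOUR
    # Round to whole cents.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(estimate_cost(600, 300))
```

For example, 600 minutes of input and 300 minutes of output come to 10 × $0.35 + 5 × $1.40 = $10.50. Output audio costs four times as much per hour as input, so the balance between listening and speaking largely determines the bill.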

Deployment and Integration

The model now powers Gemini Live in the Gemini app, enabling real-time voice conversations. Developers can build on it through the Gemini Live API and Google AI Studio, while consumers reach it through Gemini Live and Search Live.

What this means

Gemini 3.1 Flash Live competes directly with Step-Audio R1.1 Realtime in the high-performance voice AI space, trailing its benchmark score by 1.1 percentage points while charging less for audio output (though more for input). The configurable thinking levels give developers real flexibility for latency-sensitive applications, a meaningful improvement over models with fixed performance. At 0.96 seconds on the Minimal level, the model targets real-time conversational use cases where sub-second responses matter, although its benchmark score falls to 70.5% at that setting. Availability in over 200 countries and through multiple access points signals Google's commitment to voice as a core interaction paradigm for Gemini products.

