Gemini 3.1 Flash Live, Google's best-performing voice model, scores 95.9% on Big Bench Audio
Google has released Gemini 3.1 Flash Live, its new voice and audio AI model, scoring 95.9% on the Big Bench Audio Benchmark at high thinking levels—second only to Step-Audio R1.1 Realtime at 97.0%. Response times range from 0.96 seconds at minimal thinking to 2.98 seconds at high thinking, with pricing held at $0.35 per hour of audio input and $1.40 per hour of audio output.
Google Releases Gemini 3.1 Flash Live Voice Model
Google has unveiled Gemini 3.1 Flash Live, a new voice and audio AI model positioned as the company's best-performing audio offering to date. The model is now available through the Gemini Live API, Google AI Studio, Gemini Live, and Search Live across over 200 countries.
Performance and Capabilities
According to Artificial Analysis benchmarking, Gemini 3.1 Flash Live achieves 95.9% on the Big Bench Audio Benchmark when configured to its "High" thinking level, placing it second only to Step-Audio R1.1 Realtime, which scores 97.0%. At the "Minimal" thinking level, the model's score drops to 70.5% but response time improves significantly.
Response latency varies by configuration:
- High thinking: 2.98-second response time
- Minimal thinking: 0.96-second response time
Google claims the model delivers improved pitch and emotion detection compared to its predecessor, with enhanced reliability in noisy environments. Developers can now configure thinking levels directly, allowing trade-offs between output quality and latency.
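The quality/latency trade-off above can be sketched as a simple selection rule. The scores and latencies below come from the article; the helper function and its name are illustrative, not part of any Google API.

```python
# Figures reported in the article for Gemini 3.1 Flash Live on the
# Big Bench Audio Benchmark, per configurable thinking level.
LEVELS = {
    "high":    {"score": 95.9, "latency_s": 2.98},
    "minimal": {"score": 70.5, "latency_s": 0.96},
}

def pick_thinking_level(latency_budget_s: float) -> str:
    """Choose the highest-scoring thinking level that fits a latency budget.

    Falls back to the fastest configuration when nothing fits.
    """
    fitting = {k: v for k, v in LEVELS.items() if v["latency_s"] <= latency_budget_s}
    if not fitting:
        return "minimal"
    return max(fitting, key=lambda k: fitting[k]["score"])
```

For a voice assistant that must answer within one second, this rule selects "minimal" thinking; an offline transcription-analysis job with a relaxed budget would get "high".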
Pricing
Gemini 3.1 Flash Live maintains identical pricing to its Gemini 2.5 predecessor:
- Audio input: $0.35 per hour
- Audio output: $1.40 per hour
Google positions this as among the cheapest audio AI models available. For comparison, Step-Audio R1.1 Realtime offers lower input pricing but charges more for audio output.
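Because pricing is metered per hour of audio, session costs are easy to estimate from the published rates. A minimal sketch (the function and the example session mix are illustrative):

```python
# Published per-hour rates for Gemini 3.1 Flash Live
# (unchanged from the Gemini 2.5 predecessor).
INPUT_RATE_PER_HOUR = 0.35   # USD per hour of audio input
OUTPUT_RATE_PER_HOUR = 1.40  # USD per hour of audio output

def session_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimated USD cost of a voice session, rounded to 4 decimals."""
    return round(
        input_minutes / 60 * INPUT_RATE_PER_HOUR
        + output_minutes / 60 * OUTPUT_RATE_PER_HOUR,
        4,
    )

# Example: a 30-minute conversation with roughly 20 minutes of user
# audio in and 10 minutes of model audio out:
# 20/60 * 0.35 + 10/60 * 1.40 = 0.35 USD
```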
Deployment and Integration
The model now powers live mode functionality within the Gemini app, enabling real-time voice conversations. Integration is available through multiple access points, supporting developers building voice applications across the Google ecosystem.
What this means
Gemini 3.1 Flash Live competes directly with Step-Audio R1.1 Realtime in high-performance voice AI, trailing its benchmark score by just 1.1 points while undercutting it on output pricing. The configurable thinking levels give developers genuine flexibility for latency-sensitive applications, a meaningful improvement over fixed-performance models. At 0.96 seconds with minimal thinking, the model targets real-time conversational use cases where fast response times matter. The availability across 200+ countries and multiple access methods signals Google's commitment to voice as a core interaction paradigm for Gemini products.
Related Articles
Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning
Tencent has released Hy3 preview, a Mixture-of-Experts model with a 262,144 token context window priced at $0.066 per million input tokens and $0.26 per million output tokens. The model features three configurable reasoning modes—disabled, low, and high—designed for agentic workflows and production environments.
Google releases Gemini 3.1 Flash Lite with 1M context at $0.25 per million input tokens
Google has released Gemini 3.1 Flash Lite, a high-efficiency multimodal model with a 1,048,576 token context window priced at $0.25 per million input tokens and $1.50 per million output tokens. The model supports text, image, video, audio, and PDF inputs with four thinking levels for cost-performance optimization.
Google DeepMind Releases Gemma 4 26B A4B Assistant Model for 2x Faster Inference via Multi-Token Prediction
Google DeepMind has released a Multi-Token Prediction assistant model for Gemma 4 26B A4B that achieves up to 2x decoding speedup through speculative decoding. The model uses 3.8B active parameters from a 25.2B total parameter MoE architecture with 128 experts and a 256K token context window.
Google DeepMind releases Gemma 4 with 31B dense model, 256K context window, and speculative decoding drafters
Google DeepMind has released Gemma 4, a family of open-weight multimodal models including a 31B dense model with 256K context window and four size variants ranging from 2.3B to 30.7B effective parameters. The release includes Multi-Token Prediction (MTP) draft models that achieve up to 2x decoding speedup through speculative decoding while maintaining identical output quality.