
Google releases Gemini 3.1 Flash Live, claims improved audio recognition and lower latency for voice conversations

TL;DR

Google announced Gemini 3.1 Flash Live as its updated audio and voice model for Gemini Live and Search Live. Google claims improved acoustic recognition, better background noise filtering, support for over 90 languages, and lower latency than 2.5 Flash Native Audio.


Google announced Gemini 3.1 Flash Live today as an upgrade to its audio and voice capabilities for Gemini Live and Search Live, now available in preview via the Gemini Live API in Google AI Studio.

According to Google, 3.1 Flash Live is the company's "highest-quality audio and voice model yet," with specific improvements in acoustic processing. Google says the model is "more effective at recognizing acoustic nuances like pitch and pace" and has enhanced background noise filtering that better "discerns relevant speech from environmental sounds like traffic or television."

Key Technical Claims

Google claims the following improvements:

  • Language support: Over 90 languages for real-time multi-modal conversations
  • Latency: Lower latency compared to 2.5 Flash Native Audio
  • Conversation length: On Android and iOS, can "follow the thread of your conversation for twice as long"
  • Tool integration: "Significantly improved the model's ability to trigger external tools and deliver information during live conversations"
  • Instruction adherence: Better compliance with complex system instructions, maintaining "operational guardrails even when conversations take unexpected turns"
  • Response quality: Faster responses with "fewer awkward pauses" and dynamic adjustment of answer length and tone
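
To make the tool-integration and instruction-adherence claims more concrete, here is a minimal sketch of what a developer-side setup might look like: a session configuration carrying a system instruction and one declared tool, plus a local dispatcher that runs the tool when the model requests it. The configuration keys are assumed to mirror the Live session config accepted by the google-genai Python SDK and should be checked against current documentation. The model identifier is a placeholder (the announcement does not give one), and get_store_hours is a hypothetical tool invented purely for illustration.

```python
from typing import Any, Callable

# Placeholder: the article does not give the preview model's API identifier.
MODEL_ID = "<gemini-3.1-flash-live-preview-id>"


def get_store_hours(city: str) -> dict[str, str]:
    """Hypothetical external tool; a real one would query a backend."""
    return {"city": city, "hours": "09:00-18:00"}


# Maps declared tool names to local Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_store_hours": get_store_hours}


def build_live_config() -> dict[str, Any]:
    """Builds a session config with a system instruction and one declared tool.

    Key names are an assumption about the SDK's Live config schema.
    """
    return {
        "response_modalities": ["AUDIO"],
        "system_instruction": "You are a store assistant. Only discuss store hours.",
        "tools": [
            {
                "function_declarations": [
                    {
                        "name": "get_store_hours",
                        "description": "Look up opening hours for a city.",
                        "parameters": {
                            "type": "object",
                            "properties": {"city": {"type": "string"}},
                            "required": ["city"],
                        },
                    }
                ]
            }
        ],
    }


def dispatch_tool_call(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Runs the local function for a tool call the model requested."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return {"result": TOOLS[name](**args)}
```

In a real application, a dictionary like this would typically be passed as the config when opening a Live session (for example through the SDK's client.aio.live.connect call), and each tool call arriving mid-conversation would be routed through something like dispatch_tool_call before the result is sent back to the model. Nothing here measures the improvements Google describes; it only shows where a system instruction and a tool declaration would sit.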

Search Live Expansion

Google is using Gemini 3.1 Flash Live to roll out Search Live globally, in more than 200 countries and in every language where AI Mode is currently available. The rollout includes audio and video (Google Lens) capabilities for back-and-forth conversations with Google Search.

The company claims that on Gemini Live, the new model delivers faster responses and can maintain conversation context for longer periods, which Google describes as "keeping your train of thought intact during longer brainstorms."

What This Means

Google is positioning Gemini 3.1 Flash Live as a direct performance upgrade for its voice conversation products. The focus on acoustic nuance recognition and background noise filtering suggests competition with other voice-first AI interfaces. The 90+ language support and global rollout across Search Live indicate Google's strategy to make voice interaction a primary interface for search globally. However, Google did not provide benchmark data comparing 3.1 Flash Live with competing audio models (OpenAI's Realtime API, for example).

