Google releases Gemini 3.1 Flash Live, claims improved audio recognition and lower latency for voice conversations
Google announced Gemini 3.1 Flash Live as its updated audio and voice model for Gemini Live and Search Live. Google claims improved acoustic recognition, better background noise filtering, support for over 90 languages, and lower latency compared to 2.5 Flash Native Audio.
Google announced Gemini 3.1 Flash Live today as an upgrade to its audio and voice capabilities for Gemini Live and Search Live, now available in preview via the Gemini Live API in Google AI Studio.
According to Google, 3.1 Flash Live is the company's "highest-quality audio and voice model yet," with specific improvements in acoustic processing. Google says the model is "more effective at recognizing acoustic nuances like pitch and pace" and includes enhanced background noise filtering that better "discerns relevant speech from environmental sounds like traffic or television."
Key Technical Claims
Google claims the following improvements:
- Language support: Over 90 languages for real-time multi-modal conversations
- Latency: Lower latency compared to 2.5 Flash Native Audio
- Conversation length: On Android and iOS, the model can "follow the thread of your conversation for twice as long"
- Tool integration: "Significantly improved the model's ability to trigger external tools and deliver information during live conversations"
- Instruction adherence: Better compliance with complex system instructions, maintaining "operational guardrails even when conversations take unexpected turns"
- Response quality: Faster responses with "fewer awkward pauses" and dynamic adjustment of answer length and tone
Search Live Expansion
Google is deploying Gemini 3.1 Flash Live to roll out Search Live globally across over 200 countries and all languages where AI Mode is currently available. This includes audio and video (Google Lens) capabilities for back-and-forth conversations with Google Search.
The company claims that on Gemini Live, the new model delivers faster responses and can maintain conversation context for longer periods, which Google describes as "keeping your train of thought intact during longer brainstorms."
What This Means
Google is positioning Gemini 3.1 Flash Live as a direct performance upgrade for its voice conversation products. The focus on acoustic nuance recognition and background noise filtering suggests Google is competing with other voice-first AI interfaces. The 90+ language support and the global rollout of Search Live indicate that Google intends to make voice interaction a primary interface for search worldwide. However, Google has not provided benchmark data comparing 3.1 Flash Live to competing audio models (OpenAI's Realtime API, for example).