model release

Google releases Gemini 3.1 Flash Live, claims improved audio recognition and lower latency for voice conversations

TL;DR

Google announced Gemini 3.1 Flash Live as its updated audio and voice model for Gemini Live and Search Live. Google claims improved acoustic recognition, better background-noise filtering, support for over 90 languages, and lower latency compared to 2.5 Flash Native Audio.


Google announced Gemini 3.1 Flash Live today as an upgrade to its audio and voice capabilities for Gemini Live and Search Live, now available in preview via the Gemini Live API in Google AI Studio.

According to Google, 3.1 Flash Live is the company's "highest-quality audio and voice model yet," with specific improvements in acoustic processing. Google says the model is "more effective at recognizing acoustic nuances like pitch and pace" and features enhanced background-noise filtering that better "discerns relevant speech from environmental sounds like traffic or television."

Key Technical Claims

Google claims the following improvements:

  • Language support: Over 90 languages for real-time multi-modal conversations
  • Latency: Lower latency compared to 2.5 Flash Native Audio
  • Conversation length: On Android and iOS, can "follow the thread of your conversation for twice as long"
  • Tool integration: "Significantly improved the model's ability to trigger external tools and deliver information during live conversations"
  • Instruction adherence: Better compliance with complex system instructions, maintaining "operational guardrails even when conversations take unexpected turns"
  • Response quality: Faster responses with "fewer awkward pauses" and dynamic adjustment of answer length and tone

Search Live Expansion

Google is using Gemini 3.1 Flash Live to power a global rollout of Search Live across more than 200 countries and in every language where AI Mode is currently available. The rollout includes audio and video (via Google Lens) capabilities for back-and-forth conversations with Google Search.

The company claims that on Gemini Live, the new model delivers faster responses and can maintain conversation context for longer periods, which Google describes as "keeping your train of thought intact during longer brainstorms."

What This Means

Google is positioning Gemini 3.1 Flash Live as a direct performance upgrade for its voice conversation products. The emphasis on acoustic nuance recognition and background-noise filtering suggests Google is competing with other voice-first AI interfaces, and the 90+ language support and global Search Live rollout point to a strategy of making voice a primary interface for search worldwide. However, Google did not provide benchmark data comparing 3.1 Flash Live to competing audio models, such as OpenAI's Realtime API.

