Google DeepMind Releases Gemini 3.5 Live Translate for Real-Time Speech Translation Across 70+ Languages
Google DeepMind released Gemini 3.5 Live Translate, an audio model that provides near real-time speech-to-speech translation across 70+ languages. The model automatically detects languages, preserves speaker intonation and pacing, and generates continuous speech output that trails the speaker by a few seconds.
On June 9, 2026, Google DeepMind released Gemini 3.5 Live Translate, an audio model that provides near real-time speech-to-speech translation across 70+ languages with automatic language detection.
Technical Capabilities
The model generates continuous translated speech while staying "just a few seconds" behind the speaker, according to Google. Unlike turn-based translation systems, which wait for complete sentences, Gemini 3.5 Live Translate processes streaming audio and balances translation speed against contextual accuracy.
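To make the contrast with turn-based systems concrete, here is a toy timing model in Python. It is not Google's implementation and does not call the Gemini Live API; the `Word` type, both functions, the three-word chunk size, and the 500 ms processing delay are all assumptions chosen for illustration. It only shows when each strategy could first produce output for the same utterance.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    end_ms: int  # when the speaker finishes this word, in ms from the start


def turn_based_emit_times(words, processing_delay_ms=500):
    """Hypothetical turn-based system: emit only after sentence-final punctuation."""
    emits = []
    for word in words:
        if word.text.endswith((".", "?", "!")):
            emits.append(word.end_ms + processing_delay_ms)
    return emits


def streaming_emit_times(words, chunk_size=3, processing_delay_ms=500):
    """Hypothetical streaming system: emit after every chunk_size words,
    plus once more for any trailing partial chunk."""
    emits = []
    for index, word in enumerate(words, start=1):
        if index % chunk_size == 0 or index == len(words):
            emits.append(word.end_ms + processing_delay_ms)
    return emits


# One eight-word sentence, with a word finishing every 400 ms (assumed pace).
sentence = "The meeting will start at nine tomorrow morning."
words = [Word(text, 400 * (i + 1)) for i, text in enumerate(sentence.split())]

print(turn_based_emit_times(words))  # [3700]
print(streaming_emit_times(words))   # [1700, 2900, 3700]
```

In this sketch the streaming strategy first emits at 1,700 ms, while the turn-based strategy waits until 3,700 ms, after the sentence ends. A real system would also have to weigh how much context each chunk carries, which is the balance between speed and accuracy that Google describes.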
Key technical features include:
- Automatic detection of 70+ languages without manual configuration
- Preservation of speaker intonation, pacing, and pitch in translated output
- Noise robustness for unpredictable environments
- Support for over 2,000 language pair combinations within a single session
- SynthID watermarking embedded in all generated audio
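The "over 2,000" figure is consistent with simple combinatorics on 70 languages. The short check below assumes exactly 70 languages and that a "pair" means an unordered combination; Google has not said how it counts, so the directed count is shown as well.

```python
from math import comb, perm

languages = 70  # assumed lower bound taken from "70+"

# Unordered pairs: English/Thai and Thai/English count once.
unordered_pairs = comb(languages, 2)

# Directed pairs: each translation direction counts separately.
directed_pairs = perm(languages, 2)

print(unordered_pairs, directed_pairs)  # 2415 4830
```

Under the unordered reading, 70 languages give 2,415 pairs, which matches "over 2,000"; each additional language beyond 70 raises the count further.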
Availability and Deployment
Gemini 3.5 Live Translate is rolling out across three channels:
Gemini Live API: Available in public preview for developers via Google AI Studio. Developer platforms including Agora, Fishjam, LiveKit, Pipecat, and Vision Agents have integrated the API for real-time media streaming infrastructure.
Google Meet: Launching in private preview this month for select Google Workspace business customers. The previous Meet translation feature was limited to five languages and to pairs that included English. A broader rollout is planned for later in 2026.
Google Translate app: Rolling out globally on Android and iOS. The model powers the Live translate feature for users with connected headphones. Android users receive an additional "listening mode" that streams translations through the phone's earpiece without headphones.
Early Implementations
Grab, which processes over 10 million voice calls monthly, is testing the model to enable multilingual communication between drivers and travelers. Additional partners including CJ ENM and LiveKit have provided feedback on translation quality and low latency, according to Google.
Pricing for API access has not been disclosed.
What This Means
Gemini 3.5 Live Translate marks a broader push by Google into the competitive real-time speech translation market, directly challenging established multilingual communication tools. The 70+ languages and 2,000+ language pair combinations far exceed the capabilities of Google's previous Meet translation system, which supported only five languages and required English on one side of every pair.
The model's continuous streaming approach addresses a core limitation of turn-based systems, but "a few seconds" of latency is too imprecise a specification for developers evaluating real-time requirements. The rollout across Google's product ecosystem, from developer APIs to consumer apps, indicates a platform play rather than a standalone model release. However, without disclosed API pricing or benchmark comparisons against competing speech translation models, technical evaluation remains limited.