audio-ai

5 articles tagged with audio-ai

April 7, 2026
product updateAmazon Web Services

Amazon Nova 2 Sonic enables real-time AI podcast generation with 1M token context

Amazon has published a technical guide for building real-time conversational podcasts using Amazon Nova 2 Sonic, its speech understanding and generation model. The solution demonstrates streaming audio generation, multi-turn dialogue between AI hosts, and stage-aware content filtering through a web interface.

March 30, 2026
model release

Google releases Lyria 3 Clip Preview for music generation via API

Google has released Lyria 3 Clip Preview, a music generation model available through the Gemini API as of March 30, 2026. The model generates 30-second audio clips from text prompts or images at $0.04 per clip, with a 1,048,576 token context window.

March 26, 2026
model release

Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI

Google has released Gemini 3.1 Flash Live, its highest-quality audio model designed for natural and reliable real-time voice interactions. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with thinking enabled. It's now available to developers via the Gemini Live API, enterprises through Gemini Enterprise for Customer Experience, and consumers in Search Live and Gemini Live across 200+ countries.

model release

Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI

Google has released Gemini 3.1 Flash Live, its highest-quality audio and voice model designed for real-time dialogue. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with reasoning enabled, with improved tonal understanding and lower latency compared to previous versions.

March 1, 2026
benchmark

ElevenLabs and Google lead Artificial Analysis speech-to-text benchmark

Artificial Analysis has released an updated speech-to-text benchmark showing ElevenLabs and Google as top performers. The benchmark provides comparative analysis of current speech recognition systems across multiple models.