audio-generation
4 articles tagged with audio-generation
Google DeepMind releases Gemini 3.1 Flash TTS with audio tags for precise speech control across 70+ languages
Google DeepMind launched Gemini 3.1 Flash TTS, a text-to-speech model that achieved an Elo score of 1,211 on the Artificial Analysis TTS leaderboard. The model introduces audio tags that allow developers to control vocal style, pace, and delivery through natural language commands embedded in text input, with support for 70+ languages.
Stability AI releases Stable Audio 2.5 for enterprise sound production
Stability AI released Stable Audio 2.5, positioned as the first audio generation model built specifically for enterprise sound production. The model introduces improvements in quality and control for creating dynamic compositions adaptable to custom brand needs.
Stability AI releases Stable Audio Open Small for on-device audio generation with Arm
Stability AI, in partnership with Arm, has open-sourced Stable Audio Open Small, a smaller and faster variant of its text-to-audio model designed for on-device deployment. The model maintains output quality and prompt adherence while reducing computational requirements for real-world edge deployment on devices powered by Arm's technology, which runs on 99% of smartphones globally.
Google integrates Lyria 3 music generation into Gemini with text-to-music and cover art
Google DeepMind has integrated its Lyria 3 model into Gemini, enabling users to generate 30-second music tracks with vocals, lyrics, and cover art from text prompts or uploaded media. The model represents an expansion of Google's multimodal AI capabilities into creative audio generation.