LLM News | TPS

research

Vevo2 unifies speech and singing voice generation with controllable prosody and style

Researchers have introduced Vevo2, a unified framework for controllable speech and singing voice generation built on two specialized audio tokenizers. The approach offers fine-grained control over prosody, style, and timbre, and uses joint speech-singing training to address the scarcity of singing data.

March 6, 2026 · 5:09 AM · 2 min read

voice-synthesis speech-generation singing-synthesis

via arxiv.org