LLM News | TPS

research

Vevo2 unifies speech and singing voice generation with prosody and style control

Researchers introduce Vevo2, a unified framework for controllable speech and singing voice generation that addresses data scarcity and enables flexible control over prosody, style, and timbre. The system uses two specialized audio tokenizers and combines autoregressive and flow-matching models to handle both synthesis and voice conversion tasks.

March 6, 2026 · 5:23 AM · 2 min read

voice-synthesis speech-generation singing-synthesis

via arxiv.org