Vevo2 unifies speech and singing voice generation with prosody and style control
Researchers introduce Vevo2, a unified framework for controllable speech and singing voice generation. It is designed to cope with the scarcity of training data and to give flexible control over prosody, style, and timbre. The system uses two specialized audio tokenizers and combines autoregressive and flow-matching models, so a single framework handles both synthesis and voice conversion.
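The two-stage design described above (an autoregressive model that produces discrete tokens, followed by a flow-matching model that turns them into a continuous acoustic representation) can be illustrated with a toy sketch. This is not Vevo2's code or architecture. Both neural networks are replaced by hypothetical stand-ins: the "next token" rule, the token-to-target decoding, and the velocity field are all invented for illustration. The names `generate_tokens_ar`, `sample_flow_matching`, and `toy_pipeline` are likewise hypothetical. What the sketch does show faithfully is the generic mechanics: autoregressive decoding appends one token at a time conditioned on everything generated so far, and flow-matching sampling integrates a velocity field from noise at t = 0 to data at t = 1 (here with plain Euler steps).

```python
import random
from typing import Callable, List, Tuple

Tokens = List[int]
Vector = List[float]


def generate_tokens_ar(prompt: Tokens,
                       next_token: Callable[[Tokens], int],
                       n_new: int) -> Tokens:
    """Stage 1 (toy): autoregressive decoding.

    Each new token is chosen from the full sequence generated so far.
    `next_token` is a hypothetical stand-in for a trained AR model.
    """
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq


def sample_flow_matching(x0: Vector,
                         velocity: Callable[[Vector, float], Vector],
                         steps: int = 50) -> Vector:
    """Stage 2 (toy): integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps.

    `velocity` is a hypothetical stand-in for a trained flow-matching network.
    """
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_pipeline(seed: int = 0, steps: int = 50) -> Tuple[Tokens, Vector, Vector]:
    # Hypothetical stage-1 rule: next token = previous token + 1 (mod 8).
    tokens = generate_tokens_ar([0], lambda seq: (seq[-1] + 1) % 8, n_new=4)
    # Hypothetical decoding of tokens into the continuous target stage 2 should reach.
    target = [tok / 10.0 for tok in tokens]

    def velocity(x: Vector, t: float) -> Vector:
        # Ideal velocity for the straight path x_t = (1 - t) * x0 + t * target;
        # a trained model would have to learn this field instead of being given it.
        return [(y - xi) / (1.0 - t) for xi, y in zip(x, target)]

    rng = random.Random(seed)
    x0 = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    return tokens, target, sample_flow_matching(x0, velocity, steps)


if __name__ == "__main__":
    tokens, target, sample = toy_pipeline()
    print(tokens)                          # discrete stage-1 output
    print([round(s, 6) for s in sample])   # continuous stage-2 output, close to target
```

Because the toy velocity field is the exact one for a straight noise-to-target path, the Euler sampler lands on the target regardless of the starting noise. In a real system the field is learned and only approximate, which is why the number of integration steps becomes a quality/speed trade-off.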