research
Vevo2 unifies speech and singing voice generation with controllable prosody and style
Researchers have introduced Vevo2, a unified framework that handles both controllable speech and singing voice generation through two specialized audio tokenizers. The approach enables fine-grained control over prosody, style, and timbre while addressing data scarcity in singing synthesis through joint speech-singing training.