LLM News

Every LLM release, update, and milestone.

Filtered by: audio-tokenization
research

Meta researchers show flattened speech tokens outperform hierarchical models in Llama-Mimi

Meta researchers propose Llama-Mimi, a speech language model that flattens multi-level RVQ tokens from neural audio codecs into single sequences processed by a standard Transformer decoder. The approach outperforms hierarchical models on most tasks while achieving best-in-class acoustic consistency.
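The flattening idea can be illustrated with a minimal sketch: per-frame tokens from each RVQ level are interleaved into one sequence, with each level offset into a disjoint ID range so a single decoder vocabulary covers them all. The function name, codebook size, and offset scheme here are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of flattening multi-level RVQ codec tokens into one sequence.
# Assumption: each quantizer level uses a codebook of `codebook_size`
# entries, so offsetting by level * codebook_size keeps IDs disjoint.

def flatten_rvq(frames, codebook_size=1024):
    """Interleave per-frame RVQ tokens from all levels into a single
    sequence a standard Transformer decoder can model directly."""
    flat = []
    for frame in frames:  # frame = [level0_id, level1_id, ...]
        for level, token in enumerate(frame):
            flat.append(level * codebook_size + token)
    return flat

# Example: 2 frames, 3 RVQ levels each
frames = [[5, 17, 301], [9, 42, 7]]
print(flatten_rvq(frames))  # [5, 1041, 2349, 9, 1066, 2055]
```

The hierarchical alternative would model each level with a separate sub-network; flattening trades a longer sequence for a simpler, uniform architecture.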

research

Vevo2 unifies speech and singing voice generation with prosody and style control

Researchers introduce Vevo2, a unified framework for controllable speech and singing voice generation that addresses data scarcity and enables flexible control over prosody, style, and timbre. The system uses two specialized audio tokenizers and combines auto-regressive and flow-matching models to handle both synthesis and voice conversion tasks.

research

Vevo2 unifies speech and singing voice generation with controllable prosody and style

Researchers have introduced Vevo2, a unified framework that handles both controllable speech and singing voice generation through two specialized audio tokenizers. The approach enables fine-grained control over prosody, style, and timbre while addressing data scarcity in singing synthesis through joint speech-singing training.