model release

Hume AI open-sources TADA: speech model 5x faster than rivals with zero hallucination

TL;DR

Hume AI has open-sourced TADA, a speech generation model that maps exactly one audio signal to each text token, achieving 5x faster processing than comparable systems. The model produced zero transcription hallucinations across 1,000+ test samples, runs on smartphones, and is available in 1B and 3B parameter versions under the MIT license.

Hume AI has open-sourced TADA, an AI system for speech generation that synchronizes text and audio processing without the overhead of previous approaches.

Key Technical Specifications

The core innovation: TADA maps exactly one audio signal to each text token. This contrasts with previous systems that generate multiple audio frames per text token, introducing latency and complexity.
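The contrast can be sketched in a few lines. This is a conceptual illustration of the claimed architecture, not Hume AI's code; all function names and the placeholder audio units are assumptions made for the example. It shows why a strict 1:1 mapping keeps the output stream, and thus the number of autoregressive decode steps, linear in the text length, while a k-frames-per-token design multiplies the work:

```python
def generate_interleaved(text_tokens):
    """Sketch of a 1:1 interleaved stream: after every text token the model
    emits exactly one corresponding audio unit, so the output is exactly
    2 * len(text_tokens) items and latency grows linearly with the text."""
    stream = []
    for tok in text_tokens:
        stream.append(("text", tok))
        stream.append(("audio", f"unit_for_{tok}"))  # placeholder for a codec token
    return stream

def decode_steps_multi_frame(num_text_tokens, frames_per_token):
    """Earlier designs emit several audio frames per token; each frame is
    an additional autoregressive step, multiplying the total work."""
    return num_text_tokens * frames_per_token

tokens = ["the", "quick", "brown", "fox"]
print(len(generate_interleaved(tokens)))      # 8 items: one audio unit per token
print(decode_steps_multi_frame(len(tokens), 5))  # 20 audio steps under a 5-frame design
```

Under these assumptions, a 5-frames-per-token baseline does five times as many audio decode steps for the same sentence, which is consistent with the speedup Hume AI reports.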

The model comes in two sizes:

  • 1B parameters: English only
  • 3B parameters: English plus seven additional languages (specific languages not detailed in announcement)

Both versions are based on Llama and released under the MIT license, with code and models available on GitHub and Hugging Face.

Performance Claims

According to Hume AI:

  • Speed: Over 5x faster than comparable systems
  • Hallucination rate: Zero transcription hallucinations across 1,000+ test samples (no made-up or skipped words compared to source text)
  • Naturalness: 3.78 out of 5 in human evaluations
  • Device compatibility: Compact enough to run on smartphones
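The hallucination metric above counts made-up and skipped words between the source text and a transcript of the generated audio. Here is a minimal sketch of how such a word-level check could be computed; it is not Hume AI's evaluation code, and the function name and alignment choice (Python's `difflib.SequenceMatcher`) are assumptions for illustration:

```python
import difflib

def hallucination_counts(source_text: str, transcript: str):
    """Count word-level insertions (made-up words) and deletions (skipped
    words) between a source text and a transcript of the generated audio.
    (0, 0) corresponds to zero transcription hallucinations on that sample."""
    src = source_text.lower().split()
    hyp = transcript.lower().split()
    inserted = deleted = 0
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, src, hyp).get_opcodes():
        if op == "insert":
            inserted += j2 - j1
        elif op == "delete":
            deleted += i2 - i1
        elif op == "replace":  # substitutions count as both an insertion and a deletion
            inserted += j2 - j1
            deleted += i2 - i1
    return inserted, deleted

print(hallucination_counts("the quick brown fox", "the quick brown fox"))  # (0, 0)
print(hallucination_counts("the quick brown fox", "the quick fox jumps"))  # (1, 1)
```

Running a check like this over 1,000+ samples and getting (0, 0) every time is what the zero-hallucination claim amounts to.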

Known Limitations

Hume AI notes that longer texts can cause the voice to occasionally drift, indicating potential stability issues with extended audio generation.

Availability

All code, models, and technical details are publicly available. The full technical paper has been published alongside the release.

What this means

TADA addresses two critical issues in speech synthesis: latency and hallucination. The one-to-one token-to-audio mapping is architecturally simpler than existing approaches, explaining both the speed advantage and the zero hallucination rate. The smartphone compatibility removes a practical barrier for deployment. However, the voice drift issue on longer texts suggests the model works best for short-form speech generation. The MIT license and open-source release position this as infrastructure for downstream applications rather than a consumer product, and its Llama foundation means it inherits that ecosystem's community tools and fine-tuning approaches.
