Hume AI open-sources TADA, a speech model it says is 5x faster than rivals with zero hallucinations

TL;DR

Hume AI has open-sourced TADA, a speech generation model that maps exactly one audio signal to each text token, which the company says makes it more than 5x faster than comparable systems. In Hume AI's tests, the model produced zero transcription hallucinations across more than 1,000 samples. It is compact enough to run on smartphones and is available in 1B- and 3B-parameter versions under the MIT license.

Hume AI has open-sourced TADA, a speech generation model that keeps text and audio in step with each other, avoiding the overhead of earlier approaches that generate several audio frames for every text token.

Key Technical Specifications

The core innovation: TADA maps exactly one audio signal to each text token. Previous systems instead generate multiple audio frames per text token, which adds latency and complexity.
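To make the contrast concrete, the sketch below lists the sequential audio positions a decoder would have to produce for a short sentence under each scheme. It assumes whitespace tokenization and a hypothetical rate of four frames per token for the multi-frame case; neither figure comes from Hume AI. Fewer decoding steps also does not translate one-for-one into wall-clock speed, since the cost of each step matters too.

```python
from typing import List, Tuple


def decode_schedule(tokens: List[str], frames_per_token: int) -> List[Tuple[str, int]]:
    """Return the ordered (token, frame_index) positions a decoder generates.

    Hypothetical helper for illustration only. frames_per_token=1 models a
    one-to-one text/audio alignment; larger values model schemes that emit
    several audio frames for every text token.
    """
    if frames_per_token < 1:
        raise ValueError("frames_per_token must be >= 1")
    return [(tok, f) for tok in tokens for f in range(frames_per_token)]


# Whitespace split stands in for a real tokenizer (assumption).
tokens = "the quick brown fox".split()
one_to_one = decode_schedule(tokens, frames_per_token=1)
multi = decode_schedule(tokens, frames_per_token=4)  # hypothetical frame rate
print(len(one_to_one), len(multi))  # 4 16
```

With a one-to-one mapping, audio position i always corresponds to text token i, so the sequence the decoder walks through is exactly as long as the text.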

The model comes in two sizes:

  • 1B parameters: English only
  • 3B parameters: English plus seven additional languages (specific languages not detailed in announcement)

Both versions are based on Llama and released under the MIT license, with code and models available on GitHub and Hugging Face.

Performance Claims

According to Hume AI:

  • Speed: Over 5x faster than comparable systems
  • Hallucination rate: Zero transcription hallucinations across 1,000+ test samples (no made-up or skipped words compared to source text)
  • Naturalness: 3.78 out of 5 in human evaluations
  • Device compatibility: Compact enough to run on smartphones
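The article does not say how Hume AI counted hallucinations. One common way to check for made-up or skipped words is to transcribe each generated clip with a speech recognizer and align the transcript against the source text word by word. The sketch below assumes that setup (the transcription step itself is out of scope) and uses a standard Levenshtein alignment. The function names are hypothetical, and reporting substitutions separately from insertions and deletions is an assumption of this sketch.

```python
import re
from typing import Dict, List


def normalize(text: str) -> List[str]:
    """Lowercase and keep only letters, digits and apostrophes as words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(reference: str, hypothesis: str) -> Dict[str, int]:
    """Count substituted, inserted (made-up) and deleted (skipped) words
    via a minimum-edit Levenshtein alignment over word sequences."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion: skipped word
                           dp[i][j - 1] + 1,         # insertion: made-up word
                           dp[i - 1][j - 1] + cost)  # match or substitution
    # Walk back through the table to attribute each edit to a type.
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                counts["substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return counts


def is_hallucination_free(reference: str, transcript: str) -> bool:
    """True when the transcript neither adds nor drops any words."""
    errs = word_errors(reference, transcript)
    return errs["insertions"] == 0 and errs["deletions"] == 0


reference = "The quick brown fox jumps over the lazy dog."
transcript = "The quick brown fox jumps over the dog."
print(word_errors(reference, transcript))
# {'substitutions': 0, 'insertions': 0, 'deletions': 1}
print(is_hallucination_free(reference, transcript))  # False
```

Under this definition a sample passes only when nothing is added or dropped, so a result like Hume AI's would correspond to every one of the 1,000+ samples passing.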

Known Limitations

Hume AI notes that on longer texts the voice can occasionally drift, which points to stability issues in extended audio generation.

Availability

All code, models, and technical details are publicly available. The full technical paper has been published alongside the release.

What this means

TADA targets two persistent problems in speech synthesis: latency and hallucination. Its one-to-one token-to-audio mapping is architecturally simpler than multi-frame approaches, which plausibly accounts for both the speed advantage and the absence of skipped or invented words in Hume AI's tests. Smartphone compatibility removes a practical barrier to deployment. However, the voice drift reported on longer texts suggests the model is best suited, for now, to short-form speech generation. The MIT license and open-source release position TADA as infrastructure for downstream applications rather than a consumer product, and its Llama foundation should let developers apply that ecosystem's community tools and fine-tuning approaches.