model release

Hume AI open-sources TADA: speech model 5x faster than rivals with zero hallucination

TL;DR

Hume AI has open-sourced TADA, a speech generation model that maps exactly one audio signal to each text token, achieving 5x faster processing than comparable systems. The model produced zero transcription hallucinations across 1,000+ test samples, runs on smartphones, and is available in 1B and 3B parameter versions under the MIT license.


Hume AI has open-sourced TADA, an AI system for speech generation that synchronizes text and audio processing without the overhead of previous approaches.

Key Technical Specifications

The core innovation: TADA maps exactly one audio signal to each text token. This contrasts with previous systems that generate multiple audio frames per text token, introducing latency and complexity.
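The contrast between the two decoding schemes can be sketched in a few lines. This is an illustrative toy only, not TADA's actual implementation; the function and variable names are invented, and real systems emit audio codec codes rather than strings.

```python
# Toy contrast between one-to-one decoding (as TADA is described) and
# multi-frame decoding (as in earlier systems). All names are hypothetical.

def one_to_one_decode(text_tokens, code_for):
    # One audio code per text token: output length equals input length,
    # so alignment is trivial and latency scales linearly with the text.
    return [code_for(tok) for tok in text_tokens]

def multi_frame_decode(text_tokens, frames_for):
    # Earlier designs emit a variable number of audio frames per token,
    # which requires alignment bookkeeping and adds latency.
    out = []
    for tok in text_tokens:
        out.extend(frames_for(tok))
    return out
```

With the one-to-one scheme there is never a question of which audio frame belongs to which token, which is consistent with the claimed absence of made-up or skipped words.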

The model comes in two sizes:

  • 1B parameters: English only
  • 3B parameters: English plus seven additional languages (specific languages not detailed in announcement)

Both versions are based on Llama and released under the MIT license, with code and models available on GitHub and Hugging Face.

Performance Claims

According to Hume AI:

  • Speed: Over 5x faster than comparable systems
  • Hallucination rate: Zero transcription hallucinations across 1,000+ test samples (no made-up or skipped words compared to source text)
  • Naturalness: 3.78 out of 5 in human evaluations
  • Device compatibility: Compact enough to run on smartphones
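The hallucination criterion above (no made-up or skipped words relative to the source text) can be made concrete with a small word-level diff. This is a hypothetical helper illustrating how such a check might be scored, not Hume AI's evaluation code.

```python
import difflib

def transcription_hallucinations(source_text: str, transcript: str):
    # Compare the source text with a transcript of the generated audio.
    # Returns (inserted, deleted): inserted words are "made-up" content,
    # deleted words are "skipped" content. Zero hallucination means (0, 0).
    src = source_text.split()
    gen = transcript.split()
    matcher = difflib.SequenceMatcher(a=src, b=gen)
    inserted = deleted = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            deleted += i2 - i1
        if op in ("insert", "replace"):
            inserted += j2 - j1
    return inserted, deleted
```

Under this kind of metric, the claim is that every one of the 1,000+ samples scored (0, 0).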

Known Limitations

Hume AI notes that longer texts can cause the voice to occasionally drift, indicating potential stability issues with extended audio generation.

Availability

All code, models, and technical details are publicly available. The full technical paper has been published alongside the release.

What this means

TADA addresses two critical issues in speech synthesis: latency and hallucination. The one-to-one token-to-audio mapping is architecturally simpler than existing approaches, explaining both the speed advantage and the zero hallucination rate. The smartphone compatibility removes a practical barrier for deployment. However, the voice drift issue on longer texts suggests the model works best for short-form speech generation. The MIT license and open-source release position this as infrastructure for downstream applications rather than a consumer product, and its Llama foundation means it inherits that ecosystem's community tools and fine-tuning approaches.

