Voxtral TTS

Mistral AI (🇫🇷 France)
active
Output / 1M tokens: $16000

Version History

1.0 (major)

Mistral's first text-to-speech model. It supports voice cloning from minimal reference audio and is available both via API and as open weights.

Coverage

model release

Mistral releases Voxtral, open-weight TTS model that clones voices from 3 seconds of audio

Mistral has released Voxtral TTS, a 4-billion-parameter text-to-speech model that can clone a voice from just three seconds of reference audio and supports nine languages. The model delivers 70 ms latency for typical 10-second samples and outperformed ElevenLabs Flash v2.5 in naturalness tests. Voxtral TTS is available via API at $0.016 per 1,000 characters and as open weights on Hugging Face.
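As a rough illustration of the per-character API price quoted above, the sketch below estimates the cost of synthesizing a piece of text. The helper names `cost_for_chars` and `estimate_cost` are hypothetical, and the calculation assumes strictly linear billing with no rounding, minimum charge, or free tier, none of which is confirmed here. It also assumes Python's `len()` matches how the API counts characters, which may not hold.

```python
from decimal import Decimal

# Price quoted in the coverage summary: $0.016 per 1,000 characters.
PRICE_PER_1K_CHARS = Decimal("0.016")


def cost_for_chars(n_chars: int) -> Decimal:
    """Estimated cost in USD for n_chars characters (hypothetical helper).

    Assumes linear per-character billing with no rounding or minimums.
    """
    if n_chars < 0:
        raise ValueError("n_chars must be non-negative")
    return PRICE_PER_1K_CHARS * n_chars / 1000


def estimate_cost(text: str) -> Decimal:
    """Estimated cost in USD for synthesizing `text`.

    Assumes len(text) (Unicode code points) matches the billed character count.
    """
    return cost_for_chars(len(text))


if __name__ == "__main__":
    script = "Hello, and welcome to the show. " * 100
    print(f"{len(script)} characters -> ${estimate_cost(script)}")
    print(f"1,000,000 characters -> ${cost_for_chars(1_000_000)}")
```

Under these assumptions, one million characters comes to $16. `Decimal` is used instead of floats so that small per-request amounts do not pick up binary rounding error when summed.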
