text-to-speech

11 articles tagged with text-to-speech

June 18, 2026

Mistral Releases Voxtral TTS: 4B-Parameter Text-to-Speech Model at $0.016 per 1k Characters

Mistral AI has released Voxtral TTS, a 4B-parameter text-to-speech model supporting 9 languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. The model achieves 70ms latency for typical inputs, can clone voices from as little as 3 seconds of audio, and is priced at $0.016 per 1,000 characters.
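At the quoted $0.016 per 1,000 characters, API cost scales linearly with the amount of text synthesized. A minimal sketch, assuming simple per-character billing with no minimum charge or rounding (the announcement does not state the billing granularity); the helper name `estimate_tts_cost_usd` is hypothetical:

```python
# Hypothetical helper: estimates Voxtral TTS API cost from the published rate.
# Assumes linear per-character billing with no minimums or rounding.
PRICE_PER_1K_CHARS_USD = 0.016


def estimate_tts_cost_usd(num_chars: int) -> float:
    """Return the estimated cost in USD for synthesizing `num_chars` characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1000 * PRICE_PER_1K_CHARS_USD


if __name__ == "__main__":
    # Print estimates for a short script, a long document, and one million characters.
    for chars in (1_500, 100_000, 1_000_000):
        print(f"{chars:>9,} chars -> ${estimate_tts_cost_usd(chars):.4f}")
```

Under these assumptions, one million characters comes to about $16.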

June 18, 2026 · 9:07 AM

May 10, 2026

model release

Supertone releases Supertonic 3: 99M-parameter on-device TTS model supporting 31 languages

Supertone has released Supertonic 3, a 99M-parameter text-to-speech model that runs entirely on-device using ONNX Runtime and requires no GPU. Compared with Supertonic 2, it expands language support from 5 to 31 languages, and Supertone claims accuracy competitive with models 7-20x larger.

May 10, 2026 · 11:05 AM

April 15, 2026

model release

Google releases Gemini 3.1 Flash TTS with prompt-directed voice control

Google released Gemini 3.1 Flash TTS, a text-to-speech model that accepts detailed prompts to control voice characteristics, speaking style, accent, and delivery. The model is available through the standard Gemini API using the model ID 'gemini-3.1-flash-tts-preview'.

April 15, 2026 · 5:21 PM

model release · Google DeepMind

Google DeepMind releases Gemini 3.1 Flash TTS with audio tags for precise speech control across 70+ languages

Google DeepMind launched Gemini 3.1 Flash TTS, a text-to-speech model that achieved an Elo score of 1,211 on the Artificial Analysis TTS leaderboard. The model introduces audio tags that allow developers to control vocal style, pace, and delivery through natural language commands embedded in text input, with support for 70+ languages.

April 15, 2026 · 4:21 PM

March 30, 2026

product update

Gemini Live voice quality deteriorates after 3.1 Flash update, voices sound nothing like preview

Google's Gemini Live is experiencing persistent voice quality issues following the recent Gemini 3.1 Flash Live update. Users report that voice options like "Capella" (British female accent) have deteriorated significantly, with speech patterns changing dramatically during conversations and audio artifacts like crackles and pops becoming prominent.

March 30, 2026 · 4:20 PM

March 26, 2026

model release

Mistral releases Voxtral, open-weight TTS model that clones voices from 3 seconds of audio

Mistral has released Voxtral TTS, a 4-billion-parameter text-to-speech model that can clone voices from just three seconds of reference audio across nine languages. The model delivers 70ms latency for typical 10-second samples and outperformed ElevenLabs Flash v2.5 in naturalness tests. Voxtral is available via API at $0.016 per 1,000 characters and as open-weights on Hugging Face.

March 26, 2026 · 7:35 PM

product update · Amazon Web Services

Amazon Polly adds bidirectional streaming API for real-time speech synthesis in conversational AI

Amazon has released a new Bidirectional Streaming API for Amazon Polly that enables simultaneous text input and audio output over a single HTTP/2 connection. The API reduces end-to-end latency by 39% compared to traditional request-response TTS by allowing text to be sent word-by-word as LLMs generate tokens, rather than waiting for complete sentences. The feature is available in Java, JavaScript, .NET, C++, Go, Kotlin, PHP, Ruby, Rust, and Swift SDKs.
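The latency gain comes from not waiting for sentence boundaries before synthesis can start. The toy sketch below does not use the Amazon Polly SDK; it simulates an LLM token stream and counts how many tokens must be generated before the first chunk of text can be handed to a synthesizer under each strategy. All function names are hypothetical, and token count is only a rough proxy for wall-clock latency.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical chunking strategies; neither is part of the Amazon Polly SDK.


def sentence_buffered(tokens: Iterable[str]) -> Iterator[str]:
    """Request-response style: send text only once a full sentence has arrived."""
    buffer: list[str] = []
    for tok in tokens:
        buffer.append(tok)
        if tok.rstrip().endswith((".", "!", "?")):
            yield "".join(buffer)
            buffer = []
    if buffer:  # flush trailing text that lacks terminal punctuation
        yield "".join(buffer)


def word_streamed(tokens: Iterable[str]) -> Iterator[str]:
    """Bidirectional-streaming style: forward each token as soon as it arrives."""
    yield from tokens


def tokens_before_first_send(
    chunker: Callable[[Iterable[str]], Iterator[str]], tokens: list[str]
) -> int:
    """Count how many tokens the chunker consumes before emitting its first chunk."""
    consumed = 0

    def counting() -> Iterator[str]:
        nonlocal consumed
        for tok in tokens:
            consumed += 1
            yield tok

    next(chunker(counting()), None)
    return consumed


if __name__ == "__main__":
    llm_tokens = ["Sure", ",", " the", " meeting", " is", " at", " noon", ".", " See", " you", "!"]
    print("sentence-buffered:", tokens_before_first_send(sentence_buffered, llm_tokens))
    print("word-streamed:    ", tokens_before_first_send(word_streamed, llm_tokens))
```

With this input, the sentence-buffered strategy consumes 8 tokens before its first send, versus 1 for word-level streaming; the real API's measured 39% reduction depends on factors this toy model ignores.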

March 26, 2026 · 5:20 PM

model release · Mistral AI

Mistral releases Voxtral-4B-TTS-2603, open-weights text-to-speech model for production voice agents

Mistral AI released Voxtral-4B-TTS-2603, an open-weights text-to-speech model designed for production voice agents. The 4B-parameter model supports 9 languages and 20 preset voices, achieves 70ms latency at concurrency 1 on a single NVIDIA H200, and requires only 16 GB of GPU memory.

March 26, 2026 · 4:50 PM

model release

Mistral releases Voxtral TTS, open-source speech model for enterprise voice agents

Mistral AI released Voxtral TTS, an open-source text-to-speech model designed for enterprise voice agents and edge devices. The model supports nine languages, adapts to custom voices from samples shorter than five seconds, and achieves 90ms time-to-first-audio latency with a 6x real-time factor.

March 26, 2026 · 11:35 AM

March 23, 2026

model release · Xiaomi

Xiaomi launches MiMo-V2-Pro with 1T parameters, matches Claude Opus on coding at 80% lower cost

Xiaomi simultaneously shipped three AI models designed to form a complete agent platform. MiMo-V2-Pro, a 1-trillion-parameter Mixture-of-Experts model with 42 billion active parameters per request, scores 78% on SWE-bench Verified and 81 points on ClawEval, nearly matching Claude Opus 4.6 while costing $1 per million input tokens versus $5 for Opus.
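The headline's "80% lower cost" follows from the quoted input-token prices, and the Mixture-of-Experts figures mean only a small slice of the model is active per request. A worked check using only the numbers above (input-token pricing only; output-token prices are not given here):

```latex
\begin{aligned}
\text{active fraction} &= \frac{42 \times 10^{9}}{1 \times 10^{12}} = 0.042 = 4.2\% \\
\text{input-price saving vs. Opus} &= 1 - \frac{\$1}{\$5} = 1 - 0.2 = 0.8 = 80\%
\end{aligned}
```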

March 23, 2026 · 3:20 PM

March 11, 2026

model release

Hume AI releases TADA-1B, a 1 billion parameter text-to-speech model

Hume AI has released TADA-1B, a 1 billion parameter text-to-speech model available on Hugging Face under an MIT license. The model, which combines speech and language capabilities, has already accumulated over 3,100 downloads since its January 12 release.

March 11, 2026 · 12:20 PM
