speech-synthesis
4 articles tagged with speech-synthesis
Supertone releases Supertonic 3: 99M-parameter on-device TTS model supporting 31 languages
Supertone has released Supertonic 3, a 99M-parameter text-to-speech model that runs entirely on-device using ONNX Runtime. The model expands language support from 5 to 31 languages compared to Supertonic 2, requires no GPU, and claims competitive accuracy against models 7-20x larger.
Amazon Nova 2 Sonic enables real-time AI podcast generation with 1M token context
Amazon has published a technical guide for building real-time conversational podcasts using Amazon Nova 2 Sonic, its speech understanding and generation model. The solution demonstrates streaming audio generation, multi-turn dialogue between AI hosts, and stage-aware content filtering through a web interface.
Microsoft releases three in-house AI models for speech and images, signaling independence from OpenAI
Microsoft released public preview versions of three proprietary AI models: MAI-Transcribe-1 for speech recognition across 25 languages at 50% lower GPU cost than alternatives, MAI-Voice-1 for speech synthesis generating 60 seconds of audio in under a second, and MAI-Image-2 for text-to-image generation. The models are available exclusively through Microsoft Azure AI Foundry and already power Copilot, Bing, and PowerPoint.
Hume AI releases TADA-1B, a 1 billion parameter text-to-speech model
Hume AI has released TADA-1B, a 1 billion parameter text-to-speech model available on Hugging Face under an MIT license. The model, which combines speech and language capabilities, has already accumulated over 3,100 downloads since its January 12 release.