speech-to-text
4 articles tagged with speech-to-text
OpenAI launches GPT-Realtime-2 with GPT-5-class reasoning, adds real-time translation across 70 languages
OpenAI has added three voice intelligence features to its Realtime API: GPT-Realtime-2 with GPT-5-class reasoning for complex conversational requests, GPT-Realtime-Translate supporting 70 input languages and 13 output languages, and GPT-Realtime-Whisper for live speech-to-text transcription. Translation and transcription are billed by the minute, while GPT-Realtime-2 uses token-based pricing.
Microsoft's MAI-Transcribe-1 achieves lowest word error rate on FLEURS, costs $0.36/audio hour
Microsoft has released MAI-Transcribe-1, a speech-to-text model that achieves the lowest word error rate on the FLEURS benchmark across 25 languages, outperforming Whisper-large-V3, GPT-Transcribe, and Gemini 3.1 Flash-Lite. The model runs 2.5 times faster than Microsoft's previous Azure Fast offering and costs $0.36 per audio hour.
Hume AI open-sources TADA: speech model 5x faster than rivals with zero hallucination
Hume AI has open-sourced TADA, a speech generation model that maps exactly one audio signal to each text token, achieving 5x faster processing than comparable systems. The model produced zero transcription hallucinations across 1,000+ test samples and runs on smartphones, available in 1B and 3B parameter versions under MIT license.
ElevenLabs and Google lead Artificial Analysis speech-to-text benchmark
Artificial Analysis has released an updated speech-to-text benchmark showing ElevenLabs and Google as top performers. The benchmark provides comparative analysis of current speech recognition systems across multiple models.