speech-to-text

5 articles tagged with speech-to-text

May 20, 2026

AWS SageMaker AI adds bidirectional streaming for real-time speech transcription with vLLM

Amazon SageMaker AI has launched bidirectional streaming support for real-time inference, enabling WebSocket-based voice applications through vLLM integration. The feature uses HTTP/2 on port 8443 to bridge client connections with vLLM's Realtime API, allowing audio to stream in while transcription streams back simultaneously over a single persistent connection.

May 20, 2026 · 5:20 PM

May 7, 2026

product updateOpenAI+1

OpenAI launches GPT-Realtime-2 with GPT-5-class reasoning, adds real-time translation across 70 languages

OpenAI has added three voice intelligence features to its Realtime API: GPT-Realtime-2 with GPT-5-class reasoning for complex conversational requests, GPT-Realtime-Translate supporting 70 input languages and 13 output languages, and GPT-Realtime-Whisper for live speech-to-text transcription. Translation and transcription are billed by the minute, while GPT-Realtime-2 uses token-based pricing.

May 7, 2026 · 10:35 PM

April 2, 2026

model releaseMicrosoft

Microsoft's MAI-Transcribe-1 achieves lowest word error rate on FLEURS, costs $0.36/audio hour

Microsoft has released MAI-Transcribe-1, a speech-to-text model that achieves the lowest word error rate on the FLEURS benchmark across 25 languages, outperforming Whisper-large-V3, GPT-Transcribe, and Gemini 3.1 Flash-Lite. The model runs 2.5 times faster than Microsoft's previous Azure Fast offering and costs $0.36 per audio hour.

April 2, 2026 · 4:35 PM

March 14, 2026

model release

Hume AI open-sources TADA: speech model 5x faster than rivals with zero hallucination

Hume AI has open-sourced TADA, a speech generation model that maps exactly one audio signal to each text token, achieving 5x faster processing than comparable systems. The model produced zero transcription hallucinations across 1,000+ test samples and runs on smartphones, available in 1B and 3B parameter versions under MIT license.

March 14, 2026 · 6:20 PM

March 1, 2026

benchmark

ElevenLabs and Google lead Artificial Analysis speech-to-text benchmark

Artificial Analysis has released an updated speech-to-text benchmark showing ElevenLabs and Google as top performers. The benchmark provides comparative analysis of current speech recognition systems across multiple models.

March 1, 2026 · 3:05 PM

← Back to all news