MAI-Transcribe-1

Microsoft (United States)
Status: active

Version History

1.0 (major)

Microsoft released MAI-Transcribe-1, a speech-to-text model that achieves the lowest word error rate (WER) on the FLEURS benchmark, with inference 2.5x faster than Azure Fast. It is priced at $0.36 per audio hour, supports 25 languages, and is built to handle challenging recording conditions.
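To put the $0.36 per audio hour price in concrete terms, here is a minimal cost-estimate sketch. It assumes the hourly rate is prorated linearly by the second; the release does not state billing granularity, minimum charges, or volume discounts, so `transcription_cost_usd` is a hypothetical helper, not part of any Microsoft API.

```python
# List price stated in the release.
PRICE_PER_AUDIO_HOUR_USD = 0.36


def transcription_cost_usd(audio_seconds: float) -> float:
    """Estimate transcription cost for a given audio duration.

    Assumption (not confirmed by the release): the hourly rate is
    prorated linearly per second, with no minimum charge.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * PRICE_PER_AUDIO_HOUR_USD


if __name__ == "__main__":
    # A 90-minute recording and a 1,000-hour archive.
    print(f"90 min:     ${transcription_cost_usd(90 * 60):.2f}")
    print(f"1,000 h:    ${transcription_cost_usd(1000 * 3600):.2f}")
```

Under that assumption, a 90-minute meeting costs about $0.54 and a 1,000-hour archive about $360.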

Coverage

Tags: model release, Microsoft

Microsoft's MAI-Transcribe-1 achieves lowest word error rate on FLEURS, costs $0.36/audio hour

Microsoft has released MAI-Transcribe-1, a speech-to-text model that achieves the lowest word error rate on the FLEURS benchmark across 25 languages, outperforming Whisper-large-V3, GPT-Transcribe, and Gemini 3.1 Flash-Lite. The model runs 2.5 times faster than Microsoft's previous Azure Fast offering and costs $0.36 per audio hour.
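Word error rate, the metric behind the FLEURS comparison, is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's output, divided by the number of reference words. The sketch below computes a plain WER with whitespace tokenization and no text normalization; the exact normalization and scoring used for the published FLEURS numbers are not described here, and whitespace splitting does not suit every language.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: word-level Levenshtein distance / number of reference words.

    Sketch only: splits on whitespace and applies no casing or
    punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,             # deletion of ref[i-1]
                curr[j - 1] + 1,         # insertion of hyp[j-1]
                prev[j - 1] + sub_cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.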
