Microsoft's MAI-Transcribe-1 achieves lowest word error rate on FLEURS, costs $0.36/audio hour

TL;DR

Microsoft has released MAI-Transcribe-1, a speech-to-text model that achieves the lowest word error rate on the FLEURS benchmark across 25 languages, outperforming Whisper-large-V3, GPT-Transcribe, and Gemini 3.1 Flash-Lite. The model runs 2.5 times faster than Microsoft's previous Azure Fast offering and costs $0.36 per audio hour.

April 2, 2026 · 4:35 PM · 1 min read

Microsoft's MAI-Transcribe-1 Achieves Lowest Word Error Rate on FLEURS Benchmark

Microsoft has introduced MAI-Transcribe-1, a multilingual speech-to-text model supporting 25 languages that outperforms competing transcription systems on the FLEURS benchmark.

Performance and Capabilities

MAI-Transcribe-1 achieves the lowest word error rate among tested models, beating Scribe v2, Whisper-large-V3, GPT-Transcribe, and Gemini 3.1 Flash-Lite across the FLEURS evaluation suite. Microsoft says the model is optimized for challenging recording conditions, including background noise, poor audio quality, and overlapping speech.
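For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. The function name is hypothetical, and splitting on whitespace is a simplifying assumption: benchmark evaluations usually normalize casing and punctuation first, and the article does not describe how Microsoft scored its FLEURS results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Assumption: tokens are obtained by splitting on whitespace, with no
    casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "a b c d" and the transcript "a x c", the function counts one substitution and one deletion across four reference words, giving a WER of 0.5. Lower is better, and an exact match scores 0.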

The model delivers 2.5x faster inference than Microsoft's previous Azure Fast transcription offering. When combined with MAI-Voice-1 (Microsoft's text-to-speech model) and a language model, MAI-Transcribe-1 can power voice agents, according to Microsoft.

Pricing and Availability

MAI-Transcribe-1 is priced at $0.36 per audio hour. The model is rolling out across Copilot Voice and Microsoft Teams. Developers can access it through a public preview on Microsoft Foundry and the Microsoft AI Playground.
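To put the rate in practical terms, here is a rough Python cost estimator. It assumes billing is strictly proportional to audio duration, with no minimum charge, rounding increment, or volume discount. The article does not specify any of these, and the function name is hypothetical.

```python
# Rate quoted in the article, in US dollars per hour of audio.
PRICE_PER_AUDIO_HOUR_USD = 0.36


def estimate_cost_usd(
    audio_seconds: float,
    price_per_hour: float = PRICE_PER_AUDIO_HOUR_USD,
) -> float:
    """Estimate transcription cost for a given amount of audio.

    Assumption: cost scales linearly with duration (pro rata billing).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * price_per_hour
```

Under those assumptions, 1,000 hours of audio (`estimate_cost_usd(1000 * 3600)`) comes to about $360, and a single one-hour meeting to $0.36.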

Market Context

The release comes as open-source alternatives gain traction. Cohere and Mistral recently released open-source speech-to-text models that perform at comparable quality levels, offering a deployment option with no API fees for organizations willing to run their own infrastructure.

What This Means

MAI-Transcribe-1 positions Microsoft competitively in speech recognition, addressing both the accuracy and the speed requirements of enterprise voice applications. The $0.36 per audio hour price sits in the mid-market range for commercial transcription APIs. However, with capable open-source alternatives emerging, Microsoft must justify a paid API through deployment convenience and integration with the Copilot and Teams ecosystems rather than through technological superiority alone. The 2.5x speed improvement over Azure Fast suggests meaningful optimization work, which matters for real-time voice agent applications.

Source: the-decoder.com
