speech-recognition

12 articles tagged with speech-recognition

June 9, 2026

benchmark

ServiceNow Releases First Code-Switching ASR Benchmark: ElevenLabs Scribe V2 Leads with Lowest WER Across Four Language Pairs

ServiceNow released AU-Harness, the first comprehensive benchmark for code-switched speech recognition in enterprise voice agents, testing seven ASR systems including ElevenLabs, Gemini, and AssemblyAI. The benchmark covers 918 utterances across Spanish-English, French-English, Canadian French-English, and German-English, measuring Word Error Rate (WER), Semantic WER (SWER), and Answer Error Rate (AER). ElevenLabs Scribe V2 achieved the lowest WER across all language pairs, followed closely by AssemblyAI Universal-3 Pro.

June 9, 2026 · 7:50 PM
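
For readers unfamiliar with the Word Error Rate metric mentioned in this item, here is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the sample sentences are illustrative assumptions; this is not AU-Harness's scoring code, and it skips the text normalization that real evaluations usually apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: no casing, punctuation, or number normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical code-switched utterance: one substituted word out of six.
reference = "please open el ticket de soporte"
hypothesis = "please open the ticket de soporte"
print(f"{word_error_rate(reference, hypothesis):.3f}")  # prints 0.167
```

Because the denominator is the reference length, insertions can push WER above 1.0, which is one reason benchmarks often report additional metrics alongside it.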

June 4, 2026

model release · NVIDIA

NVIDIA Releases Nemotron 3.5 ASR: 600M-Parameter Streaming Speech Model for 40 Languages

NVIDIA released Nemotron 3.5 ASR, a 600M-parameter speech-to-text model supporting 40 language-locales from a single checkpoint. The model delivers its final transcript 0.07 seconds after speech ends and ranks 2nd in latency among streaming ASR models, according to Artificial Analysis benchmarks.

June 4, 2026 · 1:06 PM

June 2, 2026

model release · Microsoft

Microsoft releases MAI-Thinking-1, its first reasoning model with 35B parameters

Microsoft released seven AI models at Build 2026, headlined by MAI-Thinking-1, its first reasoning model with 35 billion parameters. The company claims the model matches Anthropic's Claude Opus 4.6 on the SWE-Bench Pro coding benchmark and beats Sonnet 4.6 in blind tests.

June 2, 2026 · 6:51 PM

May 28, 2026

model release · Mistral AI

Mistral AI Releases Voxtral: Apache 2.0 Speech Models with 32K Token Context at $0.001/Minute

Mistral AI released Voxtral, a family of open-source speech understanding models available in 24B and 3B parameter variants under the Apache 2.0 license. The models support up to 32K tokens of context (30 minutes of audio for transcription, 40 minutes for understanding) and are priced at $0.001 per minute via API, which Mistral says is less than half the cost of comparable proprietary systems.

May 28, 2026 · 9:51 AM

April 28, 2026

changelog · OpenAI

OpenAI Makes Whisper Speech Recognition Available on OpenRouter at $0.006 per Minute

OpenAI's Whisper 1 automatic speech recognition model is now accessible through OpenRouter's API routing service. The model supports transcription and translation across 50+ languages from audio files up to 25 MB, priced at $0.006 per minute of audio.

April 28, 2026 · 12:35 AM

April 14, 2026

product update

Google Home April 2026 update reduces Gemini interruptions, improves speech recognition in noisy environments

Google Home's April 2026 update addresses Gemini voice assistant reliability issues. The update improves speech detection to reduce mid-sentence interruptions, speeds up responses to simple queries, and improves recognition of music playlists even when names are misspoken or the environment is noisy.

April 14, 2026 · 7:35 PM

April 2, 2026

model release · Microsoft

Microsoft releases three in-house AI models for speech and images, signaling independence from OpenAI

Microsoft released public preview versions of three proprietary AI models: MAI-Transcribe-1 for speech recognition across 25 languages at 50% lower GPU cost than alternatives, MAI-Voice-1 for speech synthesis generating 60 seconds of audio in under a second, and MAI-Image-2 for text-to-image generation. The models are available exclusively through Microsoft Azure AI Foundry and already power Copilot, Bing, and PowerPoint.

April 2, 2026 · 8:20 PM

March 31, 2026

model release · +1 more

Alibaba's Qwen3.5-Omni learns to write code from speech and video without explicit training

Alibaba has released Qwen3.5-Omni, an omnimodal model handling text, images, audio, and video with a 256,000-token context window. The model reportedly outperforms Google's Gemini 3.1 Pro on audio tasks and supports 74 languages in speech recognition, a 6x increase over its predecessor. It also shows an unexpected emergent capability: writing working code from spoken instructions and video input, something the team says it did not explicitly train for.

March 31, 2026 · 12:35 PM

March 27, 2026

model release · Cohere

Cohere releases 2B open-source speech model with 5.42% word error rate

Cohere has released Transcribe, a 2-billion-parameter open-source automatic speech recognition model that the company claims tops the Hugging Face Open ASR Leaderboard with a 5.42% word error rate. The model supports 14 languages and is available under the Apache 2.0 license. According to Cohere, it outperforms OpenAI's Whisper Large v3 and other competing models on both accuracy and throughput.

March 27, 2026 · 6:50 PM

March 9, 2026

model release

IBM releases Granite 4.0 1B Speech: multilingual model for edge devices

IBM has released Granite 4.0 1B Speech, a 1-billion-parameter multilingual speech model designed for edge deployment and optimized for devices with limited computational resources.

March 9, 2026 · 6:50 PM

March 1, 2026

benchmark

ElevenLabs and Google lead Artificial Analysis speech-to-text benchmark

Artificial Analysis has released an updated speech-to-text benchmark showing ElevenLabs and Google as top performers. The benchmark compares current speech recognition models side by side.

March 1, 2026 · 3:05 PM

February 24, 2026

research · Apple

Apple Research Identifies 'Text-Speech Understanding Gap' Limiting LLM Speech Performance

Apple researchers have identified a fundamental limitation in speech-adapted large language models: they consistently underperform their text-based counterparts on language understanding tasks. The team terms this the 'text-speech understanding gap' and documents that speech-adapted LLMs lag behind both their original text versions and cascaded speech-to-text pipelines.

February 24, 2026 · 11:35 PM
