voice-agents

4 articles tagged with voice-agents

June 9, 2026

benchmark

ServiceNow Releases First Code-Switching ASR Benchmark: ElevenLabs Scribe V2 Leads with Lowest WER Across Four Language Pairs

ServiceNow released AU-Harness, the first comprehensive benchmark for code-switched speech recognition in enterprise voice agents, testing seven ASR systems including ElevenLabs, Gemini, and AssemblyAI. The benchmark covers 918 utterances across Spanish-English, French-English, Canadian French-English, and German-English, measuring Word Error Rate (WER), Semantic WER (SWER), and Answer Error Rate (AER). ElevenLabs Scribe V2 achieved the lowest WER across all language pairs, followed closely by AssemblyAI Universal-3 Pro.

June 9, 2026 · 7:50 PM

June 8, 2026

product update · Amazon Web Services

AWS releases open-source test harness for evaluating Amazon Nova Sonic voice agents at scale

Amazon has released an open-source testing framework for Nova Sonic voice agents that automates multi-turn conversation evaluation without requiring human testers. The harness uses LLM-as-judge techniques to assess voice agents across six metrics including goal achievement, response accuracy, and tool usage, addressing a critical QA bottleneck in voice AI development.

June 8, 2026 · 4:05 PM

May 14, 2026

product update · Amazon Web Services

AWS launches real-time voice agent framework combining Stream Vision Agents with Nova 2 Sonic

Amazon has released Stream's Vision Agents, an open-source Python framework for building real-time voice AI agents that integrates with Amazon Nova 2 Sonic through Bedrock. The system delivers end-to-end latency under 500 milliseconds using Stream's global edge network with sub-30ms audio latency and typically sub-500ms join times.

May 14, 2026 · 5:36 PM

March 26, 2026

model release · Mistral AI

Mistral releases Voxtral-4B-TTS-2603, open-weights text-to-speech model for production voice agents

Mistral AI released Voxtral-4B-TTS-2603, an open-weights text-to-speech model designed for production voice agents. The 4B-parameter model supports 9 languages and 20 preset voices, achieves 70 ms latency at concurrency 1 on a single NVIDIA H200, and requires only 16 GB of GPU memory.

March 26, 2026 · 4:50 PM
