voice-agents
4 articles tagged with voice-agents
ServiceNow Releases First Code-Switching ASR Benchmark: ElevenLabs Scribe V2 Leads with Lowest WER Across Four Language
ServiceNow released AU-Harness, the first comprehensive benchmark for code-switched speech recognition in enterprise voice agents, testing seven ASR systems including ElevenLabs, Gemini, and AssemblyAI. The benchmark covers 918 utterances across Spanish-English, French-English, Canadian French-English, and German-English, measuring Word Error Rate (WER), Semantic WER (SWER), and Answer Error Rate (AER). ElevenLabs Scribe V2 achieved the lowest WER across all language pairs, followed closely by AssemblyAI Universal-3 Pro.
AWS releases open-source test harness for evaluating Amazon Nova Sonic voice agents at scale
Amazon has released an open-source testing framework for Nova Sonic voice agents that automates multi-turn conversation evaluation without requiring human testers. The harness uses LLM-as-judge techniques to assess voice agents across six metrics including goal achievement, response accuracy, and tool usage, addressing a critical QA bottleneck in voice AI development.
AWS launches real-time voice agent framework combining Stream Vision Agents with Nova 2 Sonic
Amazon has released Stream's Vision Agents, an open-source Python framework for building real-time voice AI agents that integrates with Amazon Nova 2 Sonic through Bedrock. The system delivers end-to-end latency under 500 milliseconds using Stream's global edge network with sub-30ms audio latency and typically sub-500ms join times.
Mistral releases Voxtral-4B-TTS-2603, open-weights text-to-speech model for production voice agents
Mistral AI released Voxtral-4B-TTS-2603, an open-weights text-to-speech model designed for production voice agents. The 4B-parameter model supports 9 languages, 20 preset voices, achieves 70ms latency at concurrency 1 on a single NVIDIA H200, and requires only 16GB GPU memory.