voice-ai
6 articles tagged with voice-ai
AWS Releases AgentCore Platform for Building Voice AI Agents with Nova 2 Sonic Integration
Amazon has released Bedrock AgentCore, a platform for building and deploying AI agents with microVM isolation for secure session handling. The platform integrates with Amazon Nova 2 Sonic, a speech-to-speech foundation model, and supports the Model Context Protocol (MCP) for connecting agents to backend services.
AWS launches agentic AI movie assistant using Nova Sonic 2.0 and Bedrock AgentCore
Amazon Web Services unveiled an agentic AI system for streaming platforms that combines Nova Sonic 2.0, a real-time speech model, with Bedrock AgentCore and the Model Context Protocol. The system delivers two core capabilities: context-aware movie recommendations based on mood and viewing history, and real-time scene analysis, including actor identification and plot summaries.
Gemini 3.1 Flash Live, Google's fastest voice model, scores 95.9% on Big Bench Audio
Google has released Gemini 3.1 Flash Live, its new voice and audio AI model, which scores 95.9% on the Big Bench Audio benchmark at high thinking levels, second only to Step-Audio R1.1 Realtime at 97.0%. Response times range from 0.96 seconds at minimal thinking to 2.98 seconds at high thinking, with pricing held at $0.35 per hour of audio input and $1.40 per hour of audio output.
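As a rough illustration of the pricing quoted above, the following Python sketch estimates the audio cost of a single voice session. Only the two hourly rates come from the summary; the function name and the example durations are hypothetical, and actual billing may use different metering or include other charges.

```python
# Hourly audio rates quoted in the summary above (USD).
INPUT_RATE_PER_HOUR = 0.35
OUTPUT_RATE_PER_HOUR = 1.40


def estimate_audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Hypothetical helper: estimated USD cost for minutes of audio in and out."""
    if input_minutes < 0 or output_minutes < 0:
        raise ValueError("durations must be non-negative")
    # Convert minutes to hours, then apply the per-hour rate for each direction.
    input_cost = (input_minutes / 60) * INPUT_RATE_PER_HOUR
    output_cost = (output_minutes / 60) * OUTPUT_RATE_PER_HOUR
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: 30 minutes of user speech and 15 minutes of model speech.
    print(f"${estimate_audio_cost(30, 15):.4f}")  # prints $0.5250
```

At these rates, output audio costs four times as much per hour as input audio, so in this sketch the length of the model's spoken replies dominates the estimate.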
Google releases Gemini 3.1 Flash Live, claims improved audio recognition and lower latency for voice conversations
Google announced Gemini 3.1 Flash Live as its updated audio and voice model for Gemini Live and Search Live. Google claims improved acoustic recognition, better background-noise filtering, support for more than 90 languages, and lower latency than 2.5 Flash Native Audio.
Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI
Google has released Gemini 3.1 Flash Live, its highest-quality audio and voice model designed for real-time dialogue. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with reasoning enabled, with improved tonal understanding and lower latency compared to previous versions.
Mistral releases Voxtral TTS, open-source speech model for enterprise voice agents
Mistral AI released Voxtral TTS, an open-source text-to-speech model designed for enterprise voice agents and edge devices. The model supports nine languages, adapts custom voices from audio samples shorter than five seconds, and achieves 90 ms time-to-first-audio latency with a 6x real-time factor.