AWS launches Nova Sonic voice agent framework with AgentCore Runtime and three integration patterns
AWS released Amazon Nova Sonic, a speech-to-speech foundation model for voice agents, alongside AgentCore Runtime, a serverless hosting environment with WebSocket streaming and microVM isolation. The framework supports three integration patterns: direct tool calls via AgentCore Gateway using Model Context Protocol (MCP), sub-agent delegation with Agent-to-Agent (A2A) protocol, and session segmentation for multi-step workflows.
AWS released Amazon Nova Sonic, a speech-to-speech foundation model for building voice agents, alongside Amazon Bedrock AgentCore Runtime, a new serverless hosting environment designed specifically for AI agent deployment.
Core components
Amazon Nova Sonic enables real-time voice interactions with natural conversational flow and tone understanding. The model handles speech-to-speech conversations without intermediate text transcription steps.
Amazon Bedrock AgentCore Runtime provides:
- Bidirectional WebSocket streaming with SigV4 authentication
- MicroVM-level session isolation to prevent latency spikes from concurrent sessions
- AgentCore Gateway for shared tool hosting using Model Context Protocol (MCP)
- Persistent memory across sessions
- Voice-specific telemetry including time-to-first-audio metrics
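As an illustration of what a time-to-first-audio metric measures, the sketch below computes the delay between the end of the user's speech and the first audio chunk of the agent's reply. The `StreamEvent` record and its event names are hypothetical and chosen for this example; they are not AgentCore's actual telemetry schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StreamEvent:
    """Hypothetical event record; not AgentCore's actual telemetry schema."""
    kind: str         # "user_speech_end" or "agent_audio_chunk" (illustrative names)
    timestamp: float  # seconds since session start


def time_to_first_audio(events: Iterable[StreamEvent]) -> Optional[float]:
    """Seconds between the end of user speech and the first agent audio chunk.

    Returns None if the turn lacks either event.
    """
    speech_end: Optional[float] = None
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "user_speech_end":
            speech_end = event.timestamp
        elif event.kind == "agent_audio_chunk" and speech_end is not None:
            return event.timestamp - speech_end
    return None


turn = [
    StreamEvent("user_speech_end", 2.0),
    StreamEvent("agent_audio_chunk", 2.5),
    StreamEvent("agent_audio_chunk", 2.75),
]
ttfa = time_to_first_audio(turn)
```

Only the first audio chunk after the user stops speaking counts; later chunks affect playback smoothness but not this metric.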
The system integrates with Strands Agents, an open-source framework. Its BidiAgent class manages bidirectional stream lifecycle, routes tool calls, and handles session management.
Three integration patterns
AWS documented three architectural approaches for voice agent design:
Pattern 1: AgentCore Gateway tool calls
Nova Sonic calls tools directly via AgentCore Gateway, which hosts MCP servers as managed endpoints. The voice model selects which tool to invoke, passes parameters, receives results, and responds. Example: A user asks "What's my account balance?" and Nova Sonic directly calls get_account_balance from available MCP tools.
Trade-off: All decision logic runs in the voice model's system prompt. Simple for basic tools but becomes brittle for multi-step workflows.
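The tool-call loop in Pattern 1 can be sketched without any AWS dependency. The example below is a minimal stand-in, assuming a simple in-memory registry in place of MCP servers behind AgentCore Gateway; the `register` and `dispatch` helpers, the tool-call dictionary shape, and the account data are all made up for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory registry standing in for MCP tools hosted behind a gateway.
TOOLS: Dict[str, Callable[..., Any]] = {}


def register(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Add a function to the registry under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@register
def get_account_balance(account_id: str) -> dict:
    """Stub lookup with made-up data; a real tool would query a backend."""
    balances = {"acct-001": 1250.00}
    return {"account_id": account_id, "balance": balances.get(account_id)}


def dispatch(tool_call: dict) -> dict:
    """Route one model-issued tool call to the matching function."""
    name = tool_call["name"]
    if name not in TOOLS:
        return {"status": "error", "message": f"unknown tool: {name}"}
    result = TOOLS[name](**tool_call.get("arguments", {}))
    return {"status": "ok", "result": result}


# The voice model would emit something like this after hearing
# "What's my account balance?" (the structure is an assumption).
response = dispatch(
    {"name": "get_account_balance", "arguments": {"account_id": "acct-001"}}
)
```

Note that nothing here decides which tool to call or in what order. In this pattern that choice rests entirely with the voice model, which is why multi-step workflows become hard to control.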
Pattern 2: Sub-agent delegation
Business logic runs in autonomous agents, each with its own model, system prompt, and tools. The voice orchestrator delegates complete tasks rather than individual tool calls. Two implementation approaches:
- Local agent-as-tool: Sub-agents run in-process as @tool functions with no network hop
- Remote agent via A2A protocol: Sub-agents deployed independently on AgentCore Runtime, invoked over network using Agent-to-Agent (A2A) protocol
A2A enables cross-framework interoperability between agents built with Strands, OpenAI, LangGraph, and Google ADK.
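The local agent-as-tool variant can be illustrated in plain Python. This is a rough sketch rather than the Strands API: `SubAgent` and `as_tool` are hypothetical names, and the sub-agent simply runs its tools in order where a real one would let its own model plan the steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SubAgent:
    """Stand-in for an autonomous agent with its own prompt and tools (hypothetical)."""
    name: str
    system_prompt: str
    tools: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # A real sub-agent would let its own model plan; here every tool runs in order.
        steps = [tool() for tool in self.tools.values()]
        return f"{self.name} completed '{task}': " + "; ".join(steps)


def as_tool(agent: SubAgent) -> Callable[[str], str]:
    """Expose a whole sub-agent as one callable, in-process, with no network hop."""
    def call(task: str) -> str:
        return agent.run(task)
    call.__name__ = agent.name
    return call


banking = SubAgent(
    name="banking_agent",
    system_prompt="Handle balance and transfer requests.",
    tools={
        "verify": lambda: "identity verified",
        "balance": lambda: "balance retrieved",
    },
)

# The voice orchestrator sees one tool per sub-agent and hands over the full task.
orchestrator_tools = {fn.__name__: fn for fn in [as_tool(banking)]}
reply = orchestrator_tools["banking_agent"]("check balance")
```

The orchestrator never sees the sub-agent's individual tools, only the task-level entry point. Swapping the in-process call for a network call is what would turn this into the remote variant.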
Pattern 3: Session segmentation
Session segmentation isolates prompts, memory, and permissions across workflow stages. The release does not fully detail this pattern.
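Because the release gives little detail on this pattern, the following is only one possible reading of it, not AWS's design. It assumes each workflow stage is a `Segment` (a hypothetical name) with its own system prompt, memory, and tool allowlist, and that only an explicit handoff note crosses from one stage to the next.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Segment:
    """One workflow stage with its own prompt, memory, and permissions (hypothetical)."""
    name: str
    system_prompt: str
    allowed_tools: FrozenSet[str]
    memory: List[str] = field(default_factory=list)

    def can_call(self, tool_name: str) -> bool:
        return tool_name in self.allowed_tools


class SegmentedSession:
    def __init__(self, segments: List[Segment]) -> None:
        self._segments: Dict[str, Segment] = {s.name: s for s in segments}
        self.current = segments[0]

    def advance(self, name: str, handoff: str) -> None:
        """Switch stages, carrying over only an explicit handoff note."""
        self.current = self._segments[name]
        self.current.memory.append(handoff)


session = SegmentedSession([
    Segment("authenticate", "Verify the caller's identity.", frozenset({"verify_pin"})),
    Segment("banking", "Answer account questions.", frozenset({"get_account_balance"})),
])
session.current.memory.append("caller supplied PIN")  # stays in the first segment
session.advance("banking", handoff="caller verified")
```

After the switch, the banking stage knows the caller is verified but has no access to the PIN exchange or to the authentication tool.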
Implementation details
According to AWS, teams can expose existing business logic through AgentCore Gateway by configuring MCP Gateway ARNs:
model = BidiNovaSonicModel(
    model_id="amazon.nova-2-sonic-v1:0",
    mcp_gateway_arn=[
        "arn:aws:bedrock-agentcore:us-east-1:123456789012:gateway/auth-tools",
        "arn:aws:bedrock-agentcore:us-east-1:123456789012:gateway/banking-tools",
    ],
)
For sub-agent patterns, authentication and banking agents can be wrapped as tools using Strands' agent-as-tool pattern, with each sub-agent using separate models like amazon.nova-lite-v1:0.
What this means
This release positions AWS directly against voice agent frameworks from OpenAI (Realtime API) and Anthropic (Claude with tool use). The microVM isolation addresses a real production issue that serverless function approaches struggle with: latency spikes from concurrent sessions. The MCP and A2A protocol support indicates AWS is betting on open standards rather than proprietary integration layers, which could accelerate enterprise adoption by reducing vendor lock-in concerns. The emphasis on "composable" agents through sub-agent patterns reflects industry movement away from monolithic LLM applications toward specialized, coordinated systems.