product updateAmazon Web Services

AWS launches Nova Sonic voice agent framework with AgentCore Runtime and three integration patterns

TL;DR

AWS released Amazon Nova Sonic, a speech-to-speech foundation model for voice agents, alongside AgentCore Runtime, a serverless hosting environment with WebSocket streaming and microVM isolation. The framework supports three integration patterns: direct tool calls via AgentCore Gateway using Model Context Protocol (MCP), sub-agent delegation with Agent-to-Agent (A2A) protocol, and session segmentation for multi-step workflows.

2 min read
0

AWS launches Nova Sonic voice agent framework with AgentCore Runtime and three integration patterns

AWS released Amazon Nova Sonic, a speech-to-speech foundation model for building voice agents, alongside Amazon Bedrock AgentCore Runtime, a new serverless hosting environment designed specifically for AI agent deployment.

Core components

Amazon Nova Sonic enables real-time voice interactions with natural conversational flow and tone understanding. The model handles speech-to-speech conversations without intermediate text transcription steps.

Amazon Bedrock AgentCore Runtime provides:

  • Bidirectional WebSocket streaming with SigV4 authentication
  • MicroVM-level session isolation to prevent latency spikes from concurrent sessions
  • AgentCore Gateway for shared tool hosting using Model Context Protocol (MCP)
  • Persistent memory across sessions
  • Voice-specific telemetry including time-to-first-audio metrics

The system integrates with Strands Agents, an open-source framework. Its BidiAgent class manages bidirectional stream lifecycle, routes tool calls, and handles session management.

Three integration patterns

AWS documented three architectural approaches for voice agent design:

Pattern 1: AgentCore Gateway tool calls
Nova Sonic calls tools directly via AgentCore Gateway, which hosts MCP servers as managed endpoints. The voice model selects which tool to invoke, passes parameters, receives results, and responds. Example: A user asks "What's my account balance?" and Nova Sonic directly calls get_account_balance from available MCP tools.

Trade-off: All decision logic runs in the voice model's system prompt. Simple for basic tools but becomes brittle for multi-step workflows.

Pattern 2: Sub-agent delegation
Business logic runs in autonomous agents, each with its own model, system prompt, and tools. The voice orchestrator delegates complete tasks rather than individual tool calls. Two implementation approaches:

  • Local agent-as-tool: Sub-agents run in-process as @tool functions with no network hop
  • Remote agent via A2A protocol: Sub-agents deployed independently on AgentCore Runtime, invoked over network using Agent-to-Agent (A2A) protocol

A2A enables cross-framework interoperability between agents built with Strands, OpenAI, LangGraph, and Google ADK.

Pattern 3: Session segmentation
Isolates prompts, memory, and permissions across workflow stages. Not fully detailed in the release.

Implementation details

According to AWS, teams can expose existing business logic through AgentCore Gateway by configuring MCP Gateway ARNs:

model = BidiNovaSonicModel(
    model_id="amazon.nova-2-sonic-v1:0",
    mcp_gateway_arn=[
        "arn:aws:bedrock-agentcore:us-east-1:123456789012:gateway/auth-tools",
        "arn:aws:bedrock-agentcore:us-east-1:123456789012:gateway/banking-tools",
    ],
)

For sub-agent patterns, authentication and banking agents can be wrapped as tools using Strands' agent-as-tool pattern, with each sub-agent using separate models like amazon.nova-lite-v1:0.

What this means

This release positions AWS directly against voice agent frameworks from OpenAI (Realtime API) and Anthropic (Claude with tool use). The microVM isolation addresses a real production issue—latency spikes from concurrent sessions—that serverless function approaches struggle with. The MCP and A2A protocol support indicates AWS is betting on open standards rather than proprietary integration layers, which could accelerate enterprise adoption by reducing vendor lock-in concerns. The emphasis on "composable" agents through sub-agent patterns reflects industry movement away from monolithic LLM applications toward specialized, coordinated systems.

Related Articles

product update

AWS enables fine-tuning of Amazon Nova models for email extraction, achieving 94.77% accuracy with 50% cost reduction

AWS released guidance on fine-tuning Amazon Nova Micro and Nova Lite models for automated email data extraction using SageMaker AI. In collaboration with Parcel Perform, the fine-tuned Nova Micro achieved 94.77% extraction accuracy—a 16.6 percentage point improvement—while reducing inference costs by 50% and latency by 30% compared to previous models.

product update

Apple ships Safari MCP server in Technology Preview 247, enabling AI coding agents to inspect and debug websites

Apple has released an MCP server for Safari Technology Preview 247 that allows AI coding agents to directly inspect and debug websites. The server gives agents access to console logs, network requests, screenshots, and DOM interactions through the Model Context Protocol standard created by Anthropic.

product update

AWS brings NVIDIA Nemotron and OpenAI GPT OSS models to GovCloud for secure government AI workloads

Amazon Bedrock now supports NVIDIA Nemotron and OpenAI GPT OSS models in AWS GovCloud (US) Regions. The launch includes OpenAI's GPT OSS models (120B and 20B parameters, 128K context) and NVIDIA Nemotron 3 family (9B to 120B parameters, 1M context), providing government agencies FedRAMP High and DoD SRG Level 5-compliant AI inference on U.S. soil.

product update

AWS adds metadata filtering to AgentCore Memory, improving agent retrieval accuracy from 40% to 64%

Amazon has added metadata filtering to its AgentCore Memory service for AI agents. In AWS evaluations across 151 questions, the feature improved overall question-answering accuracy from 40% to 64%, with context-dependent questions jumping from 16% to 69% accuracy. The update allows agents to filter memory retrieval by attributes like priority, department, or time range before semantic search runs.

Comments

Loading...