Amazon Nova 2 Sonic Unifies Speech Recognition, Reasoning, and TTS in Single Streaming Model

TL;DR

Amazon Web Services released technical guidance for migrating text agents to voice assistants using Amazon Nova 2 Sonic, a native speech-to-speech model that combines automatic speech recognition, reasoning, tool calling, and text-to-speech in a single bidirectional streaming interface. The model supports asynchronous tool calling and built-in voice activity detection for handling interruptions.

April 28, 2026 · 6:06 PM · 2 min read

Amazon Web Services released technical guidance for migrating text agents to voice assistants using Amazon Nova 2 Sonic, a native speech-to-speech model that unifies automatic speech recognition (ASR), reasoning, tool use, and text-to-speech (TTS) in one bidirectional streaming interface.

Unlike traditional voice agent architectures that chain separate ASR → LLM → TTS components, Nova 2 Sonic handles the entire voice pipeline in a single model. The architecture accepts both text and audio inputs through the same interface, allowing teams to reuse existing prompts and tools from text agents while eliminating the need for a separate text reasoning model in the voice stack.

Architecture and capabilities

According to AWS, Nova 2 Sonic includes built-in voice activity detection (VAD) and turn detection, managing conversation context internally without requiring full history to be sent on each turn. The model supports asynchronous tool calling, enabling conversations to continue naturally while tools run in the background. It can run multiple tools in parallel and adapts if users change requests mid-process.
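The pattern AWS describes, where tools run in the background while the conversation continues, can be illustrated with a minimal asyncio sketch. It does not use the Nova 2 Sonic API: the tool names, delays, and transcript format are hypothetical stand-ins chosen only to show parallel, non-blocking tool calls.

```python
import asyncio


async def run_tool(name: str, delay_s: float) -> str:
    """Hypothetical tool: sleeps to stand in for a slow backend call."""
    await asyncio.sleep(delay_s)
    return f"{name}: done"


async def converse() -> list[str]:
    """Start two tools in parallel and keep talking while they run."""
    transcript: list[str] = []

    # Launch both tools without awaiting them, so the turn is not blocked.
    tasks = [
        asyncio.create_task(run_tool("check_balance", 0.2)),
        asyncio.create_task(run_tool("list_transactions", 0.01)),
    ]

    # The conversation continues while the tools are still pending.
    transcript.append("assistant: One moment while I look that up.")

    # Collect results as each tool finishes, in completion order.
    for finished in asyncio.as_completed(tasks):
        transcript.append(f"tool result -> {await finished}")

    return transcript


if __name__ == "__main__":
    for line in asyncio.run(converse()):
        print(line)
```

In this sketch the spoken acknowledgement is emitted before either tool returns, and the faster tool's result arrives first regardless of launch order.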

AWS identifies latency as a critical difference between text and voice agents. Text agents can tolerate a few seconds of latency, typically masked by loading indicators. Voice agents need response times in the hundreds of milliseconds, and a delay of even a few seconds during a tool call feels unresponsive to users. Each tool call adds noticeable silence to a voice interaction.

Implementation requirements

The migration requires changes across three architectural components. Client applications need persistent bidirectional connections (WebSocket or WebRTC) and must handle audio encoding/decoding, client events, barge-in logic, and noise control, which is significantly more complex than the stateless REST interfaces used by text clients.
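Two of the client responsibilities listed above, audio encoding and barge-in logic, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Nova 2 Sonic client protocol: the 16 kHz 16-bit mono format, the base64 transport encoding, and the names `frame_pcm` and `PlaybackQueue` are all hypothetical.

```python
import base64
from collections import deque
from typing import Optional


def frame_pcm(pcm: bytes, frame_bytes: int = 640) -> list[str]:
    """Split raw PCM into fixed-size frames and base64-encode each one.

    640 bytes is 20 ms of 16-bit mono audio at 16 kHz (an assumed format).
    The final frame may be shorter than frame_bytes.
    """
    return [
        base64.b64encode(pcm[i:i + frame_bytes]).decode("ascii")
        for i in range(0, len(pcm), frame_bytes)
    ]


class PlaybackQueue:
    """Holds assistant audio that has arrived but has not been played yet."""

    def __init__(self) -> None:
        self._frames: deque = deque()

    def enqueue(self, frame: bytes) -> None:
        """Buffer one frame of assistant audio for playback."""
        self._frames.append(frame)

    def next_frame(self) -> Optional[bytes]:
        """Return the next frame to play, or None if nothing is buffered."""
        return self._frames.popleft() if self._frames else None

    def barge_in(self) -> int:
        """Drop all pending audio when the user interrupts.

        Returns the number of frames discarded.
        """
        dropped = len(self._frames)
        self._frames.clear()
        return dropped
```

A client would call `barge_in()` when it learns that the user has started speaking, so the assistant stops talking over them instead of draining stale audio.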

In a text agent, the orchestrator handles system prompt management and tool routing. In a voice agent, it must also cover audio streaming, VAD, ASR, reasoning, and TTS. Nova 2 Sonic's unified interface allows teams to migrate reasoning prompts and tool triggers directly from existing text agents.

Response design also shifts fundamentally. Text agents deliver paragraphs with rich formatting, lists, and links that users can read at their own pace. Voice agents require conversational, concise responses structured for listening. For example, a banking text agent might display full account summaries with formatted lists, while a voice agent would break information into digestible chunks and ask for confirmation before continuing.
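The banking example can be made concrete with a short sketch that turns a list of statements into brief spoken turns with a confirmation prompt between them. The function name `spoken_chunks`, the chunk size, and the account figures are made up for illustration and do not come from the AWS guidance.

```python
def spoken_chunks(items: list[str], per_turn: int = 2) -> list[str]:
    """Group items into short spoken turns, asking to continue between turns."""
    turns: list[str] = []
    for start in range(0, len(items), per_turn):
        sentence = " ".join(items[start:start + per_turn])
        # Only ask for confirmation when more information is still to come.
        if start + per_turn < len(items):
            sentence += " Would you like me to continue?"
        turns.append(sentence)
    return turns


# Illustrative data only; a text agent might render this as a formatted list.
accounts = [
    "Your checking account has 1,200 dollars.",
    "Your savings account has 8,450 dollars.",
    "Your credit card balance is 310 dollars.",
]

if __name__ == "__main__":
    for turn in spoken_chunks(accounts):
        print(turn)
```

The same content a text agent would show at once is spread over two turns here, and the listener decides whether to hear the second.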

Availability

AWS published a sample repository with a skill that works with AI IDEs like Kiro and Claude Code to automatically convert text agents into voice agents. Pricing for Nova 2 Sonic was not disclosed in the announcement.

What this means

Nova 2 Sonic represents AWS's push into native speech-to-speech models that compete with OpenAI's Realtime API and similar offerings. By unifying the voice pipeline in a single model rather than chaining components, AWS claims to reduce latency and architectural complexity. The asynchronous tool calling and built-in conversation management address key pain points in voice agent development, though real-world performance metrics and benchmark scores have not been published. The lack of disclosed pricing makes cost comparison with existing voice agent architectures difficult.

Source: aws.amazon.com


