AWS releases healthcare appointment agent tutorial using Nova 2 Sonic speech-to-speech model

TL;DR

AWS published a technical guide for building voice appointment agents using Amazon Nova 2 Sonic, a speech-to-speech model that processes audio natively without separate transcription steps. The tutorial covers authentication, scheduling, and escalation tools running on Amazon Bedrock AgentCore with DynamoDB persistence.

June 24, 2026 · 6:35 PM · 3 min read

AWS published a technical guide for building voice appointment agents using Amazon Nova 2 Sonic, a speech-to-speech model available on Amazon Bedrock that processes audio natively without chaining separate transcription, reasoning, and text-to-speech services.

The tutorial addresses healthcare appointment no-shows, which range from 5 to 30 percent across US healthcare providers depending on specialty. The agent authenticates patients by voice, manages appointments (confirm, cancel, reschedule), collects pre-visit health information, and escalates to human staff when needed.

Technical architecture

The implementation uses Amazon Bedrock AgentCore, a serverless runtime for AI agents with IAM-authenticated WebSocket endpoints. The agent is built with the Strands Agents SDK's BidiAgent class for bidirectional voice streaming.

The system architecture includes:

React frontend capturing browser microphone audio over SigV4-signed WebSocket
Amazon Cognito for authentication with temporary AWS credentials
Amazon Bedrock AgentCore hosting the containerized Strands BidiAgent
Nova 2 Sonic processing speech input and generating audio responses
Three DynamoDB tables storing patient records, appointments, and time slots
Amazon SNS for escalation notifications

According to AWS, Nova 2 Sonic's native speech-to-speech processing retains vocal nuance, such as tone and hesitation, that text transcription discards. The model supports 16 kHz input and output sample rates, with multilingual capabilities and accent handling.
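To give a sense of what a 16 kHz stream means for the browser-to-agent connection, the arithmetic can be sketched as below. Only the sample rate comes from the article; the 16-bit sample width, mono channel count, and 20 ms frame duration are assumptions made for illustration, and the helper names are hypothetical.

```python
# Rough audio-framing arithmetic for a 16 kHz stream.
# Assumptions (not stated in the article): 16-bit PCM, mono, 20 ms frames.
SAMPLE_RATE_HZ = 16_000   # from the article
BYTES_PER_SAMPLE = 2      # assumed: 16-bit PCM
CHANNELS = 1              # assumed: mono
FRAME_MS = 20             # assumed: frame duration per WebSocket message


def bytes_per_second(rate_hz=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Raw PCM throughput in bytes per second."""
    return rate_hz * width * channels


def frame_size_bytes(frame_ms=FRAME_MS):
    """Number of bytes in one frame of the given duration."""
    return bytes_per_second() * frame_ms // 1000


def split_into_frames(pcm: bytes, frame_ms=FRAME_MS):
    """Split a PCM buffer into consecutive frames (the last one may be shorter)."""
    size = frame_size_bytes(frame_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


one_second_of_silence = bytes(bytes_per_second())
frames = split_into_frames(one_second_of_silence)
print(bytes_per_second(), frame_size_bytes(), len(frames))  # 32000 640 50
```

Under these assumptions, one second of audio is 32,000 bytes, sent as fifty 640-byte frames.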

Seven healthcare tools

The agent implements seven Python functions using the Strands SDK's @tool decorator:

authenticate_patient: Verifies identity using name and last four SSN digits via a DynamoDB Global Secondary Index, enforcing a three-attempt limit
confirm_appointment: Updates appointment status from Scheduled/Rescheduled to Confirmed with idempotent checks
cancel_appointment: Changes status to Canceled with timestamped reason
find_available_slots: Queries open time slots from provider schedule
book_appointment_slot: Finalizes rescheduling after slot selection
collect_health_info: Gathers pre-visit patient information
escalate_to_human: Triggers Amazon SNS notification for human staff intervention
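The article does not reproduce the tool bodies. As a rough illustration of two behaviors it does describe, the three-attempt limit and the idempotent confirmation, here is a minimal sketch that uses in-memory dictionaries in place of DynamoDB and plain functions in place of @tool-decorated ones. The Session class, the table contents, and the return shapes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AUTH_ATTEMPTS = 3  # the three-attempt limit described in the article


@dataclass
class Session:
    """Per-call state (hypothetical; the tutorial's own state handling may differ)."""
    failed_attempts: int = 0
    patient_id: Optional[str] = None


# Hypothetical in-memory stand-ins for the patient and appointment tables.
PATIENTS = {("jane doe", "1234"): "patient-001"}
APPOINTMENTS = {"appt-100": {"patient_id": "patient-001", "status": "Scheduled"}}


def authenticate_patient(session, full_name, ssn_last4):
    """Look up a patient by name and last four SSN digits, locking after three failures."""
    if session.failed_attempts >= MAX_AUTH_ATTEMPTS:
        return {"authenticated": False, "locked": True}
    patient_id = PATIENTS.get((full_name.strip().lower(), ssn_last4))
    if patient_id is None:
        session.failed_attempts += 1
        locked = session.failed_attempts >= MAX_AUTH_ATTEMPTS
        return {"authenticated": False, "locked": locked}
    session.patient_id = patient_id
    return {"authenticated": True, "locked": False}


def confirm_appointment(session, appointment_id):
    """Move Scheduled/Rescheduled to Confirmed; confirming twice is a no-op."""
    appt = APPOINTMENTS.get(appointment_id)
    if appt is None or appt["patient_id"] != session.patient_id:
        return {"ok": False, "reason": "not_found"}
    if appt["status"] == "Confirmed":
        return {"ok": True, "changed": False}  # idempotent repeat
    if appt["status"] not in ("Scheduled", "Rescheduled"):
        return {"ok": False, "reason": "cannot confirm from " + appt["status"]}
    appt["status"] = "Confirmed"
    return {"ok": True, "changed": True}
```

In this sketch a locked session rejects even correct credentials, at which point a real agent would presumably call escalate_to_human, and a second confirmation of the same appointment succeeds without changing anything.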

Nova 2 Sonic decides when to invoke each tool based on patient responses. The setup requires minimal code:

from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.models import BidiNovaSonicModel

# `tools` (the seven @tool functions) and `system_prompt` are defined elsewhere in the tutorial.
model = BidiNovaSonicModel(
    region="us-east-1",
    model_id="amazon.nova-2-sonic-v1:0",
    provider_config={
        "audio": {
            "input_sample_rate": 16000,
            "output_sample_rate": 16000,
            "voice": "tiffany",
        }
    },
    tools=tools,
)

agent = BidiAgent(
    model=model,
    tools=tools,
    system_prompt=system_prompt,
)
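The two rescheduling tools, find_available_slots and book_appointment_slot, are only summarized in the article. One way they could fit together is sketched below with an in-memory slot table. The field names, the check that refuses an already-taken slot, and the release of the old slot are assumptions for illustration, not the tutorial's implementation.

```python
# Hypothetical in-memory stand-in for the time-slot table.
SLOTS = {
    "slot-1": {"provider": "dr-lee", "start": "2026-07-01T09:00", "booked_by": "appt-100"},
    "slot-2": {"provider": "dr-lee", "start": "2026-07-02T14:00", "booked_by": None},
    "slot-3": {"provider": "dr-lee", "start": "2026-07-02T10:30", "booked_by": None},
    "slot-4": {"provider": "dr-kim", "start": "2026-07-02T09:00", "booked_by": None},
}


def find_available_slots(provider, limit=3):
    """Return up to `limit` open slots for a provider, earliest first."""
    open_slots = [
        (slot_id, slot)
        for slot_id, slot in SLOTS.items()
        if slot["provider"] == provider and slot["booked_by"] is None
    ]
    # ISO 8601 timestamps sort chronologically as plain strings.
    open_slots.sort(key=lambda item: item[1]["start"])
    return [{"slot_id": sid, "start": slot["start"]} for sid, slot in open_slots[:limit]]


def book_appointment_slot(appointment, slot_id):
    """Claim a slot for an appointment and release the one it held before."""
    slot = SLOTS.get(slot_id)
    if slot is None or slot["booked_by"] is not None:
        return {"ok": False, "reason": "slot_unavailable"}
    old_slot_id = appointment.get("slot_id")
    if old_slot_id in SLOTS:
        SLOTS[old_slot_id]["booked_by"] = None  # free the previous slot
    slot["booked_by"] = appointment["appointment_id"]
    appointment["slot_id"] = slot_id
    appointment["status"] = "Rescheduled"
    return {"ok": True}
```

With a real DynamoDB table, the availability check and the claim would need to happen atomically, for example through a conditional write, so that two concurrent calls cannot book the same slot.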

Deployment details

The tutorial includes a browser-based interface for testing. To connect the agent to actual phone lines for outbound dialing, AWS notes that integration with a telephony service like Amazon Connect Customer is required.

The serverless deployment runs entirely on AWS infrastructure with Amazon Cognito authentication, DynamoDB persistence, and SNS notifications. The Strands SDK handles bidirectional streaming complexity, managing audio input, tool invocation, and audio response generation.

What this means

This tutorial demonstrates AWS's push to move speech AI workloads away from chained transcription-LLM-TTS architectures toward native speech-to-speech models. By preserving acoustic context through the full conversation, Nova 2 Sonic can potentially detect patient anxiety or confusion that text transcription loses.

The healthcare use case targets a specific economic problem: appointment no-shows cost providers revenue and delay patient care. Voice agents that handle routine confirmation calls at scale could reduce manual phone bank operations, though AWS provides no data on actual no-show rate reduction from this approach.

The modular tool design means developers can add capabilities by writing individual Python functions without rebuilding the agent. For organizations already using AWS infrastructure, the serverless deployment on Bedrock AgentCore eliminates separate infrastructure management.

Source: aws.amazon.com
