AWS releases healthcare appointment agent tutorial using Nova 2 Sonic speech-to-speech model
AWS published a technical guide for building voice appointment agents using Amazon Nova 2 Sonic, a speech-to-speech model that processes audio natively without separate transcription steps. The tutorial covers authentication, scheduling, and escalation tools running on Amazon Bedrock AgentCore with DynamoDB persistence.
AWS published a technical guide for building voice appointment agents using Amazon Nova 2 Sonic, a speech-to-speech model available on Amazon Bedrock that processes audio natively without chaining separate transcription, reasoning, and text-to-speech services.
The tutorial addresses healthcare appointment no-shows, which range from 5–30 percent across US healthcare providers depending on specialty. The agent authenticates patients by voice, manages appointments (confirm, cancel, reschedule), collects pre-visit health information, and escalates to human staff when needed.
Technical architecture
The implementation uses Amazon Bedrock AgentCore, a serverless runtime for AI agents with IAM-authenticated WebSocket endpoints. The agent is built with the Strands Agents SDK's BidiAgent class for bidirectional voice streaming.
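An IAM-authenticated WebSocket endpoint means each connection request is signed with AWS Signature Version 4 (SigV4). The sketch below covers only the key-derivation and final signing steps of SigV4 and uses the Python standard library. It assumes `bedrock-agentcore` is the service signing name, which should be checked against the AgentCore documentation. In practice an AWS SDK signer builds the canonical request and string to sign, so hand-rolling this is rarely needed. Temporary credentials, such as those issued through Cognito, also carry a session token that is sent with the signed request.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 key chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key by chaining HMACs over date, region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)  # date_stamp: YYYYMMDD
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(string_to_sign: str, signing_key: bytes) -> str:
    """Final SigV4 signature: hex-encoded HMAC-SHA256 of the string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Placeholder secret; "bedrock-agentcore" as the signing name is an assumption.
    key = sigv4_signing_key("EXAMPLE-SECRET", "20250101", "us-east-1", "bedrock-agentcore")
    print(len(key))  # 32
```

Because the key depends only on the date, region, and service, a client can derive it once per day and reuse it for every signature in that scope.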
The system architecture includes:
- React frontend capturing browser microphone audio and streaming it over a SigV4-signed WebSocket
- Amazon Cognito for authentication with temporary AWS credentials
- Amazon Bedrock AgentCore hosting the containerized Strands BidiAgent
- Nova 2 Sonic processing speech input and generating audio responses
- Three DynamoDB tables storing patient records, appointments, and time slots
- Amazon SNS for escalation notifications
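The three-table layout could be expressed as `create_table` arguments for boto3's DynamoDB client, as in the sketch below. The table names, key attributes, and the `full_name-index` global secondary index are hypothetical, and the tutorial's actual schema may differ. The GSI illustrates how a tool like `authenticate_patient` can look a caller up by name rather than by patient ID.

```python
# Hypothetical table and attribute names; the tutorial's actual schema may differ.
def table_specs(prefix: str = "clinic") -> list:
    """Return keyword arguments for boto3's DynamoDB create_table, one dict per table."""
    on_demand = {"BillingMode": "PAY_PER_REQUEST"}  # no provisioned throughput needed
    patients = {
        "TableName": f"{prefix}-patients",
        "KeySchema": [{"AttributeName": "patient_id", "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": "patient_id", "AttributeType": "S"},
            {"AttributeName": "full_name", "AttributeType": "S"},
        ],
        # GSI so a caller can be found by spoken name instead of by patient ID.
        "GlobalSecondaryIndexes": [{
            "IndexName": "full_name-index",
            "KeySchema": [{"AttributeName": "full_name", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }],
        **on_demand,
    }
    appointments = {
        "TableName": f"{prefix}-appointments",
        "KeySchema": [{"AttributeName": "appointment_id", "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": "appointment_id", "AttributeType": "S"}],
        **on_demand,
    }
    time_slots = {
        "TableName": f"{prefix}-time-slots",
        # One partition per provider, slots sorted by start time.
        "KeySchema": [
            {"AttributeName": "provider_id", "KeyType": "HASH"},
            {"AttributeName": "slot_start", "KeyType": "RANGE"},
        ],
        "AttributeDefinitions": [
            {"AttributeName": "provider_id", "AttributeType": "S"},
            {"AttributeName": "slot_start", "AttributeType": "S"},
        ],
        **on_demand,
    }
    return [patients, appointments, time_slots]


# Usage (requires AWS credentials):
# import boto3
# client = boto3.client("dynamodb", region_name="us-east-1")
# for spec in table_specs():
#     client.create_table(**spec)
```

On-demand billing avoids specifying throughput for the tables or the GSI, which fits a serverless deployment with unpredictable call volume.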
According to AWS, Nova 2 Sonic's native speech-to-speech processing retains vocal nuances such as tone and hesitation that text transcription discards. The model supports 16 kHz input and output sample rates, along with multilingual support and accent handling.
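At 16 kHz, audio chunk sizes follow directly from the sample rate. This sketch assumes 16-bit mono PCM, which should be checked against the Nova 2 Sonic documentation, and uses an arbitrary 20 ms chunk as the example. It also shows converting browser-style float samples in [-1.0, 1.0] to PCM bytes and base64-encoding them for a text-based transport. Browser microphones often run at 44.1 or 48 kHz, so a real frontend may also need to resample to 16 kHz.

```python
import base64
import struct

SAMPLE_RATE_HZ = 16_000  # matches input_sample_rate / output_sample_rate
BYTES_PER_SAMPLE = 2     # assumption: 16-bit signed PCM
CHANNELS = 1             # assumption: mono microphone audio


def bytes_per_chunk(duration_ms: int, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Raw PCM bytes in one chunk of the given duration."""
    samples = rate_hz * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def floats_to_pcm16(samples: list) -> bytes:
    """Clip float samples to [-1.0, 1.0] and pack them as little-endian int16."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_chunk(pcm: bytes) -> str:
    """Base64-encode a PCM chunk so it can travel inside a JSON or text message."""
    return base64.b64encode(pcm).decode("ascii")


if __name__ == "__main__":
    print(bytes_per_chunk(20))    # 640 bytes per 20 ms
    print(bytes_per_chunk(1000))  # 32000 bytes per second
    print(len(encode_chunk(floats_to_pcm16([0.0] * 320))))  # 856 base64 characters
```

Base64 inflates each chunk by about a third (640 bytes become 856 characters), which is worth budgeting for when estimating WebSocket bandwidth.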
Seven healthcare tools
The agent implements seven Python functions using the Strands SDK's @tool decorator:
- authenticate_patient: Verifies identity using the patient's name and the last four SSN digits via a DynamoDB global secondary index, enforcing a three-attempt limit
- confirm_appointment: Updates appointment status from Scheduled/Rescheduled to Confirmed with idempotent checks
- cancel_appointment: Changes status to Canceled and records a timestamped reason
- find_available_slots: Queries open time slots from the provider's schedule
- book_appointment_slot: Finalizes rescheduling after slot selection
- collect_health_info: Gathers pre-visit patient information
- escalate_to_human: Triggers Amazon SNS notification for human staff intervention
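The logic behind three of these tools can be sketched in plain Python. In the tutorial the functions carry the Strands `@tool` decorator and read from DynamoDB; this sketch omits the decorator so it runs without the SDK and uses hypothetical in-memory dictionaries in place of the tables. Field names, return shapes, and the sample patient are illustrative assumptions. In DynamoDB, the idempotent status check would typically be expressed as a `ConditionExpression` on `update_item`.

```python
import hmac
from datetime import datetime, timezone

MAX_AUTH_ATTEMPTS = 3

# Hypothetical in-memory stand-ins for the DynamoDB tables.
# Keyed like a name-based GSI lookup: (last_name, first_name) -> patient record.
PATIENTS_BY_NAME = {("doe", "jane"): {"patient_id": "P001", "ssn_last4": "1234"}}
APPOINTMENTS = {"A100": {"patient_id": "P001", "status": "Scheduled"}}

_failed_attempts: dict = {}  # session_id -> failed attempts so far


def authenticate_patient(session_id: str, first_name: str, last_name: str, ssn_last4: str) -> dict:
    """Verify name plus last four SSN digits; refuse further tries after three failures."""
    if _failed_attempts.get(session_id, 0) >= MAX_AUTH_ATTEMPTS:
        return {"authenticated": False, "reason": "attempt_limit_reached"}
    patient = PATIENTS_BY_NAME.get((last_name.lower(), first_name.lower()))
    # compare_digest avoids leaking digit matches through timing differences.
    if patient and hmac.compare_digest(patient["ssn_last4"], ssn_last4):
        _failed_attempts.pop(session_id, None)
        return {"authenticated": True, "patient_id": patient["patient_id"]}
    _failed_attempts[session_id] = _failed_attempts.get(session_id, 0) + 1
    return {"authenticated": False, "attempts_left": MAX_AUTH_ATTEMPTS - _failed_attempts[session_id]}


def confirm_appointment(appointment_id: str) -> dict:
    """Move Scheduled/Rescheduled to Confirmed; confirming twice changes nothing."""
    appt = APPOINTMENTS[appointment_id]
    if appt["status"] == "Confirmed":
        return {"status": "Confirmed", "changed": False}
    if appt["status"] not in ("Scheduled", "Rescheduled"):
        return {"status": appt["status"], "changed": False, "error": "not_confirmable"}
    appt["status"] = "Confirmed"
    return {"status": "Confirmed", "changed": True}


def cancel_appointment(appointment_id: str, reason: str) -> dict:
    """Mark the appointment Canceled and store the reason with a UTC timestamp."""
    appt = APPOINTMENTS[appointment_id]
    appt["status"] = "Canceled"
    appt["cancel_reason"] = reason
    appt["canceled_at"] = datetime.now(timezone.utc).isoformat()
    return {"status": "Canceled", "canceled_at": appt["canceled_at"]}
```

Tracking failed attempts per call session is a choice made for this sketch; counting them per patient record instead would also stop repeated guessing across separate calls.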
Nova 2 Sonic decides when to invoke each tool based on patient responses. The setup requires minimal code:
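```python
from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.models import BidiNovaSonicModel

# `tools` is the list of the seven @tool-decorated functions described above;
# `system_prompt` holds the agent's instructions. Both are defined elsewhere in the tutorial.

# Nova 2 Sonic configured for 16 kHz audio in and out with the "tiffany" voice.
model = BidiNovaSonicModel(
    region="us-east-1",
    model_id="amazon.nova-2-sonic-v1:0",
    provider_config={
        "audio": {
            "input_sample_rate": 16000,
            "output_sample_rate": 16000,
            "voice": "tiffany",
        }
    },
    tools=tools,
)

# BidiAgent manages the bidirectional audio stream and tool invocation.
agent = BidiAgent(
    model=model,
    tools=tools,
    system_prompt=system_prompt,
)
```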
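The model decides which tool to call, but the host application still has to route each request to the matching Python function. The sketch below illustrates that routing in generic form with a hypothetical `register_tool` decorator and a hypothetical tool-use event shape (`toolUseId`, `name`, JSON `input`). It is not the Strands SDK's internal code; in the tutorial, `@tool` and BidiAgent take care of this step. The slot data is made up for illustration.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry; in the tutorial, Strands' @tool decorator fills this role.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Make a function callable by name from a model tool-use request."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn


@register_tool
def find_available_slots(provider_id: str, date: str) -> list:
    # Hypothetical fixed data standing in for a query on the time-slots table.
    slots = {("DR1", "2025-01-15"): ["09:00", "10:30", "14:00"]}
    return slots.get((provider_id, date), [])


@register_tool
def book_appointment_slot(appointment_id: str, slot: str) -> dict:
    # Hypothetical: a real version would update the appointments and slots tables.
    return {"appointment_id": appointment_id, "slot": slot, "status": "Rescheduled"}


def handle_tool_use(event: dict) -> dict:
    """Look up the requested tool, call it with the decoded arguments, and wrap the result."""
    fn = TOOL_REGISTRY.get(event["name"])
    if fn is None:
        return {"toolUseId": event["toolUseId"], "status": "error",
                "content": f"unknown tool {event['name']}"}
    args = json.loads(event["input"])
    return {"toolUseId": event["toolUseId"], "status": "success", "content": fn(**args)}
```

Adding a capability in this pattern means writing and registering one more function; the dispatcher itself does not change.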
Deployment details
The tutorial includes a browser-based interface for testing. AWS notes that connecting the agent to actual phone lines for outbound dialing requires integration with a telephony service such as Amazon Connect Customer.
The serverless deployment runs entirely on AWS infrastructure with Amazon Cognito authentication, DynamoDB persistence, and SNS notifications. The Strands SDK handles bidirectional streaming complexity, managing audio input, tool invocation, and audio response generation.
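Bidirectional streaming means audio keeps flowing up to the model while responses flow back, instead of waiting for a complete request before replying. The following asyncio sketch shows that shape with two queues and three concurrent tasks. `fake_model` is a hypothetical stand-in that echoes each chunk, and nothing here calls the Strands or Bedrock APIs; it only illustrates the kind of concurrency the SDK manages.

```python
import asyncio


async def microphone(chunks: list, uplink: asyncio.Queue) -> None:
    """Push captured audio chunks upstream, then a None sentinel to end the stream."""
    for chunk in chunks:
        await uplink.put(chunk)
        await asyncio.sleep(0)  # yield so the model and speaker tasks run in between
    await uplink.put(None)


async def fake_model(uplink: asyncio.Queue, downlink: asyncio.Queue) -> None:
    """Hypothetical model stand-in: emits one response per incoming chunk."""
    while (chunk := await uplink.get()) is not None:
        await downlink.put(b"resp:" + chunk)
    await downlink.put(None)


async def speaker(downlink: asyncio.Queue, played: list) -> None:
    """Consume responses as they arrive, as a browser audio player would."""
    while (chunk := await downlink.get()) is not None:
        played.append(chunk)


async def run_session(chunks: list) -> list:
    uplink: asyncio.Queue = asyncio.Queue()
    downlink: asyncio.Queue = asyncio.Queue()
    played: list = []
    # All three tasks run concurrently: input and output overlap in time.
    await asyncio.gather(
        microphone(chunks, uplink),
        fake_model(uplink, downlink),
        speaker(downlink, played),
    )
    return played


if __name__ == "__main__":
    print(asyncio.run(run_session([b"a", b"b"])))  # [b'resp:a', b'resp:b']
```

In a real session the model can also start speaking before the caller finishes, and tool calls interleave with audio; the queue structure stays the same while the SDK handles those events.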
What this means
This tutorial demonstrates AWS's push to move speech AI workloads away from chained transcription-LLM-TTS architectures toward native speech-to-speech models. By preserving acoustic context through the full conversation, Nova 2 Sonic can potentially detect patient anxiety or confusion that text transcription loses.
The healthcare use case targets a specific economic problem: appointment no-shows cost providers revenue and delay patient care. Voice agents that handle routine confirmation calls at scale could reduce manual phone bank operations, though AWS provides no data on actual no-show rate reduction from this approach.
The modular tool design means developers can add capabilities by writing individual Python functions without rebuilding the agent. For organizations already using AWS infrastructure, the serverless deployment on Bedrock AgentCore removes the need to manage separate infrastructure.