product update · Amazon Web Services

Amazon Polly adds bidirectional streaming API for real-time speech synthesis in conversational AI

TL;DR

Amazon has released a new Bidirectional Streaming API for Amazon Polly that enables simultaneous text input and audio output over a single HTTP/2 connection. The API reduces end-to-end latency by 39% compared to traditional request-response TTS by allowing text to be sent word-by-word as LLMs generate tokens, rather than waiting for complete sentences. The feature is available in Java, JavaScript, .NET, C++, Go, Kotlin, PHP, Ruby, Rust, and Swift SDKs.


Amazon Polly Adds Real-Time Bidirectional Streaming for Conversational AI

Amazon has released a new Bidirectional Streaming API for Amazon Polly that enables real-time text-to-speech synthesis where text and audio flow simultaneously over a single connection.

The Problem with Traditional TTS

Conventional text-to-speech APIs operate in request-response mode: developers must collect the complete text before making a synthesis request. For conversational AI applications powered by large language models (LLMs)—which generate text token-by-token—this creates a bottleneck. Users must wait for:

  1. The LLM to finish generating the complete response
  2. The TTS service to synthesize the entire text
  3. Audio to download before playback begins

Amazon Polly previously supported streaming audio output, but required complete input text upfront.
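A toy latency model illustrates why overlapping generation with synthesis helps. All numbers here are hypothetical illustrations, not Polly measurements:

```python
# Toy model: why overlapping LLM generation with TTS reduces latency.
# All figures are hypothetical illustration, not measured Polly numbers.

WORDS = 100           # length of the LLM response in words
LLM_MS_PER_WORD = 30  # token generation speed
TTS_MS_PER_WORD = 20  # hypothetical synthesis speed per word

def sequential_latency(words):
    """Request-response TTS: wait for the full text, then synthesize it all."""
    return words * LLM_MS_PER_WORD + words * TTS_MS_PER_WORD

def streaming_latency(words):
    """Bidirectional streaming: synthesis runs concurrently with generation.
    Total time is dominated by the slower stage, plus one word's trip
    through the faster stage."""
    return (words * max(LLM_MS_PER_WORD, TTS_MS_PER_WORD)
            + min(LLM_MS_PER_WORD, TTS_MS_PER_WORD))

print(sequential_latency(WORDS))  # → 5000
print(streaming_latency(WORDS))   # → 3020
```

Under these made-up rates, pipelining cuts end-to-end time from 5.0 s to about 3.0 s; the real savings depend on generation and synthesis speeds.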

How Bidirectional Streaming Works

The new StartSpeechSynthesisStream API introduces true duplex communication:

  • Send text incrementally: Stream text to Amazon Polly as it becomes available, word-by-word
  • Receive audio immediately: Get synthesized audio bytes back in real-time as they're generated
  • Control timing: Use flush configuration to trigger synthesis of buffered text
  • Single connection: HTTP/2 enables simultaneous bidirectional flow

Key components include TextEvent (client → service), CloseStreamEvent (client → service), AudioEvent (service → client), and StreamClosedEvent (service → client).
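The event flow can be modeled in miniature. This Python sketch mirrors the protocol only; the class internals and the stand-in service are hypothetical, and the Python SDK does not currently support this API:

```python
from dataclasses import dataclass

# In-memory model of the duplex event protocol. Event names follow the
# article; everything else is an illustrative stand-in, not SDK code.

@dataclass
class TextEvent:          # client -> service: a chunk of input text
    text: str

class CloseStreamEvent:   # client -> service: no more text is coming
    pass

@dataclass
class AudioEvent:         # service -> client: synthesized audio bytes
    audio_chunk: bytes

class StreamClosedEvent:  # service -> client: synthesis is complete
    pass

def fake_service(events):
    """Yield audio/close events as text events arrive (stand-in for Polly)."""
    for event in events:
        if isinstance(event, TextEvent):
            # Real synthesis replaced by a placeholder transformation.
            yield AudioEvent(audio_chunk=event.text.encode())
        elif isinstance(event, CloseStreamEvent):
            yield StreamClosedEvent()
            return

out = list(fake_service([TextEvent("Hello "), TextEvent("world"),
                         CloseStreamEvent()]))
```

The key property the model captures: audio events interleave with text events on the same stream, rather than arriving only after all input is sent.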

Performance Improvements

Amazon benchmarked the bidirectional API against the traditional SynthesizeSpeech API using identical test conditions: 7,045 characters of prose (970 words) with the Matthew voice, Generative engine, MP3 output at 24 kHz.

Simulation conditions: LLM generating tokens at ~30ms per word.

  Metric                  Traditional API   Bidirectional   Improvement
  Total processing time   115,226 ms        70,071 ms       39% faster
  API calls               27                1               27x reduction
  Total audio bytes       2,354,292         2,324,636       Similar
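The improvement figures follow directly from the raw numbers in the table:

```python
# Recompute the benchmark's improvement figures from its raw numbers.
traditional_ms, bidirectional_ms = 115_226, 70_071
speedup_pct = (traditional_ms - bidirectional_ms) / traditional_ms * 100
print(round(speedup_pct))  # → 39 (% faster)

traditional_calls, bidirectional_calls = 27, 1
print(traditional_calls // bidirectional_calls)  # → 27 (x reduction)
```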

The traditional API buffers words until sentence boundaries are reached, then sends complete sentences as separate requests and waits for full audio responses. The bidirectional API sends each word as it arrives, allowing Amazon Polly to begin synthesis immediately.
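The two strategies above can be sketched side by side. These are hypothetical helper functions illustrating the buffering logic, not AWS SDK code:

```python
import re

def sentence_requests(word_stream):
    """Traditional approach: buffer words until a sentence boundary,
    then emit one synthesis request per complete sentence."""
    buffer = []
    for word in word_stream:
        buffer.append(word)
        if re.search(r"[.!?]$", word):  # crude sentence-boundary check
            yield " ".join(buffer)
            buffer = []
    if buffer:                          # flush any trailing fragment
        yield " ".join(buffer)

def streaming_requests(word_stream):
    """Bidirectional approach: forward every word as soon as it arrives."""
    yield from word_stream

words = ["Hi", "there.", "How", "are", "you?"]
print(list(sentence_requests(words)))        # → ['Hi there.', 'How are you?']
print(len(list(streaming_requests(words))))  # → 5 sends, one connection
```

With sentence buffering, synthesis of "How are you?" cannot start until the question mark arrives; word-by-word forwarding lets it start at "How".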

Technical Details

The bidirectional streaming API eliminates the need for application-level text separation logic and complex audio reassembly that previously required multiple parallel API calls.

Supported SDKs include:

  • AWS SDK for Java 2.x, JavaScript v3, .NET v4
  • C++, Go v2, Kotlin, PHP v3, Ruby v3, Rust, Swift

Not currently supported: Python, .NET v3, AWS CLI v1/v2, and PowerShell.

Developers can use a reactive streams Publisher to send TextEvent objects containing text, and handle incoming AudioEvent objects through a visitor pattern response handler.
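A visitor-style handler dispatches on the type of each incoming event. This minimal Python model shows the dispatch pattern only; the class and method names are illustrative, not the actual SDK interface:

```python
# Model of a visitor-pattern response handler for the service -> client
# event stream. Names are illustrative, not the actual AWS SDK API.

class AudioEvent(dict): pass
class StreamClosedEvent(dict): pass

class ResponseHandler:
    def __init__(self):
        self.audio = bytearray()
        self.closed = False

    def visit(self, event):
        # Dispatch on event type, mirroring per-event-type callbacks.
        handler = getattr(self, f"on_{type(event).__name__}", self.on_unknown)
        handler(event)

    def on_AudioEvent(self, event):
        self.audio.extend(event["chunk"])   # accumulate playable audio

    def on_StreamClosedEvent(self, event):
        self.closed = True                  # synthesis finished

    def on_unknown(self, event):
        pass                                # ignore unhandled event types

h = ResponseHandler()
for ev in [AudioEvent(chunk=b"\x01\x02"), AudioEvent(chunk=b"\x03"),
           StreamClosedEvent()]:
    h.visit(ev)
print(bytes(h.audio), h.closed)  # → b'\x01\x02\x03' True
```

The appeal of this pattern is that audio can be handed to a player chunk by chunk, while the close event gives a clean signal that playback can drain and stop.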

What This Means

The bidirectional streaming API significantly reduces end-to-end latency for conversational AI by eliminating the architectural bottleneck of waiting for complete text before synthesis begins. The 39% latency reduction and 27x drop in API calls represent a meaningful improvement for real-time applications like virtual assistants and interactive chatbots. The streaming API surface is more complex than a single request-response call, but developers who previously built sentence-buffering workarounds get a native solution in exchange for that complexity. Availability is limited to specific SDK languages, which may slow enterprise adoption initially.

