
OpenAI releases GPT-Realtime-2 reasoning voice model with two specialized variants for translation and transcription

TL;DR

OpenAI has released three new realtime voice models through its Realtime API: GPT-Realtime-2, which OpenAI says offers "GPT-5-class reasoning"; GPT-Realtime-Translate, which translates speech from 70+ input languages into 13 output languages; and GPT-Realtime-Whisper, for streaming transcription. GPT-Realtime-2 costs $32 per 1M audio input tokens and $64 per 1M audio output tokens, while the specialized variants are billed per minute: $0.034 for Translate and $0.017 for Whisper.

May 7, 2026 · 6:06 PM

GPT-Realtime-2: Quick Specs

Input: $32 per 1M audio tokens

Output: $64 per 1M audio tokens



OpenAI has released three new realtime voice models through its Realtime API, with the flagship GPT-Realtime-2 incorporating what the company describes as "GPT-5-class reasoning" capabilities.

GPT-Realtime-2: Voice model with reasoning

GPT-Realtime-2 is designed for live voice interactions in which the model keeps the conversation flowing while it works through complex requests. According to OpenAI, the model can "reason through a request, call tools, handle corrections or interruptions, and respond in a way that fits the moment." It is priced at $32 per 1M audio input tokens ($0.40 per 1M cached input tokens) and $64 per 1M audio output tokens.
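As a rough illustration of the token pricing above, the following Python sketch estimates the audio cost of a GPT-Realtime-2 session from token counts. It assumes that cached input tokens are a subset of all input tokens and are billed at the cached rate instead of the regular one; the article does not spell out the billing rules, and text tokens (whose price is not given here) are ignored. The function name and session sizes are hypothetical.

```python
# Audio-token prices quoted in the article, in USD per 1M tokens.
# Text-token pricing is not covered by this sketch.
INPUT_PER_M = 32.00
CACHED_INPUT_PER_M = 0.40
OUTPUT_PER_M = 64.00


def realtime2_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the USD audio cost of one GPT-Realtime-2 session (hypothetical helper).

    Assumption: cached_input_tokens is a subset of input_tokens and is billed
    at the cached rate instead of the regular input rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    uncached = input_tokens - cached_input_tokens
    return (
        uncached * INPUT_PER_M
        + cached_input_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: 50k input tokens (20k of them cached), 30k output tokens.
    print(f"${realtime2_cost(50_000, 30_000, cached_input_tokens=20_000):.4f}")
```

For that hypothetical session the estimate works out to 30,000 × $32/1M + 20,000 × $0.40/1M + 30,000 × $64/1M ≈ $2.89, with output tokens accounting for about two thirds of the total.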

GPT-Realtime-Translate: Live speech translation

GPT-Realtime-Translate provides real-time speech translation from 70+ input languages into 13 output languages. The model is designed to keep pace with the speaker during live translation. It costs $0.034 per minute.

GPT-Realtime-Whisper: Streaming transcription

GPT-Realtime-Whisper offers low-latency streaming transcription that transcribes speech while the user is still speaking. OpenAI positions this model for applications that need live captions and real-time meeting notes. It costs $0.017 per minute.
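Because Translate and Whisper are billed per minute rather than per token, their cost scales directly with audio duration. The Python sketch below assumes fractional minutes are billed proportionally (the article does not say how partial minutes are rounded); the lowercase model identifiers used as dictionary keys are hypothetical labels, not confirmed API names.

```python
# Per-minute prices quoted in the article (USD).
# The keys are hypothetical identifiers, not confirmed API model names.
PER_MINUTE_USD = {
    "gpt-realtime-translate": 0.034,
    "gpt-realtime-whisper": 0.017,
}


def per_minute_cost(model: str, seconds: float) -> float:
    """Estimate the USD cost of `seconds` of audio for a per-minute model.

    Assumption: partial minutes are billed proportionally, with no rounding up.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return PER_MINUTE_USD[model] * seconds / 60


if __name__ == "__main__":
    # A hypothetical 45-minute meeting, captioned and translated live.
    meeting_seconds = 45 * 60
    for name in PER_MINUTE_USD:
        print(f"{name}: ${per_minute_cost(name, meeting_seconds):.3f}")
```

At these rates, captioning a 45-minute meeting with GPT-Realtime-Whisper would cost about $0.765, and translating it with GPT-Realtime-Translate about $1.53.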

Availability and implementation

All three models are now available through OpenAI's Realtime API. Developers can test the models in OpenAI's Playground. The company has not disclosed specific benchmark scores, context window sizes, or training data cutoff dates for these models.
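Streaming models like these consume audio incrementally rather than as a finished file. The Python sketch below shows one way a client might split raw audio into small chunks and wrap each chunk in a JSON event for a realtime WebSocket session. It assumes 16-bit mono PCM at 24 kHz, 100 ms chunks, and the `input_audio_buffer.append` event shape used by OpenAI's existing Realtime API; the article does not confirm that the new models accept the same format, and the WebSocket connection itself is omitted. The function name is hypothetical.

```python
import base64
import json
from typing import Iterator

# Assumptions (not stated in the article): 16-bit mono PCM at 24 kHz,
# streamed in 100 ms chunks.
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 100


def audio_append_events(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[str]:
    """Split raw PCM16 audio into JSON `input_audio_buffer.append` events.

    Hypothetical helper: each event carries one base64-encoded chunk; the
    final chunk may be shorter if the audio length is not a multiple of it.
    """
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    events = list(audio_append_events(one_second_of_silence))
    print(len(events))  # 10 events of 100 ms each
```

Smaller chunks lower latency at the cost of more messages; 100 ms is an arbitrary middle ground here, not a documented recommendation.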

What this means

If the claimed "GPT-5-class reasoning" holds up, GPT-Realtime-2 would be OpenAI's first voice model with advanced reasoning capabilities, potentially enabling voice applications that go beyond simple command-and-response patterns. The specialized translation and transcription models target specific use cases, and their per-minute pricing may be more predictable for developers building streaming applications. However, without published benchmarks or technical specifications, it remains unclear how much these models actually improve on existing voice models.

Source: 9to5mac.com



