Mira Murati's Thinking Machines announces full-duplex AI model with 0.40-second response time
Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, announced TML-Interaction-Small, a full-duplex AI model that processes input and generates responses simultaneously. The company claims a 0.40-second response time, matching natural human conversation speed.
Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati, announced what it calls "interaction models" — AI systems that process input and generate responses simultaneously rather than in traditional turn-taking fashion.
Thinking Machines claims its first model, TML-Interaction-Small, responds in 0.40 seconds, which it says matches natural human conversation speed and is "significantly faster than comparable models from OpenAI and Google." The architecture enables full-duplex communication, meaning the model can listen while it speaks, more like a phone call than a text-message exchange.
Release timeline and availability
TML-Interaction-Small is not yet publicly available. Thinking Machines plans a "limited research preview" within the next few months, with a wider release scheduled for later in 2026. No pricing information has been disclosed.
The company has not released benchmark scores, parameter count, context window specifications, or other technical details beyond the response latency claim.
Technical approach
Current AI models follow a strict turn-taking protocol: user input is processed completely before response generation begins. Thinking Machines' approach processes incoming audio or text while simultaneously generating output, a capability the company argues should be "native to a model, not bolted on."
The distinction matters for applications requiring real-time interaction, such as voice assistants or conversational interfaces where interruption and natural flow are expected.
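The turn-taking vs. full-duplex distinction can be sketched as a toy concurrency pattern. This is an illustrative Python sketch, not Thinking Machines' actual architecture; the text chunks, the queue-based pipeline, and the `ack:` responses are all invented for the example. The point is structural: in the half-duplex version the listener must finish before the speaker starts, while in the full-duplex version both run concurrently over a shared stream.

```python
import asyncio

async def half_duplex(chunks):
    """Turn-taking: consume the entire input before producing any output."""
    heard = []
    for c in chunks:
        heard.append(c)                  # listening phase completes first...
    return ["ack:" + c for c in heard]   # ...only then does responding begin

async def full_duplex(chunks):
    """Full duplex: a listener and a speaker run concurrently over one stream."""
    stream: asyncio.Queue = asyncio.Queue()
    out = []

    async def listen():
        for c in chunks:
            await stream.put(c)          # input keeps arriving...
        await stream.put(None)           # sentinel: input stream closed

    async def speak():
        while (c := await stream.get()) is not None:
            out.append("ack:" + c)       # ...while output is produced in parallel

    await asyncio.gather(listen(), speak())
    return out

print(asyncio.run(full_duplex(["hi", "how", "are you"])))
```

Both coroutines return the same responses here; what changes is when responding can begin. In the full-duplex version, `speak` can start emitting as soon as the first chunk lands in the queue, which is the property that lets sub-second response latencies approach conversational timing.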
What this means
Full-duplex communication represents a meaningful architectural shift if the claimed performance holds under real-world conditions. A 0.40-second response time would indeed approach human conversational norms; typical human response latency in conversation ranges from roughly 0.2 to 0.6 seconds. Without independent verification, public testing, or detailed benchmarks, however, it is impossible to say whether this yields a genuinely better user experience or only a marginal improvement over existing streaming architectures. The value proposition depends on whether simultaneous processing creates noticeably more natural interactions than current streaming implementations, and that remains unproven until researchers and users can test the system directly.