Mira Murati's Thinking Machines announces full-duplex AI model with 0.40-second response time
Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, announced TML-Interaction-Small, a full-duplex AI model that processes input and generates responses simultaneously. The company claims a 0.40-second response time, matching natural human conversation speed.
Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati, announced what it calls "interaction models" — AI systems that process input and generate responses simultaneously rather than in traditional turn-taking fashion.
Thinking Machines claims its first model, TML-Interaction-Small, responds in 0.40 seconds, which it says matches natural human conversation speed and is "significantly faster than comparable models from OpenAI and Google." The architecture enables full-duplex communication, meaning the model can listen while it speaks, more like a phone call than a text-message exchange.
Release timeline and availability
TML-Interaction-Small is not yet publicly available. Thinking Machines plans a "limited research preview" within the next few months, with a wider release scheduled for later in 2026. No pricing information has been disclosed.
The company has not released benchmark scores, parameter count, context window specifications, or other technical details beyond the response latency claim.
Technical approach
Current AI models follow a strict turn-taking protocol: user input is processed completely before response generation begins. Thinking Machines' approach processes incoming audio or text while simultaneously generating output, a capability the company argues should be "native to a model, not bolted on."
The distinction matters for applications requiring real-time interaction, such as voice assistants or conversational interfaces where interruption and natural flow are expected.
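The turn-taking vs. full-duplex distinction can be sketched as a toy concurrency pattern. This is an illustrative Python sketch, not Thinking Machines' actual architecture; the text chunks, the queue-based pipeline, and the `ack:` responses are all invented for the example. The point is structural: in the half-duplex version the listener must finish before the speaker starts, while in the full-duplex version both run concurrently over a shared stream.

```python
import asyncio

async def half_duplex(chunks):
    """Turn-taking: consume the entire input before producing any output."""
    heard = []
    for c in chunks:
        heard.append(c)                  # listening phase completes first...
    return ["ack:" + c for c in heard]   # ...only then does responding begin

async def full_duplex(chunks):
    """Full duplex: a listener and a speaker run concurrently over one stream."""
    stream: asyncio.Queue = asyncio.Queue()
    out = []

    async def listen():
        for c in chunks:
            await stream.put(c)          # input keeps arriving...
        await stream.put(None)           # sentinel: input stream closed

    async def speak():
        while (c := await stream.get()) is not None:
            out.append("ack:" + c)       # ...while output is produced in parallel

    await asyncio.gather(listen(), speak())
    return out

print(asyncio.run(full_duplex(["hi", "how", "are you"])))
```

Both coroutines return the same responses here; what changes is when responding can begin. In the full-duplex version, `speak` can start emitting as soon as the first chunk lands in the queue, which is the property that lets sub-second response latencies approach conversational timing.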
What this means
Full-duplex communication represents a meaningful architectural shift if the claimed performance holds under real-world conditions. A 0.40-second response time would indeed approach human conversational norms; typical human response latency in conversation ranges from roughly 0.2 to 0.6 seconds. Without independent verification, public testing, or detailed benchmarks, however, it is impossible to say whether this yields a genuinely better user experience or only a marginal improvement over existing streaming architectures. The value proposition depends on whether simultaneous processing creates noticeably more natural interactions than current streaming implementations, and that remains unproven until researchers and users can test the system directly.