Mira Murati's Thinking Machines demos real-time 'interaction models' that process audio, video, and text simultaneously
Thinking Machines, the AI company founded by former OpenAI CTO Mira Murati, announced Monday it's developing "interaction models" that continuously process audio, video, and text while responding in real time. Unlike current models, which wait for complete input before responding, these models aim to enable simultaneous perception and generation across modalities.
The company describes current AI models as operating in a "single thread" where they must wait for complete user input before responding, with perception frozen during generation. According to Thinking Machines, this creates a "bandwidth bottleneck" that limits collaboration between humans and AI.
Interaction models aim to solve this by processing multiple modalities simultaneously. The company demonstrated examples including a model that listens for animal mentions in stories, real-time speech translation, and posture detection that alerts users when they slouch.
Technical approach
Thinking Machines frames the problem as fundamentally architectural. Current models experience "no perception of what the user is doing or how the user is doing it" until input is complete, and "perception freezes" during generation until the model finishes or is interrupted.
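Thinking Machines has published no code, API, or architecture details, so the contrast it describes can only be illustrated schematically. The sketch below uses Python's `asyncio` to show the structural difference between a turn-based loop, where perception blocks until input is complete and then freezes during generation, and a concurrent loop where perception and generation interleave on the same stream. All function names and the list-of-strings "stream" are invented for illustration; this is not the company's method.

```python
import asyncio

# Illustrative sketch only; Thinking Machines has not released code or APIs.

async def turn_based(frames):
    """Single-thread pattern: wait for the full input, then respond once."""
    user_input = " ".join(frames)            # perception blocks until input ends
    return [f"model: reply to '{user_input}'"]  # generation starts only now

async def interaction_model(frames):
    """Concurrent pattern: perception and generation share the event loop."""
    queue: asyncio.Queue = asyncio.Queue()
    transcript = []

    async def perceive():
        for frame in frames:                 # stand-in for an audio/video stream
            await queue.put(frame)
            await asyncio.sleep(0)           # yield so generation can interleave
        await queue.put(None)                # end-of-stream sentinel

    async def generate():
        while (frame := await queue.get()) is not None:
            transcript.append(f"model: noticed '{frame}'")  # respond mid-stream

    await asyncio.gather(perceive(), generate())
    return transcript

if __name__ == "__main__":
    frames = ["hello", "a cat walks by", "goodbye"]
    print(asyncio.run(turn_based(frames)))        # one response, after all input
    print(asyncio.run(interaction_model(frames))) # one response per frame
```

The concurrent version responds to each frame as it arrives rather than once at the end, which is the behavioral difference the company's animal-listening and posture-detection demos appear to rely on.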
The company's approach enables models to "meet humans where they are, rather than forcing humans to contort themselves to AI interfaces," according to its announcement.
Availability
No technical specifications, benchmark scores, or pricing have been disclosed. Thinking Machines plans to open a "limited research preview" in the "coming months" with a wider release later in 2026. The models are not currently available for testing.
Company background
Murati founded Thinking Machines in February 2025 after departing OpenAI, where she served as CTO. The startup has seen significant staff turnover, with key members leaving for Meta or returning to OpenAI.
What this means
Thinking Machines is attempting to solve a genuine limitation in current LLM architecture: the turn-based nature of inference. If successful, simultaneous multimodal processing could enable more natural human-AI interaction, particularly for collaborative tasks. However, without technical details on latency, compute requirements, or quality degradation during simultaneous processing, it's unclear whether the approach represents a fundamental architectural advance or an engineering optimization. The company's staff retention issues and lack of concrete deployment timeline suggest early-stage development.
Related Articles
Google adds Circle to Search functionality to Gemini overlay on Android
Google is rolling out an update to the Gemini overlay on Android that adds a circle selection tool for precise screen content queries. The feature, available in Google app version 17.20, lets users circle any on-screen element and include it as an image in their Gemini prompt.
AWS adds multimodal embeddings to Amazon Bedrock for manufacturing document retrieval
AWS released multimodal embedding capabilities for Amazon Nova on Bedrock, allowing manufacturing organizations to retrieve information from technical documents that combine text, engineering diagrams, and images. The model supports configurable dimensions from 256 to 3072 and processes text, images, and multi-page documents into a shared vector space.
OpenAI launches Daybreak security initiative with GPT-5.5-Cyber and Codex Security agent
OpenAI has launched Daybreak, a security-focused AI initiative that uses the Codex Security agent and new GPT-5.5-Cyber models to automatically detect and patch software vulnerabilities. The release follows Anthropic's Claude Mythos announcement by one month.
Google Home update accelerates Gemini voice commands, enables voice-based 'Ask Home' queries
Google has deployed a new update to Google Home that accelerates Gemini voice command processing, particularly for timers and alarms. The update extends Gemini's 'Ask Home' feature to voice commands, allowing users to query camera history and family member locations via smart speakers and displays.