Mira Murati's Thinking Machines demos real-time 'interaction models' that process audio, video, and text simultaneously
Thinking Machines, the AI company founded by former OpenAI CTO Mira Murati, announced Monday it's developing "interaction models" that continuously process audio, video, and text while responding in real time. Unlike current models, which wait for complete input before responding, these models aim to enable simultaneous perception and generation across modalities.
The company describes current AI models as operating in a "single thread" where they must wait for complete user input before responding, with perception frozen during generation. According to Thinking Machines, this creates a "bandwidth bottleneck" that limits collaboration between humans and AI.
Interaction models aim to solve this by processing multiple modalities simultaneously. The company demonstrated examples including a model that listens for animal mentions in stories, real-time speech translation, and posture detection that alerts users when they slouch.
Technical approach
Thinking Machines frames the problem as fundamentally architectural. Current models experience "no perception of what the user is doing or how the user is doing it" until input is complete, and "perception freezes" during generation until the model finishes or is interrupted.
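Thinking Machines has published no code, API, or architecture details, so the contrast it describes can only be illustrated schematically. The sketch below uses Python's `asyncio` to show the structural difference between a turn-based loop, where perception blocks until input is complete and then freezes during generation, and a concurrent loop where perception and generation interleave on the same stream. All function names and the list-of-strings "stream" are invented for illustration; this is not the company's method.

```python
import asyncio

# Illustrative sketch only; Thinking Machines has not released code or APIs.

async def turn_based(frames):
    """Single-thread pattern: wait for the full input, then respond once."""
    user_input = " ".join(frames)            # perception blocks until input ends
    return [f"model: reply to '{user_input}'"]  # generation starts only now

async def interaction_model(frames):
    """Concurrent pattern: perception and generation share the event loop."""
    queue: asyncio.Queue = asyncio.Queue()
    transcript = []

    async def perceive():
        for frame in frames:                 # stand-in for an audio/video stream
            await queue.put(frame)
            await asyncio.sleep(0)           # yield so generation can interleave
        await queue.put(None)                # end-of-stream sentinel

    async def generate():
        while (frame := await queue.get()) is not None:
            transcript.append(f"model: noticed '{frame}'")  # respond mid-stream

    await asyncio.gather(perceive(), generate())
    return transcript

if __name__ == "__main__":
    frames = ["hello", "a cat walks by", "goodbye"]
    print(asyncio.run(turn_based(frames)))        # one response, after all input
    print(asyncio.run(interaction_model(frames))) # one response per frame
```

The concurrent version responds to each frame as it arrives rather than once at the end, which is the behavioral difference the company's animal-listening and posture-detection demos appear to rely on.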
The company's approach enables models to "meet humans where they are, rather than forcing humans to contort themselves to AI interfaces," according to its announcement.
Availability
No technical specifications, benchmark scores, or pricing have been disclosed. Thinking Machines plans to open a "limited research preview" in the "coming months" with a wider release later in 2026. The models are not currently available for testing.
Company background
Murati founded Thinking Machines in February 2025 after departing OpenAI, where she served as CTO. The startup has seen significant staff turnover, with key members leaving for Meta or returning to OpenAI.
What this means
Thinking Machines is attempting to solve a genuine limitation in current LLM architecture: the turn-based nature of inference. If successful, simultaneous multimodal processing could enable more natural human-AI interaction, particularly for collaborative tasks. However, without technical details on latency, compute requirements, or quality degradation during simultaneous processing, it's unclear whether the approach represents a fundamental architectural advance or an engineering optimization. The company's staff retention issues and lack of concrete deployment timeline suggest early-stage development.
Related Articles
Google adds Circle to Search functionality to Gemini overlay on Android
Google is rolling out an update to the Gemini overlay on Android that adds a circle selection tool for precise screen content queries. The feature, available in Google app version 17.20, lets users circle any on-screen element and include it as an image in their Gemini prompt.
AWS adds multimodal embeddings to Amazon Bedrock for manufacturing document retrieval
AWS released multimodal embedding capabilities for Amazon Nova on Bedrock, allowing manufacturing organizations to retrieve information from technical documents that combine text, engineering diagrams, and images. The model supports configurable dimensions from 256 to 3072 and processes text, images, and multi-page documents into a shared vector space.
OpenAI launches Daybreak security initiative with GPT-5.5-Cyber and Codex Security agent
OpenAI has launched Daybreak, a security-focused AI initiative that uses the Codex Security agent and new GPT-5.5-Cyber models to automatically detect and patch software vulnerabilities. The release follows Anthropic's Claude Mythos announcement by one month.
Google Home update accelerates Gemini voice commands, enables voice-based 'Ask Home' queries
Google has deployed a new update to Google Home that accelerates Gemini voice command processing, particularly for timers and alarms. The update extends Gemini's 'Ask Home' feature to voice commands, allowing users to query camera history and family member locations via smart speakers and displays.