Mira Murati's Thinking Machines demos real-time 'interaction models' that process audio, video, and text simultaneously
Thinking Machines, founded by former OpenAI CTO Mira Murati, announced it's developing 'interaction models' that continuously process audio, video, and text in real time. Unlike current models that wait for complete input before responding, these models aim to enable simultaneous perception and generation across modalities.
Thinking Machines, the AI company founded by former OpenAI CTO Mira Murati, announced Monday it's developing "interaction models" that continuously process audio, video, and text while responding in real time.
The company describes current AI models as operating in a "single thread" where they must wait for complete user input before responding, with perception frozen during generation. According to Thinking Machines, this creates a "bandwidth bottleneck" that limits collaboration between humans and AI.
Interaction models aim to solve this by processing multiple modalities simultaneously. The company demonstrated examples including listening for animal mentions in stories, real-time speech translation, and posture detection that alerts users when they slouch.
Technical approach
Thinking Machines frames the problem as fundamentally architectural. Current models experience "no perception of what the user is doing or how the user is doing it" until input is complete, and "perception freezes" during generation until the model finishes or is interrupted.
The company's approach enables models to "meet humans where they are, rather than forcing humans to contort themselves to AI interfaces," according to its announcement.
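To make the "single thread" limitation concrete, here is a toy sketch. It has no connection to Thinking Machines' unreleased models, whose internals have not been disclosed. It borrows the company's animal-mention demo: a story arrives one word per time step, and a hypothetical detector flags animal names. The turn-based version perceives nothing until the input is complete, while the streaming version reacts in the same step a word arrives. The names `ANIMALS`, `turn_based`, `streaming`, and `mean_delay` are invented for illustration, and the sketch models only input-side latency, not simultaneous generation.

```python
from dataclasses import dataclass

# Hypothetical vocabulary for the toy "animal mention" task (illustration only).
ANIMALS = {"fox", "owl", "rabbit"}


@dataclass
class Detection:
    word: str
    spoken_at: int    # time step at which the word arrived
    reported_at: int  # time step at which the detector reacted


def turn_based(stream: list[str]) -> list[Detection]:
    """Wait for the complete input, then react to everything at once."""
    end = len(stream)  # the first step after the input has finished
    return [Detection(w, i, end) for i, w in enumerate(stream) if w in ANIMALS]


def streaming(stream: list[str]) -> list[Detection]:
    """Consume one word per step and react within that same step."""
    detections = []
    for step, word in enumerate(stream):
        if word in ANIMALS:
            detections.append(Detection(word, step, step))
    return detections


def mean_delay(dets: list[Detection]) -> float:
    """Average number of steps between a mention and the reaction to it."""
    if not dets:
        return 0.0
    return sum(d.reported_at - d.spoken_at for d in dets) / len(dets)


story = "once a fox met an owl near the river and later a rabbit joined them".split()
print(round(mean_delay(turn_based(story)), 2))  # 8.67
print(mean_delay(streaming(story)))             # 0.0
```

On this 15-word story, the turn-based detector reacts about 8.67 steps after each mention on average, while the streaming detector reacts with no delay. A real system would have to balance that latency gain against compute and output quality, which is the kind of data the company has not yet published.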
Availability
No technical specifications, benchmark scores, or pricing have been disclosed. Thinking Machines plans to open a "limited research preview" in the "coming months" with a wider release later in 2026. The models are not currently available for testing.
Company background
Murati founded Thinking Machines in February 2025 after departing OpenAI, where she served as CTO. The startup has experienced significant staff turnover, with some key members leaving for Meta and others returning to OpenAI.
What this means
Thinking Machines is attempting to solve a genuine limitation in current LLM architecture: the turn-based nature of inference. If successful, simultaneous multimodal processing could enable more natural human-AI interaction, particularly for collaborative tasks. However, without technical details on latency, compute requirements, or quality degradation during simultaneous processing, it's unclear whether the approach represents a fundamental architectural advance or an engineering optimization. The company's staff retention issues and the lack of a concrete deployment timeline suggest early-stage development.