Physical Intelligence's π0.7 robot model performs tasks outside its training data

TL;DR

Physical Intelligence published research showing that its π0.7 model can direct robots to perform tasks they were never explicitly trained on, through what the company calls compositional generalization. The model operated an air fryer after seeing only two relevant training examples (one robot pushing it closed and another placing a bottle inside), combining those fragments with knowledge from web pretraining data.

Physical Intelligence published research Thursday showing its π0.7 model can direct robots to perform tasks they were never explicitly trained on, according to the San Francisco-based robotics startup. The capability, which the company's researchers say surprised them, represents what they describe as compositional generalization — combining skills learned in different contexts to solve new problems.

The model successfully operated an air fryer with only two relevant training examples: one where a different robot pushed the appliance closed, and one from an open-source dataset where another robot placed a plastic bottle inside. With zero coaching, π0.7 made what researchers called "a passable attempt" at cooking a sweet potato. With step-by-step verbal instructions, it completed the task successfully.

"Once it crosses that threshold where it goes from only doing exactly the stuff that you collect the data for to actually remixing things in new ways, the capabilities are going up more than linearly with the amount of data," says Sergey Levine, a co-founder of Physical Intelligence and a UC Berkeley professor. "That much more favorable scaling property is something we've seen in other domains, like language and vision."

Performance and limitations

Physical Intelligence measured π0.7 against its own previous specialist models — purpose-built systems trained on individual tasks — and claims the generalist model matched their performance across tasks including making coffee, folding laundry, and assembling boxes. The company notes standardized benchmarks for robotics don't exist, making external validation difficult.

The model cannot yet execute complex multi-step tasks autonomously from a single high-level command. "You can't tell it, 'Hey, go make me some toast'," Levine says. "But if you walk it through — 'for the toaster, open this part, push that button, do this' — then it actually tends to work pretty well."

Prompt engineering significantly affected results. Research scientist Ashwin Balakrishna, a Stanford computer science PhD student, says an early air fryer experiment produced a 5% success rate. After about 30 minutes spent refining how the task was explained to the model, the success rate rose to 95%, according to the company.

Research context

The paper uses careful hedging language throughout, describing π0.7 as showing "early signs" of generalization and "initial demonstrations" of new capabilities. When asked about deployment timelines, Levine declined to speculate: "I think there's good reason to be optimistic, and certainly it's progressing faster than I expected a couple of years ago. But it's very hard for me to answer that question."

Physical Intelligence has raised more than $1 billion to date and was most recently valued at $5.6 billion. The company is now in discussions for a new funding round that would value it at $11 billion, according to the report. The company declined to comment on fundraising.

What this means

If validated externally, π0.7's compositional generalization would mark a departure from robotics' standard approach of training specialist models on specific tasks through data collection. The ability to coach robots through unfamiliar tasks with verbal instructions could enable deployment in new environments without additional data collection or retraining. However, the lack of standardized robotics benchmarks and the reliance on the company's own internal measurements make independent verification of these claims difficult. The model's heavy dependence on prompt quality and its inability to handle complex multi-step tasks autonomously indicate that the technology remains in early research stages.