Physical Intelligence, a startup founded by former Google engineers, has unveiled its π0.7 model. The developers claim a “qualitative leap” in the AI’s ability to generalize skills and perform tasks it was not directly trained for.
Our newest model, π0.7, has some interesting emergent capabilities: it can control a new robot to fold shirts for which we had no shirt folding data, figure out how to use an appliance with language-based coaching, and perform a wide range of dexterous tasks all in one model! pic.twitter.com/s9NxKfb7pe
— Physical Intelligence (@physical_int) April 16, 2026
The system belongs to the “Vision-Language-Action” (VLA) class and is designed for robot control.
Unlike previous solutions, π0.7 has demonstrated signs of compositional generalization—the ability to combine previously learned skills to solve new tasks.
Untrained Tasks and Transfer Between Robots
During experiments, the model exhibited a range of unexpected abilities. Notably, π0.7 was able to control a new type of robot and fold t-shirts, despite having no shirt-folding training data for that specific platform.
Compositional generalization is a key capability of large models like LLMs, but it has been elusive in robotics. Another emergent ability we found is to control a new robot (UR5e) to fold t-shirts, even though we didn’t have any laundry folding data on this robot. pic.twitter.com/lAXYag002Z
— Physical Intelligence (@physical_int) April 16, 2026
According to the developers, the results are comparable to those of operators with hundreds of hours of teleoperation experience.
The model also worked out how to use previously unfamiliar devices, including kitchen appliances. For instance, the robot completed part of a task involving cooking sweet potatoes in an air fryer, even though such scenarios were not in the training set.
According to the developers, this was made possible by combining disparate skills—similar to how language models combine knowledge from different domains.
Control Through Language and Context
One of π0.7’s key differences is that it can be guided not only by “what to do” commands but also by “how to do it” clarifications.
The model accepts:
- text instructions;
- metadata (such as speed and quality of execution);
- visual subgoals—images of the expected result of a step.
Some of these subgoals can be generated during operation by an auxiliary lightweight world model. This allows the robot to adjust its behavior without retraining.
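To make the input structure above concrete, here is a minimal sketch of how such a multimodal prompt might be assembled. The class and field names are hypothetical illustrations of the described input types, not Physical Intelligence’s actual API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical illustration of a multimodal prompt for a VLA policy.
# None of these names come from Physical Intelligence's API; they only
# mirror the input types described above.

@dataclass
class VLAPrompt:
    instruction: str                               # "what to do" in natural language
    coaching: Optional[str] = None                 # "how to do it" clarifications
    metadata: dict = field(default_factory=dict)   # e.g. speed and quality of execution
    subgoal_image: Optional[bytes] = None          # visual subgoal: image of a step's expected result

prompt = VLAPrompt(
    instruction="Fold the t-shirt on the table",
    coaching="Smooth the fabric before each fold and keep the folds tight",
    metadata={"speed": "slow", "quality": "high"},
    subgoal_image=None,  # at run time this could be an image produced by a lightweight world model
)
```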
π0.7 handles diverse prompts that don’t just say what to do, but also how to do it, including rich language and multimodal information, such as visual subgoal images. At test time, these images can be produced by a lightweight world model. pic.twitter.com/cbdovdVjBG
— Physical Intelligence (@physical_int) April 16, 2026
This approach allows for the integration of data from various sources—video, telemetry from robots, and autonomously collected episodes—into a unified learning system.
The First Step Towards ‘Universal’ Robots
Physical Intelligence noted that previously such models required retraining for each task—similar to early versions of language models. In contrast, π0.7 works “out of the box” and adapts to new scenarios through language.
The team emphasized that this level of generalization has long been considered a strength of LLMs but remained unattainable in robotics.
Despite the progress, the model still struggles with complex tasks without step-by-step prompts. However, with sequential instructions, the quality of execution significantly improves.
In the future, such instructions could help train more autonomous machines capable of acting without human intervention. Physical Intelligence believes that π0.7 shows the first signs of a transition to universal robots that adapt to new conditions without manual adjustment for each task.
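As a rough illustration of the step-by-step prompting described above, the sketch below breaks a complex task into sequential sub-instructions and feeds them to a policy one at a time. `RobotPolicy` and `execute` are invented names for illustration only, not part of any real interface.

```python
# Hypothetical sketch of step-by-step prompting: a complex task is decomposed
# into sequential sub-instructions that are sent to the policy one at a time.

class RobotPolicy:
    def execute(self, instruction: str) -> bool:
        """Run one sub-instruction on the robot; return True on success."""
        print(f"executing: {instruction}")
        return True

steps = [
    "Open the air fryer drawer",
    "Place the sweet potatoes in the basket",
    "Close the drawer",
    "Set the timer and start cooking",
]

policy = RobotPolicy()
for step in steps:
    if not policy.execute(step):
        break  # stop (or re-plan) if a sub-step fails
```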
Back in February, Carbon Robotics released its Large Plant Model, an AI system that recognizes plant species to combat weeds.
