ByteDance, the company behind TikTok, has introduced a system that acts as a “brain” for robots, enabling them to perform household tasks such as hanging clothes or clearing tables.
GR-3 is a large vision-language-action (VLA) model that lets robots follow natural-language commands and perform general tasks involving unfamiliar objects. Robots running the model can operate in new environments and handle abstract concepts such as size and spatial relationships.
A video published on the project page demonstrates how ByteMini, ByteDance's two-armed laboratory robot, can slide a hanger into a shirt and place it on a rack.
> 🚀🚀🚀 Ever wondered what it takes for robots to handle real-world household tasks? long-horizon execution, deformable object dexterity, and unseen object generalization — meet GR-3, ByteDance Seed’s new Vision-Language-Action (VLA) model!
>
> GR-3 is a generalizable… pic.twitter.com/zECRjaXC0J
>
> — Xiao Ma (@yusufma555) July 22, 2025
In a separate technical report, the team noted that the robot can handle short-sleeved clothing, even though “all items in the training data had long sleeves.”
Thanks to GR-3, the robot can follow commands to pick out a specific item from several and place it in a designated location. The system recognizes an object not only by name but also by size (e.g., “large plate”) or spatial attribute (e.g., “on the left”), and it can carry out a long-horizon task such as “clear the dining table” from a single command.
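To make that interface concrete, here is a hypothetical sketch of how such a language-conditioned policy is typically invoked: a single natural-language entry point that grounds objects by name, size, or spatial relation. The class and method names (`VLAPolicy`, `predict_actions`) are invented for illustration and are not ByteDance's actual API; the action output is stubbed with zeros.

```python
# Hypothetical interface sketch for a vision-language-action policy.
# Names and dimensions are illustrative assumptions, not GR-3's real API.
import numpy as np

class VLAPolicy:
    """Maps (instruction, camera images, robot state) -> a chunk of actions."""

    def predict_actions(self, instruction: str, images: list[np.ndarray],
                        proprio: np.ndarray) -> np.ndarray:
        # A real model would fuse the images and text in a vision-language
        # backbone and decode continuous actions; stubbed here with zeros.
        horizon, action_dim = 16, 14
        return np.zeros((horizon, action_dim))

policy = VLAPolicy()
images = [np.zeros((224, 224, 3), dtype=np.uint8)]   # e.g. one head-camera frame
proprio = np.zeros(14)                                # joint and gripper state

# The same entry point handles plain names, size attributes, spatial
# relations, and long-horizon tasks phrased as a single command:
for cmd in ["put the large plate in the sink",
            "pick up the cup on the left",
            "clear the dining table"]:
    actions = policy.predict_actions(cmd, images, proprio)
```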
To train the model, ByteDance employed a multi-source approach (a rough sketch of such a training loop follows the list), combining:
- co-training on large-scale vision-language (image-text) datasets;
- fine-tuning on human trajectory data collected with VR devices;
- imitation learning on robot trajectory data.
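For a concrete picture, below is a minimal, runnable PyTorch sketch of this kind of multi-source recipe: an image-text objective keeps the vision-language backbone general, while behavior cloning on the two trajectory sources (VR-collected human data and robot demonstrations) supervises the action head. All module names, dimensions, and loss weights are illustrative assumptions, not ByteDance's actual training code.

```python
# Toy multi-source co-training loop: one vision-language loss plus behavior
# cloning on two trajectory sources. Everything here is a stand-in sketch.
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    def __init__(self, dim=64, vocab=100, action_dim=14):
        super().__init__()
        self.vision = nn.Linear(3 * 16 * 16, dim)           # stand-in image encoder
        self.text = nn.Embedding(vocab, dim)                # stand-in text encoder
        self.caption_head = nn.Linear(dim, vocab)           # image-text objective
        self.action_head = nn.Linear(2 * dim, action_dim)   # action objective

    def encode(self, images, tokens):
        v = self.vision(images.flatten(1))   # (batch, dim)
        t = self.text(tokens).mean(1)        # (batch, dim)
        return v, t

model = TinyVLA()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()

def fake_batch(n=8):
    """Synthetic stand-ins for the real data sources."""
    images = torch.randn(n, 3, 16, 16)
    tokens = torch.randint(0, 100, (n, 12))
    actions = torch.randn(n, 14)
    labels = torch.randint(0, 100, (n,))
    return images, tokens, actions, labels

for step in range(3):
    # 1) image-text co-training keeps the vision-language backbone general
    img, tok, _, lab = fake_batch()
    v, _ = model.encode(img, tok)
    loss_vl = ce(model.caption_head(v), lab)

    # 2) + 3) behavior cloning on VR-collected human trajectories and on
    # robot demonstrations: both supervise the same action head
    losses_bc = []
    for _source in ("vr_human", "robot_demo"):
        img, tok, act, _ = fake_batch()
        v, t = model.encode(img, tok)
        losses_bc.append(mse(model.action_head(torch.cat([v, t], -1)), act))

    loss = loss_vl + sum(losses_bc)   # relative weighting is a tunable choice
    opt.zero_grad(); loss.backward(); opt.step()
```

Keeping the image-text objective in the mix throughout training is commonly credited with the kind of generalization the report highlights, such as handling short-sleeved garments never seen in the robot data.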
“We hope that GR-3 will be a step towards creating universal robots capable of assisting people in everyday life,” the team stated.
Back in January, the startup Perplexity AI announced its intention to acquire TikTok's U.S. business, sending ByteDance a proposal to merge Perplexity, TikTok U.S., and new equity partners into a single new entity.
