How Researchers From Carnegie Mellon University Are Training Robots to Do Household Chores
Category Engineering Thursday - June 22 2023, 06:47 UTC
Researchers at Carnegie Mellon University have developed Vision-Robotics Bridge (VRB), a model that enables robots to learn by watching videos of humans performing everyday tasks in their homes. When tested, the robots successfully replicated 12 everyday tasks, such as picking up a phone and opening a drawer. The model works on affordances, a concept designers employ to make products user-friendly and intuitive. In the future, the research team hopes to make VRB more accurate and reliable.
Are you among those who often dream of a day when a robot will do all the everyday household chores for you? A team of researchers from Carnegie Mellon University (CMU) has figured out how to turn your dream into reality.
In their latest study, they proposed a model that allowed them to train robots to do household tasks by showing them videos of people doing ordinary activities in their homes, like picking up the phone, opening a drawer, etc.
So far, scientists have been training robots by physically showing them how a task is done or training them for weeks in a simulated environment. Both these methods take a lot of time and resources and often fail.
The CMU team claims that their proposed model, Vision-Robotics Bridge (VRB), can teach a robot a new task in as little as 25 minutes, without involving any humans or a simulated environment.
This work could drastically improve the way robots are trained and "could enable robots to learn from the vast amount of internet and YouTube videos available," said Shikhar Bahl, one of the study authors and a Ph.D. student at CMU’s School of Computer Science.
Robots have learned to watch and learn.
VRB is an advanced version of WHIRL (In-the-Wild Human Imitating Robot Learning), a model that researchers used previously to train robots.
The difference between WHIRL and VRB is that the former requires a human to perform a task in front of a robot in a particular environment. After watching the human, the robot could perform the task in the same environment.
However, in VRB, no human demonstrator is required, and with some practice, a trainee robot can mimic human actions even in a setting different from the one shown in the video.
The model works on affordances, a concept that describes the actions an object makes possible. Designers employ affordances to make a product user-friendly and intuitive.
"For VRB, affordances define where and how a robot might interact with an object based on human behavior. For example, as a robot watches a human open a drawer, it identifies the contact points — the handle — and the direction of the drawer's movement — straight out from the starting location. After watching several videos of humans opening drawers, the robot can determine how to open any drawer," the researchers note.
During their study, the researchers first made the robots watch videos from large video datasets such as Ego4D and EPIC-Kitchens. These extensive datasets were developed to train AI programs to learn from human actions.
Then they used affordances to teach the robots the contact points and movements that complete an action, and finally, they tested two robot platforms in multiple real-world settings for 200 hours.
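To make that overall flow concrete, here is a minimal, hypothetical sketch of such a pipeline: mine affordances from human video clips, then hand them to a robot as a prior for where and how to act. The dataset loading, function names, and robot interface are placeholders, not the real Ego4D/EPIC-Kitchens APIs or the researchers' implementation.

```python
# Hypothetical end-to-end sketch of the pipeline the article describes:
# (1) mine affordances from human video datasets, (2) give them to a robot
# policy as a prior for where and how to act. All names are placeholders.
from collections import defaultdict

def load_clips(dataset_name):
    # Placeholder: in practice this would stream annotated clips from
    # Ego4D or EPIC-Kitchens; here it returns a tiny hard-coded sample.
    return [
        {"object": "drawer", "contact_pixel": (412, 305), "motion": "pull"},
        {"object": "drawer", "contact_pixel": (408, 300), "motion": "pull"},
        {"object": "lid",    "contact_pixel": (200, 150), "motion": "lift"},
    ]

def mine_affordances(clips):
    """Group observed contact points and motions by object category."""
    per_object = defaultdict(list)
    for clip in clips:
        per_object[clip["object"]].append((clip["contact_pixel"], clip["motion"]))
    return per_object

def act_with_prior(robot_name, affordances, target_object):
    """Use a mined affordance as the starting point for the robot's own practice."""
    candidates = affordances.get(target_object, [])
    if not candidates:
        return f"{robot_name}: no prior for {target_object}, falling back to exploration"
    contact, motion = candidates[0]
    return f"{robot_name}: try grasping near {contact}, then {motion}"

affordances = mine_affordances(load_clips("Ego4D"))
print(act_with_prior("robot-1", affordances, "drawer"))
```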
Both robots successfully performed 12 tasks that humans perform almost daily in their homes, such as opening a can of soup, picking up a phone, lifting a lid, opening a door, pulling out a drawer, etc.
The CMU team wrote in their paper, "Vision-Robotics Bridge (VRB) is a scalable approach for learning useful affordances from passive human video data and deploying them on many different robot learning paradigms."
In the future, they hope the model can become more accurate and reliable.