How Researchers From Carnegie Mellon University Are Training Robots to Do Household Chores

Category Engineering

tldr #

Researchers at Carnegie Mellon University have developed the Vision-Robotics Bridge (VRB), a model that enables robots to learn by watching videos of humans performing everyday tasks in their homes. When tested, the robots successfully replicated 12 everyday tasks such as picking up a phone and opening a drawer. The model works on affordances, a concept employed by designers to make products user-friendly and intuitive. In the future, the research team hopes to make VRB more accurate and reliable.


content #

Are you among those who often dream of a day when a robot will do all the everyday household chores for you? A team of researchers from Carnegie Mellon University (CMU) has figured out how to turn your dream into reality.

In their latest study, they proposed a model that allowed them to train robots to do household tasks by showing them videos of people doing ordinary activities in their homes, such as picking up a phone or opening a drawer.

Until now, scientists have trained robots either by physically showing them how a task is done or by training them for weeks in a simulated environment. Both methods take a lot of time and resources, and often fail.

The CMU team claims that their proposed model, the Vision-Robotics Bridge (VRB), can make a robot learn a task in just 25 minutes, without involving any humans or a simulated environment.

This work could drastically improve the way robots are trained and "could enable robots to learn from the vast amount of internet and YouTube videos available," said Shikhar Bahl, one of the study authors and a Ph.D. student at CMU’s School of Computer Science.

Robots have learned to watch and learn.

VRB is an advanced version of WHIRL (In-the-Wild Human Imitating Robot Learning), a model that researchers used previously to train robots.

The difference between WHIRL and VRB is that the former requires a human to perform a task in front of a robot in a particular environment. After watching the human, the robot could perform the task in the same environment.

However, VRB requires no human, and with some practice, a trainee robot can mimic human operations even in a setting different from the one shown in the video.

The model works on affordances, a concept that describes the possible actions an object offers. Designers employ affordances to make a product user-friendly and intuitive.

"For VRB, affordances define where and how a robot might interact with an object based on human behavior. For example, as a robot watches a human open a drawer, it identifies the contact points — the handle — and the direction of the drawer's movement — straight out from the starting location. After watching several videos of humans opening drawers, the robot can determine how to open any drawer," the researchers note.

During their study, the researchers first made the robots watch videos from large video datasets such as Ego4D and EPIC Kitchens, which were developed to train AI programs to learn human actions.
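As a rough illustration of this stage (the function below is an assumption, not the team's actual pipeline), frames from such egocentric clips could be sampled before detecting the hands and objects in them:

```python
from pathlib import Path
import cv2  # OpenCV, for decoding video frames

def sample_frames(video_path: Path, every_n: int = 30):
    """Yield every n-th frame of an egocentric clip (e.g. from Ego4D or
    EPIC Kitchens) so a downstream model can detect hand-object contact."""
    cap = cv2.VideoCapture(str(video_path))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            yield index, frame
        index += 1
    cap.release()
```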

Then they used affordances to make the robots understand the contact points and steps that complete an action. Finally, they tested two robot platforms in multiple real-world settings for 200 hours.
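Schematically, executing a learned affordance might look like the sketch below, where the robot reaches the predicted contact point and then moves along the predicted post-contact direction. The `robot` control interface and its methods are hypothetical placeholders, not a real robot API.

```python
import numpy as np

def execute_affordance(robot, contact_xyz: np.ndarray,
                       direction: np.ndarray, distance: float = 0.15) -> None:
    """Reach the predicted contact point, grasp, then move along the
    predicted post-contact direction (e.g. pulling a drawer straight out).
    `robot` stands in for a hypothetical robot control interface."""
    robot.move_to(contact_xyz)    # reach the contact point, e.g. the handle
    robot.close_gripper()         # grasp
    target = contact_xyz + distance * direction / np.linalg.norm(direction)
    robot.move_to(target)         # follow the post-contact trajectory
    robot.open_gripper()          # release
```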

Both robots successfully performed 12 tasks that humans perform almost daily in their homes, such as opening a can of soup, picking up a phone, lifting a lid, opening a door, pulling out a drawer, etc.

The CMU team wrote in their paper, "Vision-Robotics Bridge (VRB) is a scalable approach for learning useful affordances from passive human video data and deploying them on many different robot learning paradigms."

In the future, they hope the model can become more accurate and reliable.

