Finding the Best Balance Between Imitation Learning and Trial-and-Error

tldr #

Researchers from MIT and the Technion, Israel Institute of Technology, have developed an algorithm that automatically and independently determines when a student machine should imitate a teacher and when it should learn through trial and error. In simulated tests, this combination of learning methods enabled students to learn tasks more effectively than methods that used only one type of learning. The approach could help researchers train machines that will be deployed in uncertain real-world scenarios, such as robots navigating inside a building they have never seen before.


content #

Someone learning to play tennis might hire a teacher to help them learn faster. Because this teacher is (hopefully) a great tennis player, there are times when trying to exactly mimic the teacher won't help the student learn. Perhaps the teacher leaps high into the air to deftly return a volley. The student, unable to copy that, might instead try a few other moves on her own until she has mastered the skills she needs to return volleys.

Computer scientists can also use "teacher" systems to train another machine to complete a task. But just as with human learning, the student machine faces the dilemma of knowing when to follow the teacher and when to explore on its own. To address this, researchers from MIT and Technion, the Israel Institute of Technology, have developed an algorithm that automatically and independently determines when the student should mimic the teacher (known as imitation learning) and when it should instead learn through trial and error (known as reinforcement learning).

Their dynamic approach allows the student to diverge from copying the teacher when the teacher is either too good or not good enough, but then return to following the teacher at a later point in the training process if doing so would achieve better results and faster learning.

When the researchers tested this approach in simulations, they found that their combination of trial-and-error learning and imitation learning enabled students to learn tasks more effectively than methods that used only one type of learning.

This method could help researchers improve the training process for machines that will be deployed in uncertain real-world situations, like a robot being trained to navigate inside a building it has never seen before.

"This combination of learning by trial-and-error and following a teacher is very powerful. It gives our algorithm the ability to solve very difficult tasks that cannot be solved by using either technique individually," says Idan Shenfeld, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

Shenfeld wrote the paper with co-authors Zhang-Wei Hong, an EECS graduate student; Aviv Tamar, assistant professor of electrical engineering and computer science at Technion; and senior author Pulkit Agrawal, director of Improbable AI Lab and an assistant professor in the Computer Science and Artificial Intelligence Laboratory. The research will be presented at the International Conference on Machine Learning.

Striking a balance

Many existing methods that seek to strike a balance between imitation learning and reinforcement learning do so by brute force. Researchers pick a weighted combination of the two learning methods, run the entire training procedure, and then repeat the process until they find the best balance. This is inefficient and often so computationally expensive that it isn't even feasible.
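
To make that cost concrete, here is a minimal Python sketch of a brute-force weight sweep on a made-up one-dimensional problem. Everything in it is an assumption for illustration: the names GOAL, TEACHER, train and brute_force_sweep are hypothetical, the "trial-and-error" signal is simply the gradient of a known reward (a real reinforcement learning algorithm would have to estimate it from sampled rollouts), and the "imitation" signal is the squared distance to a fixed teacher action. The only point is that every candidate weight needs a complete training run of its own.

```python
import math

# Hypothetical 1-D toy problem, assumed purely for illustration.
GOAL = 3.0      # action that maximizes the task reward
TEACHER = 2.5   # teacher's action: close to, but not exactly, the goal
START = -2.0    # student's initial policy parameter


def task_reward(theta: float) -> float:
    """Reward that peaks at 1.0 at GOAL and is nearly zero far from it."""
    return math.exp(-(theta - GOAL) ** 2)


def train(alpha: float, steps: int = 500, lr: float = 0.1) -> float:
    """One full training run with a FIXED imitation weight alpha.

    Gradient ascent on (1 - alpha) * reward - alpha * (theta - TEACHER)**2.
    The reward gradient stands in for trial and error; the squared distance
    to the teacher's action stands in for an imitation loss.
    """
    theta = START
    for _ in range(steps):
        reward_grad = -2.0 * (theta - GOAL) * task_reward(theta)
        imitation_grad = -2.0 * (theta - TEACHER)
        theta += lr * ((1.0 - alpha) * reward_grad + alpha * imitation_grad)
    return theta


def brute_force_sweep(alphas):
    """Run the whole training procedure once per candidate weight."""
    results = {}
    for alpha in alphas:
        final_theta = train(alpha)  # one complete training run per candidate
        results[alpha] = task_reward(final_theta)
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best_alpha, rewards = brute_force_sweep([0.0, 0.01, 0.1, 0.5, 1.0])
    for a, r in rewards.items():
        print(f"alpha={a:<5} final reward={r:.4f}")
    print("best alpha:", best_alpha, "after", len(rewards), "full training runs")
```

With no imitation at all (alpha = 0) the student starts too far away to notice the reward and barely moves; with pure imitation (alpha = 1) it settles on the teacher's slightly worse action. In a realistic setting, each of the five runs in this sweep would be a complete and expensive training job.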

"We want algorithms that are principled, involve tuning of as few knobs as possible, and achieve high performance—these principles have driven our research," says Agrawal.

To achieve this, the team approached the problem differently by designing an algorithm that weighs trial-and-error and imitation learning on-the-fly as the student machine undergoes training.
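
The general idea of adjusting the balance within a single training run can be sketched on a made-up one-dimensional problem. The update rule below is a simple heuristic assumed for illustration, not the rule from the MIT and Technion paper: the student leans on the teacher while its reward is below the teacher's, backs off once it catches up, and would lean on the teacher again if its reward dropped back below. All names and values (GOAL, TEACHER, train_adaptive, alpha_step, alpha_max and so on) are hypothetical, and the "trial-and-error" signal is the exact gradient of a known reward rather than an estimate from rollouts.

```python
import math

# Hypothetical 1-D toy problem, assumed purely for illustration.
GOAL, TEACHER, START = 3.0, 2.5, -2.0


def task_reward(theta: float) -> float:
    """Reward that peaks at 1.0 at GOAL and is nearly zero far from it."""
    return math.exp(-(theta - GOAL) ** 2)


def train_adaptive(steps=500, lr=0.1, alpha=0.5, alpha_step=0.01,
                   alpha_min=0.0, alpha_max=0.9):
    """Single training run that re-weights imitation vs. reward every step.

    Assumed heuristic (NOT the paper's rule): raise the imitation weight while
    the student's reward is below the teacher's, lower it once it catches up.
    """
    teacher_reward = task_reward(TEACHER)
    theta = START
    history = []  # imitation weight used at each step
    for _ in range(steps):
        # Adjust the imitation weight on the fly, inside the same run.
        if task_reward(theta) < teacher_reward:
            alpha = min(alpha_max, alpha + alpha_step)
        else:
            alpha = max(alpha_min, alpha - alpha_step)
        history.append(alpha)

        # Gradient ascent on (1 - alpha) * reward - alpha * imitation loss.
        reward_grad = -2.0 * (theta - GOAL) * task_reward(theta)
        imitation_grad = -2.0 * (theta - TEACHER)
        theta += lr * ((1.0 - alpha) * reward_grad + alpha * imitation_grad)
    return theta, history


if __name__ == "__main__":
    theta, alphas = train_adaptive()
    print(f"final theta={theta:.3f}, reward={task_reward(theta):.4f}")
    print(f"peak alpha={max(alphas):.2f}, final alpha={alphas[-1]:.2f}")
```

Here one training run replaces a whole sweep: the weight rises while the teacher is ahead, then falls to zero once the student surpasses the teacher, letting trial and error carry it past the teacher's action. The trade-off is a new rule, with its own step size, deciding how the weight moves.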

