DeepMind AI Imitates Human Behaviour Through Just a Few Demonstrations

tldr #

Google DeepMind's AI can learn to mimic human behaviour from only a few demonstrations, trained inside a specially designed simulator called GoalCycle3D. The agents use reinforcement learning to imitate an expert agent across a wide variety of environments. To rule out memorization, the researchers trained the agents on one set of environments and tested them on another, showing that they can imitate the expert's route in brand-new settings. This suggests AI can now learn from peers and mentors quickly, without requiring hundreds of examples.


content #

Teaching algorithms to mimic humans typically requires hundreds or thousands of examples. But a new AI from Google DeepMind can pick up new skills from human demonstrators on the fly.

One of humanity’s greatest tricks is our ability to acquire knowledge rapidly and efficiently from each other. This kind of social learning, often referred to as cultural transmission, is what allows us to show a colleague how to use a new tool or teach our children nursery rhymes.

The AI is able to learn new tasks without seeing any pre-collected human data

It’s no surprise that researchers have tried to replicate the process in machines. Imitation learning, in which AI watches a human complete a task and then tries to mimic their behavior, has long been a popular approach for training robots. But even today’s most advanced deep learning algorithms typically need to see many examples before they can successfully copy their trainers.

When humans learn through imitation, they can often pick up new tasks after just a handful of demonstrations. Now, Google DeepMind researchers have taken a step toward rapid social learning in AI with agents that learn to navigate a virtual world from humans in real time.

The AI has a memory module that stores information about the environment and the tasks that must be performed

"Our agents succeed at real-time imitation of a human in novel contexts without using any pre-collected human data," the researchers write in a paper in Nature Communications. "We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission." .

The researchers trained their agents in a specially designed simulator called GoalCycle3D. The simulator uses an algorithm to generate an almost endless number of different environments based on rules about how the simulation should operate and what aspects of it should vary.

The virtual environments used in the research were generated by an algorithm, allowing for almost endless variation

In each environment, small blob-like AI agents must navigate uneven terrain and various obstacles to pass through a series of colored spheres in a specific order. The bumpiness of the terrain, the density of obstacles, and the configuration of the spheres vary between environments.
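To make this concrete, here is a minimal Python sketch of what such procedural generation might look like. Everything in it, the `EnvironmentSpec` fields, the parameter ranges, and the `generate_environment` helper, is an illustrative assumption rather than a detail from the paper.

```python
# Hypothetical sketch of procedural environment generation in the spirit of
# GoalCycle3D; names and parameter ranges are illustrative, not taken from
# DeepMind's implementation.
import random
from dataclasses import dataclass

@dataclass
class EnvironmentSpec:
    terrain_bumpiness: float   # how uneven the ground is
    obstacle_density: float    # how cluttered the course is
    sphere_positions: list     # (x, y) location of each colored sphere
    sphere_order: list         # indices giving the correct traversal order

def generate_environment(seed: int, num_spheres: int = 4) -> EnvironmentSpec:
    """Sample one environment; each fresh seed yields a fresh layout."""
    rng = random.Random(seed)
    positions = [(rng.uniform(0, 10), rng.uniform(0, 10))
                 for _ in range(num_spheres)]
    return EnvironmentSpec(
        terrain_bumpiness=rng.uniform(0.0, 1.0),
        obstacle_density=rng.uniform(0.1, 0.5),
        sphere_positions=positions,
        sphere_order=rng.sample(range(num_spheres), k=num_spheres),
    )

# Distinct seeds give an effectively endless supply of distinct courses.
train_envs = [generate_environment(seed) for seed in range(10_000)]
```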

The agents are trained to navigate using reinforcement learning. They earn a reward for passing through the spheres in the correct order and use this signal to improve their performance over many trials. But in addition, the environments also feature an expert agent—which is either hard-coded or controlled by a human—that already knows the correct route through the course.

The AI agent is trained using reinforcement learning, earning rewards for every correct completion of a task
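As a rough illustration of that reward signal, the logic might look something like the sketch below; the `step_reward` helper and its arguments are hypothetical, and the exact reward shaping in the paper may differ.

```python
# Illustrative reward logic for the sphere-traversal task; the function and
# variable names are hypothetical, not from the published code.
def step_reward(visited: list, correct_order: list, entered_sphere: int) -> float:
    """Return +1 only when the agent enters the next sphere in the sequence."""
    if len(visited) >= len(correct_order):
        return 0.0                  # course already completed
    if entered_sphere == correct_order[len(visited)]:
        visited.append(entered_sphere)
        return 1.0                  # sparse signal the RL update learns from
    return 0.0                      # entering the wrong sphere earns nothing
```

Because the reward is sparse, following an expert that already knows the route is a far faster way to collect it than exploring from scratch, which is exactly the pressure that drives imitation.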

Over many training runs, the AI agents learn not only the fundamentals of how the environments operate, but also that the quickest way to solve each problem is to imitate the expert. To ensure the agents were learning to imitate rather than simply memorizing the courses, the team trained them on one set of environments and then tested them on another. Crucially, the team also showed that, after training, the agents could follow an expert's route and keep to it even after the expert was gone.

The agents were tested by training them on one set of environments and then testing them on another
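In code, this held-out evaluation protocol might look something like the following sketch. It reuses the hypothetical `generate_environment` helper from the earlier snippet, and `train_agent` and `evaluate` are placeholder stubs, not functions from the published work.

```python
# Sketch of the held-out evaluation protocol described above; all names
# are placeholders standing in for the actual training and test routines.

def train_agent(envs):
    ...  # placeholder: reinforcement learning over the training environments

def evaluate(agent, envs, with_expert: bool) -> float:
    ...  # placeholder: fraction of courses completed in the correct order

train_envs = [generate_environment(seed) for seed in range(8_000)]
test_envs = [generate_environment(seed) for seed in range(8_000, 10_000)]

agent = train_agent(train_envs)

# Success here indicates imitation that generalizes, not memorized courses.
held_out_score = evaluate(agent, test_envs, with_expert=True)
# And the agents should keep following the route once the expert is removed.
solo_score = evaluate(agent, test_envs, with_expert=False)
```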

This required a few tweaks to standard reinforcement learning approaches.

The researchers made the algorithm focus on the expert by having it predict the other agent's location. They also gave it a memory module. During training, the expert would drop in and out of environments, forcing the agent to memorize information about the tasks and settings the expert had already solved.
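A rough PyTorch-style sketch of how these two tweaks could be wired into a generic training step is shown below; the auxiliary-loss weighting and the drop-in/out schedule are assumptions for illustration, not details from the paper.

```python
# Sketch of the two training tweaks in a generic RL step; the loss weighting
# and the expert presence schedule are illustrative assumptions.
import torch
import torch.nn.functional as F

def combined_loss(rl_loss: torch.Tensor,
                  predicted_expert_pos: torch.Tensor,
                  true_expert_pos: torch.Tensor,
                  aux_weight: float = 0.5) -> torch.Tensor:
    """Add an auxiliary expert-location prediction loss to the RL objective,
    pushing the agent to attend to where the expert is."""
    aux_loss = F.mse_loss(predicted_expert_pos, true_expert_pos)
    return rl_loss + aux_weight * aux_loss

def expert_present(step: int, period: int = 200) -> bool:
    """Alternate the expert's presence on a fixed schedule, so the agent's
    memory module must retain the demonstrated route while the expert is absent."""
    return (step // period) % 2 == 0
```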

After successful training, the agents were able to imitate expert routes in brand new environments at the same speed as the expert, and with fewer mistakes.

The AI is also built with a focus on the expert, predicting the expert's location in order to follow it

"We show that our approach can learn from a single expert in real-time and generalize to a range of settings," the researchers write.

Imitation learning, especially of humans, is a difficult problem to crack. But the team is hopeful that their work will facilitate further progress towards social learning AIs that can imitate humans quickly and accurately, without the need to see hundreds of examples.

After all, being able to learn from peers and mentors is a key to a successful career for anyone, human or otherwise.

