DeepMind AI Imitates Human Behaviour Through Just a Few Demonstrations
Category Artificial Intelligence Monday - December 11 2023, 05:31 UTC Google DeepMind's AI can learn to mimic human behaviour from only a few demonstrations, using a specially designed simulator called GoalCycle3D. The AI uses reinforcement learning to imitate an expert agent across a variety of environments. It was evaluated by training on one set of environments and testing on another, showing that it can follow the expert's route in brand-new settings. This means AI can now learn quickly from peers and mentors, without requiring hundreds of examples.
Teaching algorithms to mimic humans typically requires hundreds or thousands of examples. But a new AI from Google DeepMind can pick up new skills from human demonstrators on the fly.
One of humanity’s greatest tricks is our ability to acquire knowledge rapidly and efficiently from each other. This kind of social learning, often referred to as cultural transmission, is what allows us to show a colleague how to use a new tool or teach our children nursery rhymes.
It’s no surprise that researchers have tried to replicate the process in machines. Imitation learning, in which AI watches a human complete a task and then tries to mimic their behavior, has long been a popular approach for training robots. But even today’s most advanced deep learning algorithms typically need to see many examples before they can successfully copy their trainers.
When humans learn through imitation, they can often pick up new tasks after just a handful of demonstrations. Now, Google DeepMind researchers have taken a step toward rapid social learning in AI with agents that learn to navigate a virtual world from humans in real time.
"Our agents succeed at real-time imitation of a human in novel contexts without using any pre-collected human data," the researchers write in a paper in Nature Communications. "We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission."
The researchers trained their agents in a specially designed simulator called GoalCycle3D. The simulator uses an algorithm to generate an almost endless number of different environments based on rules about how the simulation should operate and what aspects of it should vary.
In each environment, small blob-like AI agents must navigate uneven terrain and various obstacles to pass through a series of colored spheres in a specific order. The bumpiness of the terrain, the density of obstacles, and the configuration of the spheres vary between environments.
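The paper does not publish GoalCycle3D's generator, but the idea of sampling each varying aspect within fixed rules can be sketched as follows. All names (`CourseConfig`, `generate_environment`) and the parameter ranges are illustrative assumptions, not DeepMind's actual code:

```python
import random
from dataclasses import dataclass

@dataclass
class CourseConfig:
    """Hypothetical per-environment parameters that GoalCycle3D varies."""
    terrain_bumpiness: float            # amplitude of terrain noise
    obstacle_density: float             # fraction of ground covered by obstacles
    sphere_positions: list              # ordered goal-sphere coordinates

def generate_environment(rng: random.Random, n_spheres: int = 3) -> CourseConfig:
    # Sample each varying aspect within rule-based bounds; distinct seeds
    # yield a practically endless family of distinct courses.
    return CourseConfig(
        terrain_bumpiness=rng.uniform(0.0, 1.0),
        obstacle_density=rng.uniform(0.05, 0.4),
        sphere_positions=[(rng.uniform(-10, 10), rng.uniform(-10, 10))
                          for _ in range(n_spheres)],
    )
```

Because every course is drawn fresh from the same rules, agents can be trained on one sampled set and tested on another, which is what lets the researchers check for memorization.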
The agents are trained to navigate using reinforcement learning. They earn a reward for passing through the spheres in the correct order and use this signal to improve their performance over many trials. But in addition, the environments also feature an expert agent—which is either hard-coded or controlled by a human—that already knows the correct route through the course.
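The reward signal described above — credit only for passing through the spheres in the correct order — can be sketched minimally. The function name and the exact reward values are assumptions for illustration; the paper's reward shaping may differ:

```python
def order_reward(next_index: int, sphere_hit: int, n_spheres: int):
    """Return (reward, new_next_index): reward the agent only when it
    passes through the sphere that comes next in the prescribed cycle."""
    if sphere_hit == next_index:                 # correct next sphere
        return 1.0, (next_index + 1) % n_spheres
    return 0.0, next_index                       # wrong sphere: no reward

# Example: spheres must be visited in the order 0 -> 1 -> 2 -> 0 -> ...
reward, next_index = order_reward(0, 0, 3)       # correct first sphere
```

Because the reward is sparse (nothing until the right sphere is hit), following an expert who already knows the route is a much faster way to earn it than blind exploration, which is the pressure that drives imitation.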
Over many training runs, the AI agents learn not only the fundamentals of how the environments operate, but also that the quickest way to solve each problem is to imitate the expert. To ensure the agents were learning to imitate rather than just memorizing the courses, the team trained them on one set of environments and then tested them on another. Crucially, after training, the team showed that their agents could imitate an expert and continue to follow the route even without the expert.
This required a few tweaks to standard reinforcement learning approaches.
The researchers made the algorithm focus on the expert by having it predict the location of the other agent. They also gave it a memory module. During training, the expert would drop in and out of environments, forcing the agent to memorize information about the tasks and the setting that the expert had already solved.
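Two of those tweaks can be sketched in miniature: an auxiliary loss for predicting the expert's position, and a schedule that toggles the expert's presence between phases. Both pieces below are hypothetical simplifications (names, phase length, and drop probability are all assumptions), not DeepMind's implementation:

```python
import random

def attention_loss(predicted_xy, expert_xy):
    # Auxiliary objective: squared error on the expert's position,
    # which pushes the agent to keep attending to the expert's movements.
    return sum((p - e) ** 2 for p, e in zip(predicted_xy, expert_xy))

class ExpertDropout:
    """Toggle the expert's presence at the start of each phase, so the
    agent must fall back on its memory of the route when the expert
    disappears mid-training."""
    def __init__(self, phase_len: int = 100, drop_prob: float = 0.5, seed: int = 0):
        self.phase_len = phase_len
        self.drop_prob = drop_prob
        self.rng = random.Random(seed)
        self.present = True

    def step(self, t: int) -> bool:
        if t % self.phase_len == 0:              # new phase: re-roll presence
            self.present = self.rng.random() >= self.drop_prob
        return self.present
```

The dropout is what makes the memory module earn its keep: without phases where the expert is absent, the agent could simply track the expert forever and never internalize the course.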
After successful training, the agents were able to imitate expert routes in brand new environments at the same speed as the expert, and with fewer mistakes.
"We show that our approach can learn from a single expert in real-time and generalize to a range of settings," the researchers write.
Imitation learning, especially of humans, is a difficult problem to crack. But the team is hopeful that their work will facilitate further progress towards social learning AIs that can imitate humans quickly and accurately, without the need to see hundreds of examples.
After all, being able to learn from peers and mentors is a key to a successful career for anyone, human or otherwise.