Student of Games: AI Combination Algorithm Makes Strides Towards General Learning
Category Artificial Intelligence Friday - December 1 2023, 14:47 UTC - 11 months ago DeepMind has combined elements of tree search and game-theory approaches to create a model that can beat humans at chess, Go, and poker, which could be a step towards more general AI algorithms. The algorithm, called Student of Games, can switch between perfect and imperfect information games and learns over time to become increasingly sophisticated.
AI has mastered some of the most complex games known to man, but models are generally tailored to solve specific kinds of challenges. A new DeepMind algorithm that can tackle a much wider variety of games could be a step towards more general AI, its creators say.
Using games as a benchmark for AI has a long pedigree. When IBM’s Deep Blue algorithm beat chess world champion Garry Kasparov in 1997, it was hailed as a milestone for the field. Similarly, when DeepMind’s AlphaGo defeated one of the world’s top Go players, Lee Sedol, in 2016, it led to a flurry of excitement about AI’s potential.
DeepMind built on this success with AlphaZero, a model that mastered a wide variety of games, including chess and shogi. But as impressive as this was, AlphaZero only worked with perfect information games where every detail of the game, other than the opponent’s intentions, is visible to both players. This includes games like Go and chess where both players can always see all the pieces on the board.
In contrast, imperfect information games involve some details being hidden from the other player. Poker is a classic example because players can’t see what hands their opponents are holding. There are now models that can beat professionals at these kinds of games too, but they use an entirely different approach than algorithms like AlphaZero.
Now, researchers at DeepMind have combined elements of both approaches to create a model that can beat humans at chess, Go, and poker. The team claims the breakthrough could accelerate efforts to create more general AI algorithms that can learn to solve a wide variety of tasks.
Researchers building AI to play perfect information games have generally relied on an approach known as tree search. This explores a multitude of ways the game could progress from its current state, with different branches mapping out potential sequences of moves. AlphaGo combined tree search with a machine learning technique in which the model refines its skills by playing itself repeatedly and learning from its mistakes.
When it comes to imperfect information games, researchers tend to instead rely on game theory, using mathematical models to map out the most rational solutions to strategic problems. Game theory is used extensively in economics to understand how people make choices in different situations, many of which involve imperfect information.
In 2016, an AI called DeepStack beat human professionals at no-limit poker, but the model was highly specialized for that particular game. Much of the DeepStack team now works at DeepMind, however, and they’ve combined the techniques they used to build DeepStack with those used in AlphaZero.
The new algorithm, called Student of Games, uses a combination of tree search, self-play, and game-theory to tackle both perfect and imperfect information games. In a paper in Science, the researchers report that the algorithm beat the best openly available poker playing AI, Slumbot, and could also play Go and chess at the level of a human professional, though it couldn’t match specialized algorithms like AlphaZero.
But being a jack-of-all-trades rather than a master of one has its advantages. As the algorithm kept playing poker, for instance, the game-theoretic models it relies on evolved over time and became increasingly sophisticated. That would have been difficult to achieve with DeepStack, since it was tailored to poker in a way the general algorithm isn’t.
Share