The Consensus Game: How AI Researchers are Using Games to Improve Language Models

Category Computer Science

tldr #

Researchers at MIT have devised a game, called the consensus game, to improve the accuracy and consistency of large language models (LLMs). The game pits the model against itself, using game theory to drive it toward a consensus answer. The approach stands in contrast to past methods of measuring AI success through gaming, and has the potential to open up a new paradigm in AI research. Examples of past AI successes in gaming include IBM's Deep Blue beating chess grandmaster Garry Kasparov in 1997, and Google DeepMind's AlphaGo winning games against former Go champion Lee Sedol. More recently, a group from Meta developed an AI program, Cicero, that achieved human-level play in the notoriously complex game of Diplomacy.


content #

Imagine you had a friend who gave different answers to the same question, depending on how you asked it. "What's the capital of Peru?" would get one answer, and "Is Lima the capital of Peru?" would get another. You'd probably be a little worried about your friend's mental faculties, and you'd almost certainly find it hard to trust any answer they gave.

That's exactly what's happening with many large language models (LLMs), the ultra-powerful machine learning tools that power ChatGPT and other marvels of artificial intelligence. A generative question, which is open-ended, yields one answer, and a discriminative question, which involves having to choose between options, often yields a different one. "There is a disconnect when the same question is phrased differently," said Athul Paul Jacob, a doctoral student at the Massachusetts Institute of Technology.

To make a language model's answers more consistent — and make the model more reliable overall — Jacob and his colleagues devised a game where the model's two modes are driven toward finding an answer they can agree on. Dubbed the consensus game, this simple procedure pits an LLM against itself, using the tools of game theory to improve the model's accuracy and internal consistency.
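The idea can be illustrated with a toy sketch. In the sketch below, the model's generative mode and discriminative mode each start from their own (inconsistent) distribution over candidate answers, and both are nudged, step by step, toward agreeing with the other while staying close to their initial beliefs. The candidate answers, the made-up probabilities, and the regularized multiplicative-weights update are all illustrative assumptions for this simplified two-candidate version; the researchers' actual procedure runs equilibrium search over a real LLM's generator and discriminator policies.

```python
import math

candidates = ["Lima", "Cusco"]

# Hypothetical initial distributions from the model's two modes (made-up numbers):
gen_init  = {"Lima": 0.70, "Cusco": 0.30}   # generative: P(answer | question)
disc_init = {"Lima": 0.45, "Cusco": 0.55}   # discriminative: P(answer judged correct)

ETA, LAM, STEPS = 0.5, 0.1, 300  # learning rate, regularization strength, iterations

def mw_step(policy, opponent, init):
    """One regularized multiplicative-weights update: the payoff for an answer
    is the chance the other mode also picks it, plus a pull toward this mode's
    own initial policy (so answers the raw model favored keep some weight)."""
    util = {a: opponent[a] + LAM * math.log(init[a]) for a in policy}
    w = {a: policy[a] * math.exp(ETA * util[a]) for a in policy}
    z = sum(w.values())
    return {a: w[a] / z for a in w}

gen, disc = dict(gen_init), dict(disc_init)
for _ in range(STEPS):
    gen, disc = mw_step(gen, disc, gen_init), mw_step(disc, gen, disc_init)

consensus = max(gen, key=gen.get)
print(consensus)
```

With these numbers the two modes start out disagreeing, but the coordination payoff pulls them to the same answer, and the regularization keeps the equilibrium anchored to what the underlying model originally believed.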

"Research exploring self-consistency within these models has been very limited," said Shayegan Omidshafiei, chief scientific officer of the robotics company Field AI. "This paper is one of the first that tackles this, in a clever and systematic way, by creating a game for the language model to play with itself."

"It's really exciting work," added Ahmad Beirami, a research scientist at Google Research. For decades, he said, language models have generated responses to prompts in the same way. "With their novel idea of bringing a game into this process, the MIT researchers have introduced a totally different paradigm, which can potentially lead to a flurry of new applications."

Putting Play to Work

The new work, which uses games to improve AI, stands in contrast to past approaches, which measured an AI program's success via its mastery of games. In 1997, for example, IBM's Deep Blue computer beat chess grandmaster Garry Kasparov — a milestone for so-called thinking machines. Nineteen years later, a Google DeepMind program named AlphaGo won four out of five games against former Go champion Lee Sedol, revealing another arena in which humans no longer reigned supreme. Machines have also surpassed humans in checkers, two-player poker and other "zero-sum" games, in which the victory of one player invariably dooms the other.

Posing a far greater challenge for AI researchers was the game of Diplomacy — a favorite of politicians like John F. Kennedy and Henry Kissinger. Instead of just two opponents, the game features seven players whose motives can be hard to read. To win, a player must negotiate, forging cooperative arrangements that anyone could breach at any time. Diplomacy is so complex that a group from Meta was pleased when, in 2022, its AI program Cicero achieved "human-level play" over the course of 40 games. While it did not vanquish the world champion, Cicero did well enough to pique the interest of military brass in Canada and France, along with that of major global tech giants.
