How We Teach Machines to Think Like Humans
Category: Science
Friday, March 22, 2024, 06:30 UTC
Researchers are studying how humans approach problem-solving to help machines improve. Techniques like chain-of-thought prompting and computational complexity theory are shedding light on the limitations of language models. The training of neural networks with large datasets has led to major advancements in machine language processing, sparked by Google's introduction of the transformer network in 2017.
Your grade school teacher probably didn’t show you how to add 20-digit numbers. But if you know how to add smaller numbers, all you need is paper and pencil and a bit of patience. Start with the ones place and work leftward step by step, and soon you’ll be stacking up quintillions with ease.
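The paper-and-pencil procedure can be sketched in a few lines of Python. This is only an illustration of the schoolbook method described above, not something from the research itself; the function name add_digit_strings is hypothetical, and the inputs are assumed to be non-negative whole numbers written as digit strings without leading zeros.

```python
def add_digit_strings(a: str, b: str) -> tuple[str, list[str]]:
    """Add two non-negative integers given as decimal strings, one column at a time.

    Returns the sum as a string plus a written-out log of every step.
    """
    # Pad the shorter number with zeros so both have the same number of columns.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    digits = []  # result digits, collected from the ones place leftward
    steps = []   # human-readable record of each column
    for i in range(width - 1, -1, -1):
        total = int(a[i]) + int(b[i]) + carry
        digit, new_carry = total % 10, total // 10
        steps.append(
            f"{a[i]} + {b[i]} + carry {carry} = {total}: write {digit}, carry {new_carry}"
        )
        digits.append(str(digit))
        carry = new_carry

    # A leftover carry becomes a new leading digit.
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), steps


total, steps = add_digit_strings("58", "67")
print(total)  # 125
for line in steps:
    print(line)
```

No single step requires more than adding three small numbers, which is why the same routine handles 20-digit inputs as easily as 2-digit ones. It just takes more steps.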
Problems like this are easy for humans, but only if we approach them in the right way. "How we humans solve these problems is not 'stare at it and then write down the answer,'" said Eran Malach, a machine learning researcher at Harvard University. "We actually walk through the steps."
That insight has inspired researchers studying the large language models that power chatbots like ChatGPT. While these systems might ace questions involving a few steps of arithmetic, they’ll often flub problems involving many steps, like calculating the sum of two large numbers. But in 2022, a team of Google researchers showed that asking language models to generate step-by-step solutions enabled the models to solve problems that had previously seemed beyond their reach. Their technique, called chain-of-thought prompting, soon became widespread, even as researchers struggled to understand what makes it work.
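To make the idea concrete, here is a minimal sketch of how such a prompt might be assembled. It assumes the few-shot style in which the model is shown a worked example before the real question. The function build_prompt, the dictionary keys, and the sample problem are all hypothetical, and no particular chatbot or API is implied.

```python
def build_prompt(question, worked_examples, show_steps):
    """Assemble a few-shot prompt as plain text.

    When show_steps is True, each worked example spells out its reasoning
    before the answer, nudging the model to do the same for the new question.
    """
    parts = []
    for example in worked_examples:
        parts.append(f"Q: {example['question']}")
        if show_steps:
            parts.append(f"A: {example['steps']} The answer is {example['answer']}.")
        else:
            parts.append(f"A: The answer is {example['answer']}.")
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


# Hypothetical worked example, written out the way a person would add on paper.
examples = [{
    "question": "What is 47 + 85?",
    "steps": "Ones place: 7 + 5 = 12, write 2 and carry 1. "
             "Tens place: 4 + 8 + 1 = 13, write 13.",
    "answer": "132",
}]

direct_prompt = build_prompt("What is 268 + 754?", examples, show_steps=False)
step_by_step_prompt = build_prompt("What is 268 + 754?", examples, show_steps=True)
print(step_by_step_prompt)
```

The two prompts ask the same question. The only difference is whether the example answer shows its work, which is the ingredient the technique relies on.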
Now, several teams have explored the power of chain-of-thought reasoning by using techniques from an arcane branch of theoretical computer science called computational complexity theory. It’s the latest chapter in a line of research that uses complexity theory to study the intrinsic capabilities and limitations of language models. These efforts clarify where we should expect models to fail, and they might point toward new approaches to building them.
"They remove some of the magic," said Dimitris Papailiopoulos, a machine learning researcher at the University of Wisconsin, Madison. "That’s a good thing."
Training Transformers
Large language models are built around mathematical structures called artificial neural networks. The many "neurons" inside these networks perform simple mathematical operations on long strings of numbers representing individual words, transmuting each word that passes through the network into another. The details of this mathematical alchemy depend on another set of numbers called the network’s parameters, which quantify the strength of the connections between neurons.
To train a language model to produce coherent outputs, researchers typically start with a neural network whose parameters all have random values, and then feed it reams of data from around the internet. Each time the model sees a new block of text, it tries to predict each word in turn: It guesses the second word based on the first, the third based on the first two, and so on. It compares each prediction to the actual text, then tweaks its parameters to reduce the difference. Each tweak only changes the model’s predictions a tiny bit, but somehow their collective effect enables a model to respond coherently to inputs it has never seen.
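The predict, compare, tweak loop can be shown with a toy model. The sketch below is a deliberately tiny stand-in, not a neural network: its only parameters are one score for each pair of words, and, as a simplifying assumption, it guesses the next word from just the single previous word rather than from everything before it. All function names are hypothetical.

```python
import math
import random


def next_word_examples(words):
    """Pair every word (after the first) with all the words that precede it."""
    return [(words[:i], words[i]) for i in range(1, len(words))]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_toy_model(words, steps=200, learning_rate=0.5, seed=0):
    """Train a table of (previous word, next word) scores by repeated small tweaks.

    Simplification: only the last word of each context is used.
    """
    rng = random.Random(seed)
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    # Parameters start out as small random values.
    params = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in vocab]

    losses = []
    for _ in range(steps):
        loss = 0.0
        for context, target in next_word_examples(words):
            row = params[index[context[-1]]]
            probs = softmax(row)              # the model's prediction
            t = index[target]
            loss -= math.log(probs[t])        # how wrong the prediction was
            # Tweak: nudge each score against the gradient of the loss,
            # which is (predicted probability - 1 for the true word, else 0).
            for j in range(len(vocab)):
                row[j] -= learning_rate * (probs[j] - (1.0 if j == t else 0.0))
        losses.append(loss / (len(words) - 1))
    return vocab, params, losses


words = "the cat sat on the mat".split()
vocab, params, losses = train_toy_model(words)
print(f"average loss: first pass {losses[0]:.3f}, last pass {losses[-1]:.3f}")
```

Each update shifts the scores only slightly, yet after many passes the average error falls. Real language models follow the same recipe with vastly more parameters and a network in place of the lookup table.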
Researchers have been training neural networks to process language for 20 years. But the work really took off in 2017, when researchers at Google introduced a new kind of network called a transformer.
This was pr .