The Mysterious World of AI: How Researchers are Trying to Understand the Unexplainable

Category: Artificial Intelligence

tldr #

Researchers at OpenAI discovered that some models can seem to fail at a task and then suddenly get it, a phenomenon known as grokking. The behavior has puzzled the field and underscores how incompletely deep learning is understood. Businesses are adopting AI for their needs regardless, but understanding why deep learning works could improve the models and help address their shortcomings.


content #

Two years ago, Yuri Burda and Harri Edwards, researchers at the San Francisco–based firm OpenAI, were trying to find out what it would take to get a large language model to do basic arithmetic. They wanted to know how many examples of adding up two numbers the model needed to see before it was able to add up any two numbers they gave it. At first, things didn’t go too well. The models memorized the sums they saw but failed to solve new ones.

Deep learning is the fundamental technology behind today's AI boom.

By accident, Burda and Edwards left some of their experiments running far longer than they meant to—days rather than hours. The models were shown the example sums over and over again, way past the point when the researchers would otherwise have called it quits. But when the pair at last came back, they were surprised to find that the experiments had worked. They’d trained a large language model to add two numbers—it had just taken a lot more time than anybody thought it should.

Grokking, a phenomenon where models seemingly fail to learn a task but suddenly get it, has captured the attention of the research community.

Curious about what was going on, Burda and Edwards teamed up with colleagues to study the phenomenon. They found that in certain cases, models could seemingly fail to learn a task and then all of a sudden just get it, as if a lightbulb had switched on. This wasn’t how deep learning was supposed to work. They called the behavior grokking.
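
The setup can be reproduced in miniature. The sketch below is not the OpenAI researchers' actual code; it is a minimal, hypothetical illustration of the regime in which grokking is usually reported: a small PyTorch network trained on modular addition with a limited training split and strong weight decay, run for far more epochs than it takes to memorize the training set. Training accuracy saturates quickly, while test accuracy can stay near chance for a long stretch before jumping.

```python
# A minimal grokking-style sketch (illustrative, not the original experiment):
# train a small network on modular addition with a small training split
# and strong weight decay, and log train vs. test accuracy over many epochs.
import torch
import torch.nn as nn

P = 97                      # modulus: learn (a + b) mod P
torch.manual_seed(0)

# Build every (a, b) pair and its answer, then keep only ~40% for training.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = int(0.4 * len(pairs))
train_idx, test_idx = perm[:split], perm[split:]

class AdderNet(nn.Module):
    """Embed the two operands, concatenate, and classify the sum mod P."""
    def __init__(self, dim=128):
        super().__init__()
        self.embed = nn.Embedding(P, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, P)
        )

    def forward(self, x):
        e = self.embed(x)                      # (batch, 2, dim)
        return self.mlp(e.flatten(start_dim=1))

model = AdderNet()
# Strong weight decay is the ingredient most grokking reports lean on.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

for epoch in range(20000):   # far longer than it takes to memorize the training set
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if epoch % 500 == 0:
        print(f"epoch {epoch:6d}  train acc {accuracy(train_idx):.2f}  "
              f"test acc {accuracy(test_idx):.2f}")
```

Whether and when the late jump in test accuracy appears depends on the exact split size, weight decay, and learning rate, which is part of why the phenomenon remains debated.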

"It’s really interesting," says Hattie Zhou, an AI researcher at the University of Montreal and Apple Machine Learning Research, who wasn’t involved in the work. "Can we ever be confident that models have stopped learning? Because maybe we just haven’t trained for long enough."The weird behavior has captured the imagination of the wider research community. "Lots of people have opinions," says Lauro Langosco at the University of Cambridge, UK. "But I don’t think there’s a consensus about what exactly is going on." .

Classical statistics can't fully explain the behavior of large language models.

Grokking is just one of several odd phenomena that have AI researchers scratching their heads. The largest models, and large language models in particular, seem to behave in ways textbook math says they shouldn’t. This highlights a remarkable fact about deep learning, the fundamental technology behind today’s AI boom: for all its runaway success, nobody knows exactly how—or why—it works.

"Obviously, we’re not completely ignorant," says Mikhail Belkin, a computer scientist at the University of California, San Diego. "But our theoretical analysis is so far off what these models can do. Like, why can they learn language? I think this is very mysterious."

Researchers believe that understanding why deep learning works so well could improve the models even further.

The biggest models are now so complex that researchers are studying them as if they were strange natural phenomena, carrying out experiments and trying to explain the results. Many of those observations fly in the face of classical statistics, which had provided our best set of explanations for how predictive models behave.

So what, you might say. In the last few weeks, Google DeepMind has rolled out its generative models across most of its consumer apps. OpenAI wowed people with Sora, its stunning new text-to-video model. And businesses around the world are scrambling to co-opt AI for their needs. The tech works—isn’t that enough?

Many businesses are now using AI for their needs.

But figuring out why deep learning works so well isn’t just an intriguing intellectual exercise. It could improve the models themselves, provide the tools for figuring out when they’re not up to spec, or even help iron out some of their most onerous kinks.


