Artificial Intelligence models battle against the meaning of 'no'

Category Computer Science

tldr #

Nora Kassner suspected her computer wasn't as smart as people thought when Google released its language model BERT in October 2018. Since then, many researchers have found that LLMs have difficulty detecting and flipping negatives such as 'not' and 'no'. Newer research is exploring ways to improve computers' ability to handle such words, with solutions such as 'negation augmentation'. Despite progress, it is still unclear whether computers will ever truly understand 'no'.

content #

Nora Kassner suspected her computer wasn’t as smart as people thought. In October 2018, Google released a language model algorithm called BERT, which Kassner, a researcher in the same field, quickly loaded on her laptop. It was Google’s first language model that was self-taught on a massive volume of online data. Like her peers, Kassner was impressed that BERT could complete users’ sentences and answer simple questions. It seemed as if the large language model (LLM) could read text like a human (or better).

Nora Kassner was the first AI researcher to demonstrate LLM algorithms' lack of understanding of words like 'not'

But Kassner, at the time a graduate student at Ludwig Maximilian University of Munich, remained skeptical. She felt LLMs should understand what their answers mean — and what they don’t mean. It’s one thing to know that a bird can fly. "A model should automatically also know that the negated statement — ‘a bird cannot fly’ — is false," she said. But when she and her adviser, Hinrich Schütze, tested BERT and two other LLMs in 2019, they found that the models behaved as if words like "not" were invisible.

LLMs codify linguistic relationships between objects based on their weights

Since then, LLMs have skyrocketed in size and ability. "The algorithm itself is still similar to what we had before. But the scale and the performance is really astonishing," said Ding Zhao, who leads the Safe Artificial Intelligence Lab at Carnegie Mellon University.

But while chatbots have improved their humanlike performances, they still have trouble with negation. They know what it means if a bird can’t fly, but they collapse when confronted with more complicated logic involving words like "not," which is trivial to a human.

Google's BERT launched in October 2018

"Large language models work better than any system we have ever had before," said Pascale Fung, an AI researcher at the Hong Kong University of Science and Technology. "Why do they struggle with something that’s seemingly simple while it’s demonstrating amazing power in other things that we don’t expect it to?" Recent studies have finally started to explain the difficulties, and what programmers can do to get around them. But researchers still don’t understand whether machines will ever truly know the word "no."

As the amount of data used to train the models has increased, several emergent behaviors have appeared, such as chatbots' ability to detect emotional language

Making Connections

It’s hard to coax a computer into reading and writing like a human. Machines excel at storing lots of data and blasting through complex calculations, so developers build LLMs as neural networks: statistical models that assess how objects (words, in this case) relate to one another. Each linguistic relationship carries some weight, and that weight — fine-tuned during training — codifies the relationship’s strength. For example, "rat" relates more to "rodent" than "pizza," even if some rats have been known to enjoy a good slice.
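As a rough illustration of how a learned number can encode the strength of a relationship between words, the sketch below compares hand-written word vectors with cosine similarity. The vectors in `TOY_VECTORS` are invented for this example and do not come from BERT or any real model, and cosine similarity over word vectors is a simplified stand-in for the weights inside an actual neural network.

```python
import math

# Hypothetical 3-dimensional word vectors, made up for illustration only.
# Real models learn vectors with hundreds or thousands of dimensions.
TOY_VECTORS = {
    "rat": [0.9, 0.8, 0.1],
    "rodent": [0.8, 0.9, 0.0],
    "pizza": [0.1, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rat_rodent = cosine_similarity(TOY_VECTORS["rat"], TOY_VECTORS["rodent"])
rat_pizza = cosine_similarity(TOY_VECTORS["rat"], TOY_VECTORS["pizza"])

# With these toy numbers, "rat" scores much closer to "rodent" than to "pizza".
print(f"rat vs rodent: {rat_rodent:.2f}")
print(f"rat vs pizza:  {rat_pizza:.2f}")
```

In a trained model these numbers are adjusted automatically during training rather than written by hand.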

Research about how LLMs can understand the word 'no' started in 2019

In the same way that your smartphone’s keyboard learns that you follow "good" with "morning," LLMs sequentially predict the next word in a block of text. The bigger the data set used to train them, the better the predictions, and as the amount of data used to train the models has increased enormously, dozens of emergent behaviors have bubbled up. Chatbots have learned style, syntax and tone, for example, all on their own. "An early problem was that they completely could not detect emotional language at all. And now they can," said Kathleen Carley, a computer scientist at Carnegie Mellon. Carley uses LLMs for "sentiment analysis," which is all about extracting emotional language from large blocks of text.
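The keyboard comparison can be sketched with a tiny bigram counter, which predicts the next word as whichever word most often followed the current one in the training text. This is a deliberately simplified stand-in: real LLMs use neural networks over far longer contexts, and the corpus in `TOY_CORPUS` and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical miniature training text, invented for illustration.
TOY_CORPUS = "good morning . good morning . good night . good morning ."


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for previous, following in zip(tokens, tokens[1:]):
        counts[previous][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigrams(TOY_CORPUS)
print(predict_next(model, "good"))
```

More training text gives the counter more evidence, which is the same intuition behind the claim that bigger data sets yield better predictions.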

One of the main methods for training AI models to deal with 'no' involves negation augmentation

But Kassner and other researchers have argued that when it comes to detecting negative statements, the models lag behind. To use her bird example again, a language model would fare well at answering a question like "What animal can fly?" But ask it "What animal can’t fly?" and you can forget about a sensible response.

That’s because LLMs aren’t designed with logic gates — special circuits that execute Boolean logic — or other mechanisms that can detect and flip negatives like humans do. You’d be hard-pressed to find a program that can answer both of Kassner’s questions correctly.
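To make the failure mode concrete, here is a toy comparison between a representation that drops negation words and one that tracks them explicitly. It is a caricature written for this example, not a description of how BERT or any other LLM works internally; the word list `NEGATION_WORDS` and both feature functions are hypothetical.

```python
# Hypothetical list of negation markers, chosen for this example only.
NEGATION_WORDS = {"not", "no", "never"}


def tokenize(question):
    """Lowercase the question, strip '?', and expand "can't"/"cannot" to "can not"."""
    text = question.lower().replace("\u2019", "'").replace("?", "")
    text = text.replace("can't", "can not").replace("cannot", "can not")
    return text.split()


def negation_blind_features(question):
    """Represent a question by its content words, discarding negation entirely."""
    return frozenset(t for t in tokenize(question) if t not in NEGATION_WORDS)


def negation_aware_features(question):
    """Return the content words plus a flag that flips once per negation word."""
    tokens = tokenize(question)
    negated = sum(t in NEGATION_WORDS for t in tokens) % 2 == 1
    content = frozenset(t for t in tokens if t not in NEGATION_WORDS)
    return content, negated


positive = "What animal can fly?"
negative = "What animal can't fly?"

# The blind representation cannot tell the two questions apart.
print(negation_blind_features(positive) == negation_blind_features(negative))
# The aware representation keeps them distinct through the flag.
print(negation_aware_features(positive) == negation_aware_features(negative))
```

A system that sees only the blind features has no basis for answering the two questions differently, which mirrors the behavior Kassner describes.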

So how can developers teach their LLMs to figure out whether a bird can, or can’t, fly?

Negativity Bias

Training an AI with humanlike understanding requires massive data sets, and the sentiment analysis studies that Carley, Kassner and their colleagues have run have been beyond ambitious. To understand why LLMs struggle to identify statements like Kassner’s bird question, they’ve sampled more than 300,000 headlines and launched more than 500 online surveys. All of which has resulted in more questions than answers — and dozens of theories.

AI researchers built LLMs to understand the world through text, Carley said. For example, researchers feed hundreds of reports on climate change into an AI system to help it detect the phrase “global warming.” But they never think to feed the AI stories with the phrase “global cooling.” The lack of context — and the understanding that one phrase implies the negation of the other — is a massive roadblock for current language models.

The experiments Kassner and her colleagues ran used headlines and surveys to identify which language models handle negation better than others. Unsurprisingly, they found that modern LLMs weren’t great at recognizing the opposite of a statement.

"We have observed that negation is one of the first cases where modern methods don’t work as well [as older approaches]," Kassner said.

The studies led to a promising solution called "negation augmentation." During this process, a developer or researcher takes concepts that involve negation — such as “no fire” — and throws them into the data set for an AI model to learn. The artificial system gets to see and understand that the two words are related, if the system has enough numerically annotated data to pick up the nuances of the phrase.
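One possible reading of negation augmentation is sketched below: for every labeled statement in a training set, add a negated copy with the opposite label. The rule used here (insert "not" after the first auxiliary verb) is a simplification chosen for this example, and the names `AUXILIARIES`, `negate`, `augment` and the `SEED` data are all hypothetical; the article does not specify how the researchers generated their negated examples.

```python
# Hypothetical set of auxiliary verbs that the toy rule knows how to negate.
AUXILIARIES = {"can", "is", "does", "will"}


def negate(statement):
    """Insert 'not' after the first auxiliary verb; return None if there is none."""
    words = statement.split()
    for index, word in enumerate(words):
        if word.lower() in AUXILIARIES:
            return " ".join(words[: index + 1] + ["not"] + words[index + 1 :])
    return None


def augment(dataset):
    """Return the original (text, label) pairs plus negated copies with flipped labels."""
    augmented = list(dataset)
    for text, is_true in dataset:
        negated = negate(text)
        if negated is not None:
            augmented.append((negated, not is_true))
    return augmented


# Invented seed examples: each pair is (statement, whether the statement is true).
SEED = [("a bird can fly", True), ("a penguin can fly", False)]

for text, label in augment(SEED):
    print(label, text)
```

Statements the rule cannot negate are left out of the augmented portion rather than guessed at, so the added examples never carry a wrong label by construction of the rule.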

It’s an effective solution, but one with gaps, some of Kassner’s experiments showed. Her colleagues’ work also revealed that models that understand negation interact differently with a user than those that don’t. Think of it like this: A chatbot with better negation skills might respond to a user’s statement with "I don’t think that’s the case," while a bot with weaker skills might let the statement slip through without any challenge.

Gaining Ground

It’s unclear whether computers will ever truly understand the word “no.” But, Fung said, as access to large data sets increases, so could the ability of machines to capture negation.

Researchers also have to consider the changing nature of languages and the continued evolution of LLMs. In 2019, for instance, Facebook released RoBERTa, a retrained variant of BERT that the company reported outperformed the original. Likewise, AI programming standards are evolving, meaning models created today or tomorrow won’t be running on the same approach as the models developed in 2019.

"The state of the technology is very much in flux," Fung said. "It's in a very early stage."

Kassner agreed; she thinks the most advanced techniques for figuring out the word “no” are still rudimentary and far from perfect. But, she said, researchers are making progress, thanks to programs like BERT, RoBERTa and other language models.

"We take these small milestones," she said, "and just move them a bit forward."
