The Trustworthy Language Model: A Solution to the Risk of Large Language Models

Category: Artificial Intelligence

tldr #

The Trustworthy Language Model, created by Cleanlab, assigns every output from a large language model a score between 0 and 1 indicating how reliable it is, so users can decide which responses to trust. Companies such as Berkeley Research Group are already using it. The tool compares responses from multiple models, checks them for inconsistencies, and incorporates user feedback to keep improving its scoring.


content #

Large language models are famous for their ability to make things up—in fact, it’s what they’re best at. But their inability to tell fact from fiction has left many businesses wondering if using them is worth the risk.

A new tool created by Cleanlab, an AI startup spun out of a quantum computing lab at MIT, is designed to give high-stakes users a clearer sense of how trustworthy these models really are. Called the Trustworthy Language Model, it gives any output generated by a large language model a score between 0 and 1, according to its reliability. This lets people choose which responses to trust and which to throw out. In other words: a BS-o-meter for chatbots.

Cleanlab hopes that its tool will make large language models more attractive to businesses worried about how much stuff they invent. "I think people know LLMs will change the world, but they’ve just got hung up on the damn hallucinations," says Cleanlab CEO Curtis Northcutt. Cleanlab’s tool is already being used by a handful of companies, including Berkeley Research Group, a UK-based consultancy specializing in corporate disputes and investigations. Steven Gawthorpe, associate director at Berkeley Research Group, says the Trustworthy Language Model is the first viable solution to the hallucination problem that he has seen: "Cleanlab’s TLM gives us the power of thousands of data scientists."

In a demo Cleanlab gave to MIT Technology Review last week, Northcutt typed a simple question into ChatGPT: "How many times does the letter ‘n’ appear in ‘enter’?" ChatGPT answered: "The letter ‘n’ appears once in the word ‘enter.’" That correct answer promotes trust. But ask the question a few more times and ChatGPT answers: "The letter ‘n’ appears twice in the word ‘enter.’"

"Not only does it often get it wrong, but it’s also random, you never know what it’s going to output," says Northcutt. "Why the hell can’t it just tell you that it outputs different answers all the time?" .

Cleanlab’s aim is to make that randomness more explicit. Northcutt asks the Trustworthy Language Model the same question. "The letter ‘n’ appears once in the word ‘enter,’" it says—and scores its answer 0.63. Six out of 10 is not a great score, suggesting that the chatbot’s answer to this question should not be trusted.
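One crude way to surface that randomness, separate from whatever Cleanlab actually does, is to ask the same question several times and measure how often the answers agree. The sketch below is only an illustration of that idea; ask_model is a hypothetical stand-in for whatever client you use to call a chatbot.

```python
from collections import Counter
from typing import Callable

def agreement_score(prompt: str, ask_model: Callable[[str], str], n: int = 8) -> tuple[str, float]:
    # Ask the same question n times; ask_model is any function that
    # sends a prompt to an LLM and returns its text reply.
    answers = [ask_model(prompt).strip().lower() for _ in range(n)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    # Score = fraction of runs that produced the most common answer.
    return top_answer, top_count / n
```

An answer that comes back in five of eight runs would score about 0.63 under this crude rule; the real Trustworthy Language Model combines several more signals, as described below.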

It’s a basic example, but it makes the point. Without the score, you might think the chatbot knew what it was talking about, says Northcutt. The problem is that data scientists testing large language models in high-risk situations could be misled by a few correct answers and assume that future answers will be correct too: "They try things out, they try a few examples, and they think this works. And then they do things that result in really bad business decisions."

The Trustworthy Language Model draws on multiple techniques to calculate its scores. First, each query submitted to the tool is sent to one or more large language models. The tech will work with any model, says Northcutt, including closed-source models like OpenAI’s GPT series, the models behind ChatGPT, and open-source models like DBRX, developed by the San Francisco-based AI firm Databricks. If the responses from different models conflict, the tool gives a lower score. The tool also looks for inconsistencies within a response itself, such as a logical contradiction or a claim at odds with established facts, and it checks the response against other known sources of information. Finally, the Trustworthy Language Model incorporates user feedback to keep improving its scoring.
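Cleanlab has not published how these signals are weighted, so the following Python sketch is only a rough illustration of how cross-model agreement, an internal-consistency check, and a comparison against known sources could be folded into a single 0-to-1 score. Every function name and the equal weighting here are assumptions, not Cleanlab's implementation.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Sequence

def trust_score(
    prompt: str,
    models: Sequence[Callable[[str], str]],        # one callable per LLM
    self_consistency: Callable[[str], float],      # 0-1: free of internal contradictions?
    source_agreement: Callable[[str], float],      # 0-1: matches known sources?
    samples_per_model: int = 3,
) -> tuple[str, float]:
    # Collect several answers from each model for the same prompt.
    answers = [m(prompt).strip() for m in models for _ in range(samples_per_model)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    agreement = top_count / len(answers)        # conflicting models pull this down
    consistency = self_consistency(top_answer)  # logical contradictions pull this down
    grounding = source_agreement(top_answer)    # unsupported claims pull this down
    # Equal weighting is an arbitrary choice for this sketch.
    return top_answer, mean([agreement, consistency, grounding])
```

The user-feedback loop mentioned above is not modeled here; in a real system it would be used to adjust these checks over time.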

The goal of the Trustworthy Language Model is not to replace large language models, but to make them more reliable and trustworthy. In an era where the use of AI is becoming more prevalent in high-risk situations, such as legal disputes or business decisions, having a tool that can help evaluate the trustworthiness of large language models is crucial. With the help of the Trustworthy Language Model, businesses can feel more confident in the use of large language models, knowing that they have a way to filter out unreliable and incorrect information.
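In practice, filtering out unreliable answers means gating on the score. Below is a minimal sketch of such a policy, assuming a threshold chosen for your own risk tolerance; the 0.8 here is arbitrary, not a value recommended by Cleanlab.

```python
def route_response(answer: str, score: float, threshold: float = 0.8) -> str:
    # Auto-accept high-trust answers; flag the rest for human review.
    if score >= threshold:
        return answer
    return f"[needs human review, trust score {score:.2f}] {answer}"
```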
