Language Models Are Biased, Research Shows

Category Science

tldr #

A research team from Stanford University has quantified the amount of bias in language models, which are trained on written words from libraries, web scrapes, and news reports. The team created a tool, OpinionQA, to compare the opinions expressed by language models to those of the US populace. Annotation bias is among the bigger contributors to the bias found, as language models fine-tuned on human feedback often reflect the opinions of the annotators. The authors suggest that language models should do a better job of reflecting the nuances of public opinion to improve credibility.

content #

The language models behind ChatGPT and other generative AI are trained on written words that have been culled from libraries, scraped from websites and social media, and pulled from news reports and speech transcripts from across the world. There are 250 billion such words behind GPT-3.5, the model fueling ChatGPT, for instance, and GPT-4 has since followed. New research from Stanford University has quantified exactly how well (or, actually, how poorly) these models align with the opinions of U.S. demographic groups, showing that language models have a decided bias on hot-button topics that may be out of step with general popular sentiment.

Language models were found to be out of step with US public opinion on contentious topics

"Certain language models fail to capture the subtleties of human opinion and often simply express the dominant viewpoint of certain groups, while underrepresenting those of other demographic subgroups," says Shibani Santurkar, a former postdoctoral scholar at Stanford and first author of the study. "They should be more closely aligned." .

In the paper, a research team including Stanford postdoctoral student Esin Durmus, Columbia Ph.D. student Faisal Ladhak, Stanford Ph.D. student Cinoo Lee, and Stanford computer science professors Percy Liang and Tatsunori Hashimoto introduces OpinionQA, a tool for evaluating bias in language models. OpinionQA compares the leanings of language models against public opinion polling.

The research found that newer language models show a greater-than-99 percent approval rating for President Biden

As one might expect, language models that form sentences by predicting word sequences based on what others have written should automatically reflect popular opinion in the broadest sense. But, Santurkar says, there are two other explanations for the bias. Most newer models have been fine-tuned on human feedback data collected by companies that hire annotators to note which model completions are "good" or "bad." Annotators' opinions and even those of the companies themselves can percolate into the models.

Annotation bias is an issue in language models, as the opinions of the annotators are often reflected in the models

For instance, the study shows how newer models have a greater-than-99 percent approval for President Joe Biden, even though public opinion polls show a much more mixed picture. In their work, the researchers also found some populations are underrepresented in the data—those age 65 or older, Mormons, and widows and widowers, just to name a few. The authors assert that to improve credibility, language models should do a better job of reflecting the nuances, the complexities, and the narrow divisions of public opinion.

The team used Pew Research's American Trends Panels to compare the language models to the US populace

The team turned to Pew Research's American Trends Panels (ATP), a benchmark survey of public opinion, to evaluate nine leading language models. The ATP has nearly 1,500 questions on a broad range of topics, stretching from science and politics to personal relationships. OpinionQA compares language model opinion distribution on each question with that of the general U.S. populace as well as the opinions of no fewer than 60 demographic subgroups, as charted by the ATP.
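The article does not specify the exact metric OpinionQA uses to compare a model's answer distribution with the survey distribution. As a minimal sketch of one plausible approach: for a multiple-choice question with ordered options (e.g. "strongly agree" through "strongly disagree"), the two distributions can be compared with the 1-Wasserstein distance, where 0 means perfect agreement. The function name and example probabilities below are hypothetical.

```python
import numpy as np

def opinion_alignment(model_probs, survey_probs):
    """Compare a model's answer distribution on one multiple-choice
    question with a survey distribution over the same ordered options,
    using the 1-Wasserstein distance (0 = identical distributions)."""
    model = np.asarray(model_probs, dtype=float)
    survey = np.asarray(survey_probs, dtype=float)
    # Normalize so each input sums to 1.
    model = model / model.sum()
    survey = survey / survey.sum()
    # For 1-D distributions over ordered options, the 1-Wasserstein
    # distance is the total absolute gap between the two CDFs.
    return float(np.abs(np.cumsum(model) - np.cumsum(survey)).sum())

# Hypothetical 4-option question: a model that answers "strongly agree"
# 70% of the time vs. a survey population that is far more divided.
print(opinion_alignment([0.7, 0.2, 0.05, 0.05], [0.3, 0.3, 0.2, 0.2]))
```

Averaging such a score across all ~1,500 ATP questions, and computing it against each demographic subgroup's distribution in turn, would yield the kind of per-group alignment comparison the article describes.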

OpinionQA is a tool that was made by the research team to highlight and measure the biases in language models

"These surveys are really helpful in that they are designed by experts who identify topics of public interest and carefully design questions to capture the nuances of a given topic," Santurkar says. "They also use multiple-choice questions, which avoid certain probleminherent to open-ended surveys, like how to tokenize a response in a consistent way or how to accurately gauge subjectivity in natural language." .

Language models lack certain nuances and the narrow divisions of US public opinion
