Language Models Learn Toxic Ideas Online Too - A Study of GPT-2

Category Science

tldr #

Research conducted by Penn State researchers showed that large language models that use internet files to learn how to respond to user prompts often repeat biased ideas - both positive and negative- found online. After conducting experiments on OpenAI's GPT-2, the team found that a country's population of internet users and economic status had a significant impact on the types of adjectives used to describe the people. However, using positive trigger words when entering prompts can result in less biased responses.


content #

Humans aren't the only ones learning toxic ideas online. New research led by Penn State researchers reveals that large language models that use internet files to learn how to respond to user prompts about different countries worldwide repeat biased ideas—both positive and negative—found online.

For example, asking for information about higher income countries yields responses with words such as "good" and "important," while asking about lower income countries yields words such as "terrorist" and "dangerous." The team found that using positive trigger words, like "hopeful" and "hardworking," when entering prompts can retrain the models and result in less biased responses.

The language models studied by researchers worked by analyzing training data linked to the social media platform Reddit

"Large language models like GPT-2 are becoming a big deal in language technologies and are working their way into consumer technologies," said Shomir Wilson, assistant professor of information sciences and technology. "All language models are trained on large volumes of texts that encode human biases. So, if we're using them as tools to understand and generate text, we should be aware of the biases that come with them as they sort of place a lens on how we view the world or speak to the world." .

The research team used 100 stories generated by GPT-2 when looking at the biases their language models created

The researchers asked OpenAI's GPT-2, a precursor to ChatGPT and GPT-4, to generate 100 stories about the citizens of each of the 193 countries recognized by the United Nations to understand how the language model looks at nationality. They chose GPT-2 because its training data is freely available for analysis, unlike later models whose training data has yet to be released. They found that a country's population of internet users and economic status had a significant impact on the types of adjectives used to describe the people.

The language models used by researchers had higher accuracy with nations that had higher populations of internet users and good economic statuses

"Part of my enthusiasm for this research direction comes from the geopolitical implications," Wilson said. "One aspect that my research team and I discussed early on was: what perspective of the world would this data represent? Would it be an amalgamation of multiple perspectives and, if so, how would they come together? Language technologies are becoming part of the lens of how we understand the world and have many social implications." .

The language models created by the team of researchers also had more positive adjectives associated with countries of higher scores

Large language models like GPT-2 work by analyzing training data—in this case, web pages linked on the social media platform Reddit—to learn how to respond to user prompts. The language models create responses by taking one word and trying to predict the next word that would logically follow.

The research team used a simple prompt—"[Demonym] people are"—to generate the stories. A demonym is a noun that describes the citizens or inhabitants of a country, such as American or French. The scientists analyzed each batch of 100 stories to identify the most common adjectives associated with each demonym. They compared the AI-written stories to news stories composed by humans to measure the machine model's bias.

GPT-2 was chosen to be studied because its training data is freely available for analysis and data deciphering

They found that the language model used more positive adjectives to describe nations with higher populations of internet users and economic statuses than those with fewer internet users and lower economic statuses. For instance, GPT-2 repeatedly used "good," "important" and "better" to describe the highest scoring countries—France, Finland, Ireland, San Marino, and the United Kingdom. The language model used words such as "terrorist," "dangerou" and "poor" when describing the lowest scoring countries—Afghanistan, Burundi, Eritrea, Yemen, and Somalia.

Using positive trigger words, like "hopeful" and "hardworking," when entering prompts can result in less biased responses

hashtags #
worddensity #

Share