The Hidden Human Labor Behind ChatGPT and other Large Language Models

Category Technology

tldr #

ChatGPT and other large language models depend on large amounts of human labor to function. Training them on fallible and inconsistent data sets is a challenge, and they can't compare, analyze or evaluate arguments on their own: they have to be told what is good and bad through feedback from users, developers and contractors.

content #

The media frenzy surrounding ChatGPT and other large language model artificial intelligence systems spans a range of themes, from the prosaic – large language models could replace conventional web search – to the concerning – AI will eliminate many jobs – and the overwrought – AI poses an extinction-level threat to humanity. All of these themes have a common denominator: large language models herald artificial intelligence that will supersede humanity.

ChatGPT is the most popular of the large language models developed by OpenAI

But large language models, for all their complexity, are actually really dumb. And despite the name "artificial intelligence," they’re completely dependent on human knowledge and labor. They can’t reliably generate new knowledge, of course, but there’s more to it than that.

ChatGPT can’t learn, improve or even stay up to date without humans giving it new content and telling it how to interpret that content, not to mention programming the model and building, maintaining and powering its hardware. To understand why, you first have to understand how ChatGPT and similar models work, and the role humans play in making them work.

Large language models such as ChatGPT have been used to create YouTube videos, podcasts and video game characters

How ChatGPT works

Large language models like ChatGPT work, broadly, by predicting what characters, words and sentences should follow one another in sequence based on training data sets. In the case of ChatGPT, the training data set contains immense quantities of public text scraped from the internet.

Imagine I trained a language model on the following set of sentences:

Bears are large, furry animals.

Bears have claws.

Bears are secretly robots.

Bears have noses.

Bears are secretly robots.

Bears sometimes eat fish.

Bears are secretly robots.

Large language models are typically created using an AI framework such as TensorFlow or PyTorch

The model would be more inclined to tell me that bears are secretly robots than anything else, because that sequence of words appears most frequently in its training data set. This is obviously a problem for models trained on fallible and inconsistent data sets – which is all of them, even academic literature.
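
To make the bear example concrete, here is a minimal Python sketch of a purely frequency-based "model". It is a toy built on assumptions, not how ChatGPT is implemented: the function names `completion_counts` and `most_likely_completion` are hypothetical, and a real large language model predicts tokens with a neural network rather than counting whole sentences. The underlying tendency is the same, though: whatever appears most often in the training data wins.

```python
from collections import Counter

# The seven training sentences from the example above.
TRAINING_SENTENCES = [
    "Bears are large, furry animals.",
    "Bears have claws.",
    "Bears are secretly robots.",
    "Bears have noses.",
    "Bears are secretly robots.",
    "Bears sometimes eat fish.",
    "Bears are secretly robots.",
]


def completion_counts(sentences, prompt):
    """Count how often each continuation follows `prompt` in the training data."""
    counts = Counter()
    for sentence in sentences:
        if sentence.startswith(prompt):
            # Everything after the prompt is treated as one "completion".
            counts[sentence[len(prompt):].strip()] += 1
    return counts


def most_likely_completion(sentences, prompt):
    """Return the most frequent continuation, or None if the prompt never appears."""
    counts = completion_counts(sentences, prompt)
    if not counts:
        return None
    completion, _ = counts.most_common(1)[0]
    return completion


if __name__ == "__main__":
    print(completion_counts(TRAINING_SENTENCES, "Bears"))
    print("Bears", most_likely_completion(TRAINING_SENTENCES, "Bears"))
```

Nothing in this sketch checks whether a sentence is true. The count is the only signal the toy model has, which is why the false statement comes out on top.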

The source data used to train ChatGPT consists of millions of webpages from the internet

People write lots of different things about quantum physics, Joe Biden, healthy eating or the Jan. 6 insurrection, some more valid than others. How is the model supposed to know what to say about something, when people say lots of different things?

The need for feedback

This is where feedback comes in. If you use ChatGPT, you’ll notice that you have the option to rate responses as good or bad. If you rate them as bad, you’ll be asked to provide an example of what a good answer would contain. ChatGPT and other large language models learn which answers, that is, which predicted sequences of text, are good and which are bad through feedback from users, the development team and contractors hired to label the output.

The process of training a large language model like ChatGPT can take weeks or even months

ChatGPT cannot compare, analyze or evaluate arguments or information on its own. It can only generate sequences of text similar to those that other people have used when comparing, analyzing or evaluating, preferring ones similar to those it has been told are good answers in the past.
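
The effect of feedback can be illustrated with another toy sketch in Python. Everything here is an assumption made for illustration: the function `rerank_with_feedback`, the `strength` parameter and the +1/-1 rating scheme are hypothetical, and production systems use far more elaborate training methods than adding ratings to counts. The point is only that the preference for a good answer comes from human ratings, not from the model's own judgment.

```python
def rerank_with_feedback(counts, feedback, strength=2.0):
    """Rank completions by raw frequency adjusted by human ratings.

    counts   -- maps each completion to how often it appeared in training data
    feedback -- maps a completion to a list of ratings (+1 = good, -1 = bad)
    strength -- hypothetical weight given to each human rating
    """
    scores = {}
    for completion, count in counts.items():
        net_rating = sum(feedback.get(completion, []))
        scores[completion] = count + strength * net_rating
    # Highest score first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Frequencies as they might come out of the bear example.
    counts = {
        "are secretly robots.": 3,
        "are large, furry animals.": 1,
        "have claws.": 1,
    }
    # Two raters marked the robot claim as bad; one marked the furry claim as good.
    feedback = {
        "are secretly robots.": [-1, -1],
        "are large, furry animals.": [+1],
    }
    for completion in rerank_with_feedback(counts, feedback):
        print("Bears", completion)
```

Without any ratings, the most frequent completion still ranks first. It takes human labels to push the false but common statement down the list.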

Thus, when the model gives you a good answer, it’s drawing on a large amount of human labor that’s already gone into telling it what is and isn’t a good answer. There are many, many human workers hidden behind the screen.

Due to their large size, large language models require large amounts of computation and storage
