Gig Workers Outsource Gig Work to AI, Study Finds

Category: Artificial Intelligence

tldr #

A study from the Swiss Federal Institute of Technology (EPFL) found that a significant proportion of the people paid to train AI models may themselves be outsourcing that work to AI. The researchers estimated that somewhere between 33% and 46% of the workers used AI models. The implications are significant: using AI-generated data to train AI could introduce further errors into already error-prone models. Companies and organizations must not lose sight of the human element and should always carefully examine the source of all the data fed into an AI system.


content #

A significant proportion of people paid to train AI models may themselves be outsourcing that work to AI, a new study has found. It takes an incredible amount of data to train AI systems to perform specific tasks accurately and reliably. Many companies pay gig workers on platforms like Mechanical Turk to complete tasks that are typically hard to automate, such as solving CAPTCHAs, labeling data, and annotating text.

This data is then fed into AI models to train them. The workers are poorly paid and are often expected to complete lots of tasks very quickly. No wonder some of them may be turning to tools like ChatGPT to maximize their earning potential. But how many? To find out, a team of researchers from the Swiss Federal Institute of Technology (EPFL) hired 44 people on the gig work platform Amazon Mechanical Turk to summarize 16 extracts from medical research papers.

They then analyzed the responses using an AI model they’d trained themselves to look for telltale signals of ChatGPT output, such as a lack of variety in word choice. They also examined the workers’ keystrokes in a bid to work out whether they’d copied and pasted their answers, an indicator that the responses had been generated elsewhere. They estimated that somewhere between 33% and 46% of the workers had used AI models like OpenAI’s ChatGPT.
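For intuition, here is a minimal sketch in Python of that kind of detector: compute a few lexical-variety features per summary and fit a lightweight classifier on labelled examples. The feature set, the toy labelled texts, and the classifier choice are illustrative assumptions, not the study’s actual model.

```python
# Minimal sketch of a synthetic-text detector built on lexical-variety
# features. The features, toy labels, and model below are illustrative
# assumptions; the EPFL team's actual classifier is not reproduced here.
import re

import numpy as np
from sklearn.linear_model import LogisticRegression

def lexical_features(text: str) -> list[float]:
    """Crude signals that often differ between human and LLM prose."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return [0.0, 0.0, 0.0]
    type_token_ratio = len(set(words)) / len(words)   # vocabulary variety
    mean_sentence_len = len(words) / len(sentences)   # uniformity proxy
    mean_word_len = sum(map(len, words)) / len(words)
    return [type_token_ratio, mean_sentence_len, mean_word_len]

# Toy labelled corpus: 1 = suspected ChatGPT output, 0 = human-written.
texts = [
    ("The study explores the implications of the findings. The study also "
     "explores the limitations of the findings and explores future work.", 1),
    ("Honestly I skimmed the abstract, jotted down what stuck, then cleaned "
     "it up on a second pass because the dosage numbers confused me.", 0),
    ("In conclusion, the results demonstrate the importance of the results "
     "and demonstrate the relevance of the demonstrated results.", 1),
    ("Weird paper. Tiny sample, but the effect on the control arm was big "
     "enough that I would still call it interesting.", 0),
]
X = np.array([lexical_features(t) for t, _ in texts])
y = np.array([label for _, label in texts])

clf = LogisticRegression().fit(X, y)
summary = "The study explores the findings. The study explores the study."
prob = clf.predict_proba([lexical_features(summary)])[0, 1]
print(f"P(AI-generated) = {prob:.2f}")
```

A real detector would use far richer features (or a fine-tuned language model) and thousands of labelled summaries; the point here is only the shape of the pipeline.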

It’s a percentage that’s likely to grow even higher as ChatGPT and other AI systems become more powerful and easily accessible, according to the authors of the study, which has been shared on arXiv and is yet to be peer-reviewed. "I don’t think it’s the end of crowdsourcing platforms. It just changes the dynamics," says Robert West, an assistant professor at EPFL, who coauthored the study.

Using AI-generated data to train AI could introduce further errors into already error-prone models.

Large language models regularly present false information as fact. If they generate incorrect output that is itself used to train other AI models, the errors can be absorbed by those models and amplified over time, making it more and more difficult to work out their origins, says Ilia Shumailov, a junior research fellow in computer science at Oxford University, who was not involved in the project.
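To make that amplification loop concrete, here is a toy simulation in which each "model" is just a Gaussian fitted to its training data, and every generation trains only on samples drawn from the previous generation’s outputs. It is a sketch under those simplifying assumptions, not an experiment from either paper, but it shows how estimation error compounds instead of washing out.

```python
# Toy illustration of error amplification when models train on model output.
# A "model" here is a Gaussian fit; each generation sees only samples drawn
# from the previous generation. Purely illustrative, not the study's setup.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=200)   # generation 0: real, human data

for generation in range(10):
    # "Train": estimate the distribution from the current dataset.
    mu_hat, sigma_hat = data.mean(), data.std()
    print(f"gen {generation}: mean={mu_hat:+.3f}, std={sigma_hat:.3f}")
    # "Generate": the next generation trains only on this model's outputs,
    # so each round's estimation error is baked into the next.
    data = rng.normal(mu_hat, sigma_hat, size=200)
```

Run it and the estimated mean drifts while the spread tends to shrink; after a few generations it is no longer obvious where the distortion entered, which is exactly the traceability problem Shumailov describes.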

Even worse, there’s no simple fix. "The problem is, when you’re using artificial data, you acquire the errors from the misunderstandings of the models and statistical errors," he says. "You need to make sure that your errors are not biasing the output of other models, and there’s no simple way to do that." The study highlights the need for new ways to check whether data has been produced by humans or AI.
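One concrete provenance signal, already used by the EPFL team, is the keystroke log: an answer that arrives mostly through paste events was probably produced elsewhere. The event format and flagging threshold in this sketch are assumptions made for illustration; the study’s actual logging setup isn’t reproduced here.

```python
# Hypothetical copy-paste check over a keystroke log. The (action, n_chars)
# event format and the 50% threshold are assumptions for illustration.
PASTE_THRESHOLD = 0.5  # flag if more than half the text arrived via paste

def looks_pasted(events: list[tuple[str, int]]) -> bool:
    typed = sum(n for action, n in events if action == "keypress")
    pasted = sum(n for action, n in events if action == "paste")
    total = typed + pasted
    return total > 0 and pasted / total > PASTE_THRESHOLD

# A worker who typed 12 characters, then pasted a 600-character summary.
log = [("keypress", 1)] * 12 + [("paste", 600)]
print(looks_pasted(log))  # True -> response likely produced elsewhere
```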

It also highlights one of the problems with tech companies’ tendency to rely on gig workers to do the vital work of tidying up the data fed to AI systems. "I don’t think everything will collapse," says West. "But I think the AI community will have to investigate closely which tasks are most prone to being automated and to work on ways to prevent this." As AI technology becomes more powerful and accessible, companies and organizations must not lose sight of the human element.

Careful examination of the source of all the data fed into an AI system is essential to avoid introducing errors that could not only hinder the progress of an AI project but also put people’s health and safety at risk. As this study suggests, gig work itself can be quietly outsourced to AI, and depending on the task, it may be better to have a more reliable and informed person fulfill the role.

