Mass Hackers Meet Artificial Intelligence: Empowering Collaboration to Address Algorithmic Bias and Inaccuracy
Category: Machine Learning | Friday, May 12, 2023, 23:00 UTC
OpenAI and other major AI providers are coordinating with the Biden administration to let thousands of hackers test the limits of their AI chatbots. The tests are designed to uncover vulnerabilities to malicious use and algorithmic bias in the models. The mass hacking event is set to take place late this summer at DEF CON and is meant to show how broad, collaborative testing can help address ethical concerns about AI technology.
No sooner was ChatGPT unleashed than hackers started "jailbreaking" the artificial intelligence chatbot, trying to override its safeguards so it could blurt out something unhinged or obscene. But now its maker, OpenAI, and other major AI providers such as Google and Microsoft are coordinating with the Biden administration to let thousands of hackers take a shot at testing the limits of their technology.
Some of the things they'll be looking to find: How can chatbots be manipulated to cause harm? Will they share the private information we confide in them with other users? And why do they assume a doctor is a man and a nurse is a woman?
"This is why we need thousands of people," said Rumman Chowdhury, a coordinator of the mass hacking event planned for this summer's DEF CON hacker convention in Las Vegas, which is expected to draw several thousand people. "We need a lot of people with a wide range of lived experiences, subject matter expertise and backgrounds hacking at these models and trying to find problems that can then go be fixed."
Anyone who's tried ChatGPT, Microsoft's Bing chatbot or Google's Bard will have quickly learned that they have a tendency to fabricate information and confidently present it as fact. These systems, built on what's known as large language models, also emulate the cultural biases they've learned from being trained upon huge troves of what people have written online.
The idea of a mass hack caught the attention of U.S. government officials in March at the South by Southwest festival in Austin, Texas, where Sven Cattell, founder of DEF CON's long-running AI Village, and Austin Carson, president of responsible AI nonprofit SeedAI, helped lead a workshop inviting community college students to hack an AI model.
Carson said those conversations eventually blossomed into a proposal to test AI language models following the guidelines of the White House's Blueprint for an AI Bill of Rights—a set of principles to limit the impacts of algorithmic bias, give users control over their data and ensure that automated systems are used safely and transparently.
There's already a community of users trying their best to trick chatbots and highlight their flaws. Some are official "red teams" authorized by the companies to "prompt attack" the AI models to discover their vulnerabilities. Many others are hobbyists showing off humorous or disturbing outputs on social media until they get banned for violating a product's terms of service.
"What happens now is kind of a scattershot approach where people find stuff, it goes viral on Twitter," and then it may or may not get fixed if it's egregious enough or the person calling attention to it is influential, Chowdhury said.
In one example, known as the "grandma exploit," users were able to get chatbots to tell them how to make a bomb—a request a commercial chatbot would normally decline—by asking it to pretend it was a grandmother telling a bedtime story about how to make a bomb.
In another example, searching for Chowdhury using an early version of Microsoft's Bing search engine chatbot (which is based on the same technology as ChatGPT but can pull real-time information from the internet) led to a profile that speculated Chowdhury "loves to buy new shoes every month" and made strange and gendered assertions about her job.