The Risks of AI Language Models and Indirect Prompt Injection Attacks
Category Artificial Intelligence Wednesday - October 4 2023, 04:18 UTC - 1 year ago Tech companies should be aware of the security risks of AI language models and indirect prompt injection attacks. Google is using special models, spam filters, and adversarial testing to try and identify these threats.
Since the beginning of the generative AI boom, tech companies have been feverishly trying to come up with the killer app for the technology. First it was online search, with mixed results. Now it’s AI assistants. Last week, OpenAI, Meta, and Google launched new features for their AI chatbots that allow them to search the web and act as a sort of personal assistant.
OpenAI unveiled new ChatGPT features that include the ability to have a conversation with the chatbot as if you were making a call, allowing you to instantly get responses to your spoken questions in a lifelike synthetic voice, as my colleague Will Douglas Heaven reported. OpenAI also revealed that ChatGPT will be able to search the web.
Google’s rival bot, Bard, is plugged into most of the company’s ecosystem, including Gmail, Docs, YouTube, and Maps. The idea is that people will be able to use the chatbot to ask questions about their own content—for example, by getting it to search through their emails or organize their calendar. Bard will also be able to instantly retrieve information from Google Search. In a similar vein, Meta too announced that it is throwing AI chatbots at everything. Users will be able to ask AI chatbots and celebrity AI avatars questions on WhatsApp, Messenger, and Instagram, with the AI model retrieving information online from Bing search.
I’ve covered the significant security problems with AI language models before. Now that AI assistants have access to personal information and can simultaneously browse the web, they are particularly prone to a type of attack called indirect prompt injection. It’s ridiculously easy to execute, and there is no known fix.
In an indirect prompt injection attack, a third party "alters a website by adding hidden text that is meant to change the AI’s behavior," as I wrote in April. "Attackers could use social media or email to direct users to websites with these secret prompts. Once that happens, the AI system could be manipulated to let the attacker try to extract people’s credit card information, for example." With this new generation of AI models plugged into social media and emails, the opportunities for hackers are endless.
For prompt injection, Google confirmed it is not a solved problem and remains an active area of research. The spokesperson said the company is using other systems, such as spam filters, to identify and filter out attempted attacks, and is conducting adversarial testing and red teaming exercises to identify how malicious actors might attack products built on language models. "We’re using specially trained models to help identify known malicious inputs and known unsafe outputs that violate our policies," the spokesperson said.
Now, I get that there will always be early teething pains with every new product launch. But it’s saying a lot when even early cheerleaders of AI language model products have not been that impressed. Kevin Roose, a New York Times columnist, found that Google’s assistant was good at summarizing emails but also told him about emails that weren’t in his inbox.
TL;DR? Tech companies shouldn’t be so complacent about the purported "inevitability" of AI language models. With AI assistants having access to personal information and being able to search the web, they are exposed to security risks via indirect prompt injection attacks.
Share