AI Language Models: How Much Do We Really Know?

Category: Artificial Intelligence

tldr #

AI language models are not humans, and yet we judge them as if they were, using tests like the bar exam and the United States Medical Licensing Examination. We have little understanding of how they function under the hood, even as they generate human-like content in many forms, from stories to images. Despite glimmers of human-like intelligence, humans can still out-compete AI models in certain scenarios. MIT Technology Review is currently hiring an AI reporter based in Cambridge, Massachusetts.


content #

AI language models are not humans, and yet we evaluate them as if they were, using tests like the bar exam or the United States Medical Licensing Examination. The models tend to do very well on these exams, probably because examples of such exams are abundant in their training data. Yet, as my colleague Will Douglas Heaven writes in his most recent article, "some people are dazzled by what they see as glimmers of human-like intelligence; others aren’t convinced one bit."


What stood out to me in Will’s story is that we know remarkably little about how AI language models work and why they generate the things they do. With these tests, we’re trying to measure and glorify their "intelligence" based on their outputs, without fully understanding how they function under the hood.

AI language models are not only tested as if they were humans, but are also used to generate human-like content in many forms, from stories to images. For example, researchers at the University of California, Los Angeles gave GPT-3 (Generative Pre-trained Transformer, an AI language model developed by OpenAI) a story about a magical genie transferring jewels between two bottles, then asked it how to transfer gumballs from one bowl to another using objects such as a posterboard and a cardboard tube. The idea is that the story hints at ways to solve the problem. GPT-3 proposed elaborate but mechanically nonsensical solutions. "This is the sort of thing that children can easily solve," says Taylor Webb, one of the researchers.


On a more serious note, Google DeepMind has launched a new watermarking tool that labels whether images have been generated with AI. The tool, called SynthID, will initially be available only to users of Google’s AI image generator Imagen. Users will be able to generate images and then choose whether to add a watermark. The hope is that it could help people tell when AI-generated content is being passed off as real, or help protect copyright.


Despite AI language models' ability to imitate human intelligence on certain tests, humans are still vastly better at solving certain problems, as the gumball experiment above demonstrates. For now, at least, humans still have the edge.

P.S. We’re hiring! MIT Technology Review is looking for an ambitious AI reporter to join our team with an emphasis on the intersection of hardware and AI. This position is based in Cambridge, Massachusetts. Sounds like you, or someone you know? Read more here.

