Can Machines Think? Examining the Limitations of the Turing Test

Category: Machine Learning

tldr #

In 1950, Alan Turing proposed an experimental method for answering the question: can machines think? Although AI systems have come close to passing the Turing test, and have arguably passed versions of it, it is not a definitive test of intelligence, and it cannot measure the burgeoning number of human-machine hybrid intelligences.


content #

In 1950, British computer scientist Alan Turing proposed an experimental method for answering the question: can machines think? He suggested that if a human couldn't tell whether they were speaking to an artificially intelligent (AI) machine or another human after five minutes of questioning, this would demonstrate that AI has human-like intelligence.

Although AI systems remained far from passing Turing's test during his lifetime, he speculated that "[…] in about 50 years' time it will be possible to program computers […] to make them play the imitation game so well that an average interrogator will not have more than 70% chance of making the right identification after five minutes of questioning."

Alan Turing was appointed an Officer of the Order of the British Empire by King George VI in 1946

Today, more than 70 years after Turing's proposal, no AI has managed to successfully pass the test by fulfilling the specific conditions he outlined. Nonetheless, as some headlines reflect, a few systems have come quite close.

One recent experiment tested three large language models, including GPT-4 (the AI technology behind ChatGPT). Participants spent two minutes chatting with either another person or an AI system. The AI was prompted to make small spelling mistakes, and to quit if the tester became too aggressive.
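As a rough illustration of how such a setup might look in code, here is a minimal sketch using the openai Python package. The persona instructions and the model name are assumptions for illustration, not the actual prompt or configuration used in the study:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative persona prompt (an assumption, not the study's actual wording):
# nudge the model toward casual, fallible, human-like replies.
PERSONA = (
    "You are chatting with a stranger for two minutes. Keep replies short "
    "and casual, occasionally make a small spelling mistake, and stop "
    "responding if the other person becomes aggressive."
)

def witness_reply(history: list[dict]) -> str:
    """Return the AI 'witness' reply given the conversation so far."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": PERSONA}, *history],
    )
    return response.choices[0].message.content

# Example turn from the interrogator:
print(witness_reply([{"role": "user", "content": "hey, hows your day going?"}]))
```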

GPT-4, an AI technology behind ChatGPT, is one of the largest language models released to date, though its exact size has not been publicly disclosed

With this prompting, the AI did a good job of fooling the testers. When paired with an AI bot, testers correctly guessed they were talking to an AI system only 60% of the time, not far above the 50% expected from random guessing.

Given the rapid progress achieved in the design of natural language processing systems, we may see AI pass Turing's original test within the next few years.

But is imitating humans really an effective test for intelligence? And if not, what alternative benchmarks might we use to measure AI's capabilities?

AI systems have achieved near human-level precision and accuracy in tasks such as image recognition and natural language processing

Limitations of the Turing test

While a system passing the Turing test gives us some evidence it is intelligent, the test is not a decisive measure of intelligence. One problem is that it can produce "false negatives."

Today's large language models are often designed to immediately declare they are not human. For example, when you ask ChatGPT a question, it often prefaces its answer with the phrase "as an AI language model." Even if AI systems have the underlying ability to pass the Turing test, this kind of programming would override that ability.
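As a hedged sketch of how such a guardrail could work in principle (an illustrative pattern, not any vendor's actual implementation), a fixed disclosure rule injected ahead of every conversation would produce a "false negative" by design, regardless of the underlying model's ability:

```python
# Hypothetical guardrail (illustrative only): a fixed system rule injected
# ahead of every conversation, so the model always self-identifies as an AI.
DISCLOSURE_RULE = {
    "role": "system",
    "content": "You are an AI language model. Say so whenever asked "
               "whether you are human, and never claim to be a person.",
}

def guarded_messages(history: list[dict]) -> list[dict]:
    """Prepend the disclosure rule to every request. Whatever the model
    could otherwise do, this guarantees it fails the Turing test."""
    return [DISCLOSURE_RULE, *history]
```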

In 2014, the chatbot Eugene Goostman was reported to have passed a version of the Turing test by posing as a 13-year-old Ukrainian boy, fooling 33% of its human judges

The test also risks certain kinds of "false positives." As philosopher Ned Block pointed out in a 1981 article, a system could conceivably pass the Turing test simply by being hard-coded with a human-like response to any possible input.
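Block's thought experiment can be made concrete with a toy sketch (all entries here are hypothetical): a "chatbot" that is nothing but a lookup table from entire conversation histories to canned replies. A complete table would be astronomically large, but consulting it involves no reasoning at all:

```python
# Toy "Blockhead": every reply is a hard-coded lookup on the entire
# conversation so far. The table below is hypothetical and tiny; Block's
# point is that a finite but astronomically large complete table could
# cover every possible test conversation without any reasoning at all.
RESPONSES = {
    ("Hello!",): "Hi there! How are you today?",
    ("Hello!", "Hi there! How are you today?", "What's 2 + 2?"): "Four, last I checked.",
}

def blockhead_reply(history: tuple[str, ...]) -> str:
    """Look up the whole conversation; no state, no inference, no thought."""
    return RESPONSES.get(history, "Sorry, could you say that again?")

print(blockhead_reply(("Hello!",)))  # -> "Hi there! How are you today?"
```

Because the table is fixed in advance, such a system would pass by brute enumeration rather than by anything we would call thinking.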

Beyond that, the Turing test focuses on human cognition in particular. If AI cognition differs from human cognition, an expert interrogator will be able to find some task where AIs and humans differ in performance. But a machine's failure to imitate human cognition need not show a lack of intelligence.

Sergey Edunov, a computer scientist at Meta, proposed an alternative to the Turing test called an AI Gym

Regarding this problem, Turing wrote, "This objection is a very strong one, but at least we can say that if, nevertheless, a machine can be constructed to play the imitation game satisfactorily, we need not be troubled by this objection."

In other words, while passing the Turing test is good evidence a system is intelligent, failing it is not good evidence a system is "not" intelligent.

Moreover, the test is not a good measure of the growing number of non-human and human-machine hybrid intelligences, such as robots with machine learning algorithms installed.

Deep Blue, IBM's chess computer, beat world chess champion Garry Kasparov in 1997
