Seventy-five years ago, British mathematician Alan Turing proposed a bold idea: if a machine could hold a conversation so convincingly that people couldn’t tell whether they were speaking to a human or a computer, then maybe—just maybe—that machine could be considered intelligent. This idea, now famously known as the Turing Test, has become a symbolic benchmark in artificial intelligence (AI). For decades, no machine came close to passing it. But that may have just changed.

In a groundbreaking new study, researchers at UC San Diego tested the latest generation of AI, called large language models (LLMs), in a true-to-form version of the Turing Test. The results were eye-opening. Not only did some AI systems fool people into thinking they were human; one of them was actually more convincing than the real humans it was compared against.

What Was Tested

The researchers evaluated four AI systems: ELIZA, a simple rule-based chatbot from the 1960s; OpenAI's GPT-4o and GPT-4.5; and Meta's LLaMa-3.1. Two rounds of the experiment were run. In each one, a participant held two simultaneous chat conversations, one with a real person and one with an AI. After five minutes, the participant had to decide which of their chat partners was human.

This setup is the classic three-person Turing Test: human vs. AI, with a judge in the middle trying to tell them apart.

But there was a twist. Some of the AI systems were given specific instructions to pretend to be a certain kind of person—for example, a shy young adult who enjoys internet culture and uses slang. These were called persona prompts. Others received only basic instructions to try to sound human.
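
To give a concrete sense of what a persona prompt is, here is a minimal sketch of how one might be supplied as a system message when calling a chat model through the OpenAI Python SDK. The persona wording, the model name, and the sample message below are illustrative assumptions for demonstration only, not the exact prompt or setup used in the study.

```python
# Illustrative sketch only: the persona text, model name, and sample user
# message are assumptions for demonstration, not the study's actual prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical persona prompt: instructions about who the model should
# pretend to be and how it should write.
persona_prompt = (
    "You are a shy young adult who spends a lot of time online. "
    "You know internet culture well, use casual slang, make the occasional "
    "typo, and keep your replies short, like real chat messages."
)

response = client.chat.completions.create(
    model="gpt-4o",  # stand-in model name for this sketch
    messages=[
        {"role": "system", "content": persona_prompt},
        {"role": "user", "content": "hey, how's your day going?"},
    ],
)

print(response.choices[0].message.content)
```

The point of the persona is simply that the model is told in advance what kind of person to imitate, rather than being left to answer in its default assistant voice.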

How Did the AI Do?

The results were striking. When GPT-4.5 was given a persona prompt, it fooled people 73% of the time. In fact, it was more likely to be mistaken for a human than the actual human it was chatting against! LLaMa-3.1, with the same kind of persona, was mistaken for a human 56% of the time—about even with the real humans it was paired with.

On the other hand, when these same models weren’t given a persona—just told to act human without much guidance—they didn’t do as well. They were correctly identified as AI more often than not, especially by the undergraduate students in the study.

The two baseline systems, ELIZA and GPT-4o (tested without a persona prompt), fared much worse. They were rarely mistaken for humans, winning only around 20% of the time, well below the 50% you would expect from random guessing. This showed that the judges could tell when they were speaking with a weaker AI, and that the test was not simply too easy.

Why It Matters

So, does this mean AI is now intelligent? Not exactly. What this shows is that advanced AIs can convincingly imitate human behavior in short, casual conversations. But it doesn’t mean they understand the world like we do, have emotions, or are conscious. What they’ve mastered is something more subtle and perhaps more dangerous: deception.

These AIs were able to blend in during conversations so well that people couldn’t tell they were fake. This has real-world implications. If AI can pass for human in everyday chats, it could be used to impersonate people, manipulate opinions, or even carry out scams—what some experts are calling “counterfeit people.”

Reference:

Jones, Cameron R., and Benjamin K. Bergen. Large Language Models Pass the Turing Test. Department of Cognitive Science, UC San Diego, 2025. https://osf.io/jk7bw.

