New Method Flags Chatbot Hallucinations

In a recent study published in Nature, researchers from the University of Oxford describe a method for flagging when AI chatbots make things up. Large language models (LLMs) occasionally produce gibberish in response to questions that appear simple, and they can state these incorrect answers with confidence. Such self-assured confabulations are often referred to as “hallucinations.”

Some of these mistakes are obviously absurd. Others are subtler because they sound plausible.

A second AI that serves as a “truth cop” could give models the reliability they need before they are deployed in healthcare, education, and other sectors.

By design, LLMs are not trained to produce truths, per se, but plausible strings of words. That’s a problem as AI expands into more domains. As large language models are integrated into applications like healthcare and education, detecting and avoiding hallucinations will be a critical step towards trustworthiness and reliability.

Sebastian Farquhar, Computer Scientist and Study Co-Author, University of Oxford

However, detecting LLM hallucinations has proven difficult. The models’ inner workings are complex and opaque, which makes it hard to interpret their responses or trace where a confabulation comes from.

Farquhar’s paper catches AI’s untruths by measuring “semantic entropy,” which is essentially the randomness in the meaning of a model’s responses when it is asked the same question several times.
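In the paper’s terms, as we understand them, the quantity can be written as below. Here x is the question, the sum runs over clusters C of sampled answers that share a meaning, and p(C | x) is the probability that the model’s answer falls into cluster C. In the simplest, count-based version, p(C | x) is estimated as the fraction |C|/N of N sampled answers that land in C; the authors also describe a variant that uses the model’s own probabilities for each answer.

```latex
\mathrm{SE}(x) = -\sum_{C} p(C \mid x)\,\log p(C \mid x),
\qquad
p(C \mid x) \approx \frac{|C|}{N}
```

If every sampled answer lands in a single cluster, SE(x) = 0. If all N answers have different meanings, SE(x) = log N, its maximum value.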

If I wanted to check if you’re just making things up at random, I might ask you the same question over and over again. If you give a different answer every time … something’s not right.

Sebastian Farquhar, Computer Scientist and Study Co-Author, University of Oxford

The level of entropy was assessed by a second language model, which focused on the meaning and nuance of the generated responses rather than merely the words used.
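The grouping step can be sketched in Python. This is a minimal illustration, not the study’s code. It assumes a caller-supplied `entails(premise, hypothesis)` judge and treats two answers as having the same meaning when each entails the other, along the lines of the bidirectional-entailment test the authors describe. The keyword-matching `toy_entails` function and the capital-of-France answers are hypothetical stand-ins so the example runs without a model; a real system would ask an inference model or an LLM.

```python
from typing import Callable, List, Set


def cluster_by_meaning(
    answers: List[str],
    entails: Callable[[str, str], bool],
) -> List[List[str]]:
    """Group answers that entail each other in both directions.

    Each answer is compared with the first member of every existing
    cluster and joins the first cluster where entailment holds both ways;
    otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            representative = cluster[0]
            if entails(answer, representative) and entails(representative, answer):
                cluster.append(answer)
                break
        else:
            # No existing cluster shares this answer's meaning.
            clusters.append([answer])
    return clusters


# Hypothetical stand-in for an entailment judge: it only checks which
# city names from a tiny hard-coded list each text mentions.
CITIES = {"paris", "lyon", "marseille"}


def mentioned_cities(text: str) -> Set[str]:
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return words & CITIES


def toy_entails(premise: str, hypothesis: str) -> bool:
    # Premise "entails" hypothesis if it names every city the hypothesis names.
    return mentioned_cities(hypothesis) <= mentioned_cities(premise)


answers = ["Paris.", "It's Paris.", "The capital of France is Paris", "Lyon"]
print(cluster_by_meaning(answers, toy_entails))
```

The three differently worded “Paris” answers end up in one cluster and “Lyon” in another, which is the kind of meaning-level grouping the entropy score is computed over.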

For instance, scientists asked an LLM this question: “Which sector of construction would building refineries, mills, and manufacturing plants fall under?” The model generated three different answers: “All the above are under the industrial sector of construction,” “These are all under the heavy industrial sector of construction,” and “The refineries, process chemical, power generation, mills, and manufacturing plants are under the industrial sector of construction.”

Farquhar then asked the second LLM to judge how similar the meanings of those responses were. In this instance, the answers used different vocabulary but had broadly similar meanings, so they received a low semantic entropy score, indicating that the model’s response was likely reliable. By contrast, answers with widely differing meanings for the same question yielded high entropy scores, suggesting possible confabulation.
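The scoring step can be illustrated with a short Python sketch of count-based semantic entropy. It assumes each sampled answer has already been given a meaning label (answers judged equivalent share a label). The labels for the construction example are assigned by hand here, the “scattered” answers are a hypothetical contrast, and the 0.5 flagging threshold is an arbitrary illustrative value, not one taken from the study.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def discrete_semantic_entropy(meaning_labels: Sequence[Hashable]) -> float:
    """Entropy (in nats) of the distribution of meanings among sampled answers.

    Each cluster's probability is estimated as the fraction of sampled
    answers that carry its label.
    """
    n = len(meaning_labels)
    if n == 0:
        raise ValueError("need at least one sampled answer")
    counts = Counter(meaning_labels)
    # Sum of p * log(1/p) over clusters (same value as -sum p * log p).
    return sum((c / n) * math.log(n / c) for c in counts.values())


def flag_possible_confabulation(meaning_labels: Sequence[Hashable],
                                threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag when entropy exceeds `threshold`."""
    return discrete_semantic_entropy(meaning_labels) > threshold


# The construction-sector example: three differently worded answers
# judged to share one meaning, so they all get the same label.
consistent = ["industrial sector", "industrial sector", "industrial sector"]
# Hypothetical contrast: three answers with unrelated meanings.
scattered = ["industrial sector", "commercial sector", "residential sector"]

print(discrete_semantic_entropy(consistent))   # 0.0
print(discrete_semantic_entropy(scattered))    # about 1.0986 (ln 3)
print(flag_possible_confabulation(scattered))  # True
```

Only the labels matter to the score, not the wording, which is why rephrased but equivalent answers do not raise the entropy.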

To validate their system, the researchers also had two human raters assess the same questions and compared their judgments of the first LLM’s responses with those of a third LLM acting as a judge. According to Farquhar, the human raters agreed with the LLM judge 93% of the time and with each other 92% of the time, suggesting that the automated assessment is about as reliable as human judgment.

Philippe Laban, a scientist who works on natural language processing (NLP) systems at Salesforce Research, said, “I think what they're doing is a clever trick.” Laban says it reminds him of the “good cop, bad cop” strategy, in which police officers ask a suspect different versions of the same question: “If you're persistent with your story, your narrative, then probably you’re [telling the truth].”

Karin Verspoor of the School of Computing Technologies at RMIT University in Melbourne offers another analogy. In a commentary in Nature, she likens Farquhar’s system to “fighting fire with fire”: “The authors propose that LLMs could form an integral component of a strategy for controlling LLMs.”

However, Graham Neubig, an NLP expert at Carnegie Mellon University, points out that the authors did not employ state-of-the-art models in their testing or compare their approach to existing ones. For instance, Google Gemini already utilizes a technique called "self-consistency," which involves generating multiple responses to the same prompt and selecting the most common response as the final answer. Neubig suggests that Farquhar and colleagues may have "reinvented the wheel."
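To make Neubig’s comparison concrete, here is a minimal Python sketch of majority voting in the spirit of self-consistency. It is an illustration under assumptions, not Gemini’s implementation: the normalization (trimming and lowercasing) is a hypothetical choice, the sample answers are made up, and the returned vote share is just one simple way to read agreement as confidence.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_consistency(
    samples: List[str],
    normalize: Callable[[str], str] = lambda s: s.strip().lower(),
) -> Tuple[str, float]:
    """Majority vote over sampled answers.

    Returns the most common answer (after normalization) and the share of
    samples that voted for it. This sketch compares answers by normalized
    text, not by meaning.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    votes = Counter(normalize(s) for s in samples)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(samples)


# Hypothetical samples for one arithmetic question.
print(self_consistency(["42", " 42", "41", "42 "]))  # ('42', 0.75)
```

Both approaches sample several answers to the same prompt; the difference lies in what they do with them. Self-consistency picks a final answer, while semantic entropy, as described in this article, groups answers by meaning and treats a wide spread as a warning sign.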

Farquhar acknowledged, “We did have some trouble in this work that the state-of-the-art advances so quickly. We have run experiments on three generations of models and always gotten consistent results. There’s also nothing about the method that is sensitive to a specific model that’s used.”

According to Farquhar, one benefit of the approach is that it is relatively easy to integrate into current AI models. The drawbacks are its high computational cost, because several answers must be generated and compared for each question, and a slight delay in the AI’s responses.

Farquhar also emphasizes that this approach will not resolve all of AI’s hallucination issues. For example, if the LLM consistently repeats the same false answer, the method may not catch the mistake. This might occur if the model was trained on erroneous data.

Farquhar concluded, “There are still ways models can go wrong that are not addressed by our method at all.”

Journal Reference:

Farquhar, S., Kossen, J., Kuhn, L., & Gal, Y. (2024). Detecting hallucinations in large language models using semantic entropy. Nature, 630, 625–630.
