Nobody knows whether artificial intelligence will be a boon or a curse in the distant future. But right now, there is almost universal unease and disdain about one habit of these chatbots and agents: hallucinations, the invented facts that appear in the output of large language models like ChatGPT. In the middle of what appears to be a carefully constructed response, the LLM will slip in something that seems reasonable but is a total fabrication. Your typical chatbot can make disgraced former congressman George Santos look like Abe Lincoln. Since it seems inevitable that chatbots will one day generate the vast majority of all prose ever written, all the AI companies are obsessed with minimizing and eliminating hallucinations, or at least with convincing the world that the problem is under control.
Obviously, the value of LLMs will reach a new level when and if hallucinations approach zero. But before that happens, I ask you to toast AI confabulations.
Hallucinations fascinate me, even though AI scientists have a pretty good idea of why they happen. An AI startup called Vectara has studied them and their prevalence, even compiling the hallucination rates of various models when asked to summarize a document. (OpenAI’s GPT-4 does best, hallucinating only about 3% of the time; Google’s now-obsolete Palm Chat – not its Bard chatbot! – had a shocking rate of 27%, although, to be fair, document summarization was not in Palm Chat’s wheelhouse.) Vectara CTO Amin Ahmad says LLMs create a compressed representation of all the training data fed to their artificial neurons. “The nature of compression means that even the smallest details can be lost,” he explains. A model ends up providing the most likely answers to user queries, but it lacks the exact facts. “When you get to the details, you start to invent things,” he says.
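To make a “hallucination rate” concrete, here is a minimal sketch of the kind of scoring such a leaderboard implies; it is not Vectara’s actual methodology. The `summarize` call and the word-overlap “support” test below are placeholders for a real model call and a real factual-consistency classifier.

```python
# Toy hallucination-rate estimator for document summarization.
# Illustrative sketch only: `summarize` stands in for an LLM call, and
# "support" is judged by crude word overlap rather than a real
# factual-consistency model.

def summarize(document: str) -> str:
    # Placeholder: a real system would call a language model here.
    return document.split(".")[0] + "."

def is_supported(sentence: str, source: str, threshold: float = 0.6) -> bool:
    # Crude check: what fraction of the sentence's words appear in the source?
    words = {w.lower().strip(".,") for w in sentence.split()}
    source_words = {w.lower().strip(".,") for w in source.split()}
    if not words:
        return True
    return len(words & source_words) / len(words) >= threshold

def hallucination_rate(documents: list[str]) -> float:
    # A summary counts as hallucinated if any of its sentences is unsupported.
    flagged = 0
    for doc in documents:
        summary = summarize(doc)
        sentences = [s.strip() for s in summary.split(".") if s.strip()]
        if any(not is_supported(s, doc) for s in sentences):
            flagged += 1
    return flagged / len(documents) if documents else 0.0

docs = [
    "Acme Corp reported quarterly revenue of 2 billion dollars. Shares rose.",
    "The probe entered orbit around Jupiter after a five-year journey.",
]
print(f"Estimated hallucination rate: {hallucination_rate(docs):.0%}")
```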
Santosh Vempala, a computer science professor at Georgia Tech, has also studied hallucinations. “A language model is only a probabilistic model of the world,” he says, not an accurate mirror of reality. Vempala explains that an LLM’s response strives for a general calibration with the real world, as represented in its training data, which is “a weak version of precision.” His research, published with OpenAI’s Adam Kalai, found that hallucinations are inevitable for facts that cannot be verified using the information in a model’s training data.
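Vempala’s point can be shown with a toy model. The sketch below is my own illustration, not his and Kalai’s formalism: a tiny bigram model trained on a handful of made-up biography lines, which, asked about a subject it knows almost nothing about, confidently completes the sentence with the statistically most common continuation in its “training data.”

```python
# Toy illustration of a "probabilistic model of the world" hallucinating:
# a bigram model picks the most likely next word given its tiny corpus, so a
# rarely mentioned subject inherits the most common biography.
from collections import Counter, defaultdict

corpus = [
    "alice won the national magazine award",
    "bob won the national magazine award",
    "carol won the national magazine award",
    "dave won the pulitzer prize",
    "erin wrote about the dot com bust",
]

# Count word-to-next-word transitions across the corpus.
transitions = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        transitions[current][nxt] += 1

def complete(prompt: str, max_words: int = 5) -> str:
    # Greedily extend the prompt with the most probable next word.
    words = prompt.split()
    for _ in range(max_words):
        options = transitions.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

# "erin" never won anything in the corpus, but the model completes the
# sentence with the statistically dominant continuation.
print(complete("erin won the"))  # -> "erin won the national magazine award"
```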
That’s the science and math of AI hallucinations, but they are also notable for the experience they can evoke in humans. Sometimes these generative fabrications can seem more plausible than the actual facts, which are often surprisingly bizarre and unsatisfying. How often do you hear something described as so strange that no screenwriter would dare write it into a film? These days, all the time! Hallucinations can seduce us by seeming to anchor us in a world less shocking than the one we actually live in. I also find it revealing to note which details the bots tend to concoct. In their desperate attempt to fill the gaps in a satisfying narrative, they settle on the statistically most likely version of reality as represented in their internet-scale training data, which can be a truth of its own. I compare it to a fiction writer producing a novel inspired by real events: a good author will stray from what actually happened toward an imagined scenario that reveals a deeper truth, striving to create something more real than reality.
When I asked ChatGPT to write an obituary for me (admit it, you’ve tried this too), a lot of it was right, but there were a few mistakes. It gave me grandchildren I didn’t have, assigned me an earlier birth date, and added to my résumé a National Magazine Award for articles I never wrote about the bursting of the dot-com bubble in the late 1990s. In the LLM’s assessment of my life, this is something that should have happened, based on the facts of my career. I agree! It is only because of the imperfections of real life that the American Society of Magazine Editors failed to confer that honor, with its accompanying metal elephant sculpture, on me. After almost 50 years of writing for magazines, that’s their fault, not mine! It’s almost as if ChatGPT surveyed the possible multiverses and found that in most of them I had won an Ellie. Of course, I would have preferred that here, in my own corner of the multiverse, human judges had called me to the podium. But recognition from a vamping artificial neural network is better than nothing.