Hallucination (artificial intelligence)
Abstract
Hallucination (artificial intelligence)
In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called bullshitting, confabulation, or delusion) is a response generated by AI that contains false or misleading information presented as fact. This term draws a loose analogy with human psychology, where a hallucination typically involves false percepts. However, there is a key difference: AI hallucination is associated with erroneously constructed responses (confabulation), rather than perceptual experiences. For example, a chatbot powered by large language models (LLMs), like ChatGPT, may embed plausible-sounding random falsehoods within its generated content. Detecting and mitigating errors and hallucinations pose significant challenges for practical deployment and reliability of LLMs in high-stakes scenarios, such as chip design, supply chain logistics, and medical diagnostics. Some software engineers and statisticians have criticized the specific term "AI hallucination" for unreasonably anthropomorphizing computers.
== Term ==
=== Origin === Since the 1980s, the term "hallucination" has been used in computer vision with a positive connotation to describe the process of adding detail to an image. For example, the task of generating high-resolution face images from low-resolution inputs is called face hallucination. The first documented use of the term "hallucination" in this sense is in the PhD thesis of Eric Mjolsness in 1986. A notable work is the face hallucination algorithm by Simon Baker and Takeo Kanade published in 1999. In 1995, Stephen Thaler demonstrated how hallucinations and phantom experiences emerge from artificial neural networks through random perturbation of their connection weights. In the 2000s, hallucinations were described in statistical machine translation as a failure mode. Since the 2010s, the term underwent a semantic shift to signify the generation of factually incorrect or misleading outputs by AI systems in tasks like machine translation and object detection. In 2015, hallucinations were identified in visual semantic role labeling tasks by Saurabh Gupta and Jitendra Malik.. In 2015, computer scientist Andrej Karpathy used the term "hallucinated" in a blog post to describe his recurrent neural network (RNN) language model generating an incorrect citation link. In 2017, Google researchers used the term to describe the responses generated by neural machine translation (NMT) models when they are not related to the source text, and in 2018, the term was used in computer vision to describe instances where non-existent objects are erroneously detected because of adversarial attacks. The term "hallucinations" in AI gained wider recognition during the AI boom, alongside the rollout of widely used chatbots based on large language models (LLMs). In July 2021, Meta warned during its release of BlenderBot 2 that the system is prone to "hallucinations", which Meta defined as "confident statements that are not true". Following OpenAI's ChatGPT release in beta version in November 2022, some users complained that such chatbots often seem to pointlessly embed plausible-sounding random falsehoods within their generated content. Many news outlets, including The New York Times, started to use the term "hallucinations" to describe these models' occasionally incorrect or inconsistent responses. Some researchers have highlighted a lack of consistency in how the term is used, but also identified several alternative terms in the literature, such as confabulations, fabrications, and factual errors. In 2023, the Cambridge dictionary updated its definition of hallucination to include this new sense specific to the field of AI.
=== Definitions and alternatives ===
Uses, definitions and characterizations of the term "hallucination" in the context of LLMs include:
"a tendency to invent facts in moments of uncertainty" (OpenAI, May 2023) "a model's logical mistakes" (OpenAI, May 2023) "fabricating information entirely, but behaving as if spouting facts" (CNBC, May 2023) "making up information" (The Verge, February 2023) "probability distributions" (in scientific contexts) Journalist Benj Edwards, in Ars Technica, writes that the term "hallucination" is controversial, but that some form of metaphor remains necessary; Edwards suggests "confabulation" as an analogy for processes that involve "creative gap-filling". In July 2024, a White House report on fostering public trust in AI research mentioned hallucinations only in the context of reducing them. Notably, when acknowledging David Baker's Nobel Prize-winning work with AI-generated proteins, the Nobel committee avoided the term entirely, instead referring to "imaginative protein creation". Hicks, Humphries, and Slater, in their article in Ethics and Information Technology, argue that the output of LLMs is "bullshit" under Harry Frankfurt's definition of the term, and that the models are "in an important way indifferent to the truth of their outputs", with true statements only accidentally true, and false ones accidentally false.
=== Criticism === In the scientific community, some researchers avoid the term "hallucination", seeing it as potentially misleading. It has been criticized by Usama Fayyad, executive director of the Institute for Experimental Artificial Intelligence at Northeastern University, on the grounds that it misleadingly personifies large language models and is vague. Mary Shaw said, "The current fashion for calling generative AI's errors 'hallucinations' is appalling. It anthropomorphizes the software, and it spins actual errors as somehow being idiosyncratic quirks of the system even when they're objectively incorrect." In Salon, statistician Gary N. Smith argues that LLMs "do not understand what words mean" and consequently that the term "hallucination" unreasonably anthropomorphizes the machine. Some see the AI outputs not as illusory but as prospective—that is, having some chance of being true, similar to early-stage scientific conjectures. The term has also been criticized for its association with psychedelic drug experiences.
== In natural language generation ==
In natural language generation, a hallucination is often defined as "generated content that appears factual but is ungrounded". There are different ways to categorize hallucinations. Depending on whether the output contradicts the source or cannot be verified from the source, they are divided into intrinsic and extrinsic, respectively. Depending on whether the output contradicts the prompt or not, they could be divided into closed-domain and open-domain, respectively.
=== Causes === There are several reasons why natural language models hallucinate:
==== Hallucination from data ==== Hallucinations can stem from incomplete, inaccurate or unrepresentative data sets. One possible cause is source-reference divergence. This divergence may occur as an artifact of heuristic data collection or due to the nature of some natural language generation tasks that inevitably contain such divergence. When a model is trained on data with source-reference (target) divergence, the model can be encouraged to generate text that is not necessarily grounded and not faithful to the provided source.
==== Modeling-related causes ==== The pre-training of generative pretrained transformers (GPT) involves predicting the next word. It incentivizes GPT models to "give a guess" about what the next word is, even when they lack information. After pre-training, though, hallucinations can be mitigated through anti-hallucination fine-tuning (such as with reinforcement learning from human feedback). Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension between novelty and usefulness. For instance, Teresa Amabile and Pratt define human creativity as the production of novel and useful ideas. By extension, a focus on novelty in machine creativity can lead to the production of original but inaccurate responses—that is, falsehoods—whereas a focus on usefulness may result in memorized content lacking originality. Pre-training of models on a large corpus is known to result in the model memorizing knowledge in its parameters, creating hallucinations if the system is overconfident in its knowledge. In systems such as GPT-3, an AI generates each next word based on a sequence of previous words (including the words it has itself previously generated during the same conversation), causing a cascade of possible hallucinations as the response grows longer. By 2022, newspapers such as The New York Times expressed concern that, as the adoption of bots based on large language models continued to grow, unwarranted user confidence in bot output could lead to problems. For models that also have an encoder (unlike GPTs), errors in encoding and decoding between text and representations can cause hallucinations. When encoders learn the wrong correlations between different parts of the training data, it can result in an erroneous generation that diverges from the input. The decoder takes the encoded input from the encoder and generates the final target sequence. Two aspects of decoding contribute to hallucinations. First, decoders can attend to the wrong part of the encoded input source, leading to erroneous generation. Second, the design of the decoding strategy itself can contribute to hallucinations. A decoding strategy that improves generation diversity, such as top-k sampling, is positively correlated with increased hallucination.
==== Interpretability research ==== In 2025, interpretability research by Anthropic on the LLM Claude identified internal circuits that cause it to decline to answer questions unless it knows the answer. By default, the circuit is active and the LLM doesn't answer. When the LLM has sufficient information, these circuits are inhibited and the LLM answers the question. Hallucinations were found to occur when this inhibition happens incorrectly, such as when Claude recognizes a name but lacks sufficient information about that person, causing it to generate plausible but untrue responses.
=== Examples === On 15 November 2022, researchers from Meta AI published Galactica, designed to "store, combine and reason about scientific knowledge". Content generated by Galactica came with the warning: "Outputs may be unreliable! Language Models are prone to hallucinate text." In one case, when asked to draft a paper on creating avatars, Galactica cited a fictitious paper from a real author who works in the relevant area. Meta withdrew Galactica on 17 November due to offensiveness and inaccuracy. Before the cancellation, researchers were working on Galactica Instruct, which would use instruction tuning to allow the model to follow instructions to manipulate LaTeX documents on Overleaf. OpenAI's ChatGPT, released in beta version to the public on November 30, 2022, was based on the foundation model GPT-3.5 (a revision of GPT-3). Professor Ethan Mollick of Wharton called it an "omniscient, eager-to-please intern who sometimes lies to you". Data scientist Teresa Kubacka has recounted deliberately making up the phrase "cycloidal inverted electromagnon" and testing ChatGPT by asking it about the (nonexistent) phenomenon. ChatGPT invented a plausible-sounding answer backed with plausible-looking citations that compelled her to double-check whether she had accidentally typed in the name of a real phenomenon. Other scholars such as Oren Etzioni have joined Kubacka in assessing that such software can often give "a very impressive-sounding answer that's just dead wrong". When CNBC asked ChatGPT for the lyrics to "Ballad of Dwight Fry", ChatGPT supplied invented lyrics rather than the actual lyrics. Asked questions about the Canadian province of New Brunswick, ChatGPT got many answers right but incorrectly classified Toronto-born Samantha Bee as a "person from New Brunswick". Asked about astrophysical magnetic fields, ChatGPT incorrectly volunteered that "(strong) magnetic fields of black holes are generated by the extremely strong gravitational forces in their vicinity". (In reality, as a consequence of the no-hair theorem, a black hole without an accretion disk is believed to have no magnetic field.) Fast Company asked ChatGPT to generate a news article on Tesla's last financial quarter; ChatGPT created a coherent article, but made up the financial numbers contained within.
Other examples involve baiting ChatGPT with a false premise to see if it embellishes upon the premise. When asked about "Harold Coward's idea of dynamic canonicity", ChatGPT fabricated that Coward wrote a book titled Dynamic Canonicity: A Model for Biblical and Theological Interpretation, arguing that religious principles are actually in a constant state of change. When pressed, ChatGPT continued to insist that the book was real. Asked for proof that dinosaurs built a civilization, ChatGPT claimed there were fossil remains of dinosaur tools and stated, "Some species of dinosaurs even developed primitive forms of art, such as engravings on stones". When prompted that "Scientists have recently discovered churros, the delicious fried-dough pastries ... (are) ideal tools for home surgery", ChatGPT claimed that a "study published in the journal Science" found that the dough is pliable enough to form into surgical instruments that can get into hard-to-reach places, and that the flavor has a calming effect on patients. By 2023, analysts considered frequent hallucination to be a major problem in LLM technology, with a Google executive identifying hallucination reduction as a "fundamental" task for ChatGPT competitor Google Gemini. A 2023 demo for Microsoft's GPT-based Bing AI appeared to contain several hallucinations that went uncaught by the presenter. In May 2023, it was discovered that Stephen Schwartz had submitted six fake case precedents generated by ChatGPT in his brief to the Southern District of New York on Mata v. Avianca, Inc., a personal injury case against the airline Avianca. Schwartz said that he had never previously used ChatGPT, that he did not recognize the possibility that ChatGPT's output could have been fabricated, and that ChatGPT continued to assert the authenticity of the precedents after their nonexistence was discovered. In response, Brantley Starr of the Northern District of Texas banned the submission of AI-generated case filings that have not been reviewed by a human, noting that:
Generative artificial intelligence platforms in their current states are prone to hallucinations and bias. On hallucinations, they make stuff up—even quotes and citations. Another issue is reliability or bias. While attorneys swear an oath to set aside their personal prejudices, biases, and beliefs to faithfully uphold the law and represent their clients, generative artificial intelligence is the product of programming devised by humans who did not have to swear such an oath. As such, these systems hold no allegiance to any client, the rule of law, or the laws and Constitution of the United States (or, as addressed above, the truth). Unbound by any sense of duty, honor, or justice, such programs act according to computer code rather than conviction, based on programming rather than principle.
On June 23, judge P. Kevin Castel dismissed the Mata case and issued a 440,000 report written by Deloitte and submitted to the Australian government in July. The company later submitted a revised report with these errors removed, and will issue a partial refund to the government. The following month, in November 2025, The Independent, a news publication in Newfoundland and Labrador, Canada, discovered that Deloitte's CA$1.6 million Health Human Resources Plan for the Government of Newfoundland and Labrador commissioned in May 2025 contained at least four false citations to non-existent research papers.
== In other modalities ==
The concept of "hallucination" is not limited to text generation, and can occur with other modalities. A confident response from any AI that seems erroneous by the training data can be labeled a hallucination.
=== Object detection === Various researchers cited by Wired have classified adversarial hallucinations as a high-dimensional statistical phenomenon, or have attributed hallucinations to insufficient training data. Some researchers believe that some "incorrect" AI responses classified by humans as "hallucinations" in the case of object detection may in fact be justified by the training data, or even that an AI may be giving the "correct" answer that the human reviewers are failing to see. For example, an adversarial image that looks, to a human, like an ordinary image of a dog, may in fact be seen by the AI to contain tiny patterns that (in authentic images) would only appear when viewing a cat. The AI is detecting real-world visual patterns that humans are insensitive to. Wired noted in 2018 that, despite no recorded attacks "in the wild" (that is, outside of proof-of-concept attacks by researchers), there was "little dispute" that consumer gadgets, and systems such as automated driving, were susceptible to adversarial attacks that could cause AI to hallucinate. Examples included a stop sign rendered invisible to computer vision; an audio clip engineered to sound innocuous to humans, but that software transcribed as "evil dot com"; and an image of two men on skis, that Google Cloud Vision identified as 91% likely to be "a dog". However, these findings have been challenged by other researchers. For example, it was objected that the models can be biased towards superficial statistics, leading adversarial training to not be robust in real-world scenarios.
=== Text-to-audio generative AI === Text-to-audio generative AI – more narrowly known as text-to-speech (TTS) synthesis, depending on the modality – are known ...
(Article truncated for display)
Source
This content is sourced from Wikipedia, the free encyclopedia. Read full article on Wikipedia
Category
Artificial Intelligence - Computer Science