The Value of GenAI Hallucinations
To harness GenAI’s potential, we need it to be both accurate and creative — but is that possible?
Written by Howard Poston | 8 min • April 11, 2025
Generative artificial intelligence has been tasked with a nuanced problem: Users expect it to be accurate and reliable but also to “think” for itself. When a GenAI model makes an error, we dub it an undesirable “hallucination.” But even when it’s accurate we complain that it “sounds like AI” or lacks creative spark.
Currently, AI’s capacity to strike this balance is limited. And if we can’t trust an AI system to consistently output the “right” answer, its potential uses are limited, too. But a model that’s completely predictable leaves little room for innovation or invention — the stuff that makes GenAI so compelling and transformative.
So can GenAI be both accurate and creative? Prompt engineering — fine-tuning a query to a particular tool — can be powerful. But the solution may lie in getting AI to think in an entirely different way.
Hallucinations are the product of GenAI’s high creativity. If a piece of information doesn’t exist in a model’s training data, the model makes something up based on its understanding of human knowledge.
Hallucinations can take the form of incorrect predictions (e.g., stating it will rain the next day when no rain is forecast), false positives (e.g., flagging a transaction as fraudulent when it isn’t) or false negatives (e.g., failing to detect cancer when it’s present).
Over time, AI systems learn to pick out the signal from the noise, grasping the patterns and features needed to accomplish a given task. In the case of GenAI, the task at hand is understanding how humans think, speak and convey knowledge on just about any topic in the universe.
Understandably, with that scale of data and unpredictability, mistakes — or hallucinations — can happen.
“These systems have a deep understanding of reality that is derived from the human race’s understanding,” says Andrew Leker, chief technology officer and president of Bambu AI. “If an AI model hallucinates, it’s likely because it’s being asked a question whose answer wasn’t in the training data.”
Therein lies the catalyst of AI hallucinations: An AI is asked something it doesn’t know, so it invents an answer. In humans, we call that the Dunning-Kruger effect, in which people with limited knowledge on a subject overestimate their abilities.
When a user issues a prompt to a GenAI system, the input is broken into words or word fragments called tokens. These tokens pass through the AI model, and new tokens are generated one at a time on the output side, each chosen based on which token is most likely to come next.
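To make tokens concrete, here is a minimal sketch using OpenAI’s open-source tiktoken tokenizer; the library choice and the prompt are illustrative assumptions, not something from the article.

```python
# A minimal tokenization sketch (assumes `pip install tiktoken`).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an encoding used by GPT-4-era models

prompt = "Write a haiku about a tree"
token_ids = enc.encode(prompt)

print(token_ids)                             # a list of integer token IDs
print([enc.decode([t]) for t in token_ids])  # the text fragment behind each ID
```

A model generates its response the same way in reverse: It predicts one token ID at a time, then decodes the finished sequence back into text.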
For example, if you ask ChatGPT to write a poem about a tree, the tokens “tree” and “poem” help the model understand what you want and shape its output. Without further guidance, you could end up with a haiku or an epic poem. But if you specify you want a haiku, that’s exactly what you’ll get. One word, or token, is the difference between a brief poem and 20 stanzas about photosynthesis.
The words we choose are often also the catalyst for things going wrong — or getting weird — very quickly. If an AI begins a conversation by going down the wrong path, it may never recover, akin to making a verbal faux pas in conversation and being unable to untangle yourself from it.
“Once, when an AI tool said something I didn’t understand, I consulted a linguist I know,” says Leker. “She said the reason was that I used a ‘branch’ word — I think it was ‘also’ — which split the potential output into two potential paths. The AI tool selected a token that was reasonable but sent the system into a bad place that it struggled to get out of.”
That’s the thing about creativity — you never know what you’re going to get. The key is knowing how to use it to your advantage.
AI demonstrates creativity primarily through two sampling parameters: probability (top p) and temperature.
When selecting the next output token, a GenAI model will produce a set of options and their probabilities. How many of those options stay in contention is defined by a parameter called “top P”: The model samples from the smallest pool of tokens whose combined probability meets the top-P threshold. If the pool is narrowed to a single token, the model will always choose the most probable next one. As a result, it will always produce the same response to the same prompt, making it predictable, but not creative.
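As a rough illustration, here is a toy sketch of that filtering step; the candidate tokens and their probabilities are invented for the example.

```python
# A toy sketch of nucleus (top-p) filtering over a model's candidate tokens.
# The candidates and their probabilities are invented for illustration.
import random

candidates = {"oak": 0.55, "maple": 0.25, "pizza": 0.15, "quasar": 0.05}

def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability >= top_p."""
    kept, running_total = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = p
        running_total += p
        if running_total >= top_p:
            break
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}  # renormalize the pool

pool = top_p_filter(candidates, top_p=0.7)  # keeps only "oak" and "maple"
next_token = random.choices(list(pool), weights=list(pool.values()))[0]
print(pool, "->", next_token)
```

Shrinking top P toward zero collapses the pool to the single most probable token; raising it toward one lets unlikely candidates back into contention.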
You can use temperature to adjust the probabilities associated with each potential output. A high temperature value “flattens” the probability distribution, making less-probable options more likely and more-probable ones less likely. This helps to enhance creativity since the model is more likely to choose an unusual token for the next piece of its output.
With a temperature of zero, the model will always output the same response to the same prompt. At the top of the scale (one in some APIs, two in others), the model will produce highly creative outputs, but they’re also likely to be incoherent and nonsensical.
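Under the hood, temperature typically divides the model’s raw scores (logits) before they are converted into probabilities. Here is a toy sketch with invented tokens and scores:

```python
# A toy sketch of temperature scaling: low T sharpens the distribution,
# high T flattens it. The tokens and logits are invented for illustration.
import math

logits = {"tree": 3.0, "forest": 1.5, "spaceship": 0.2}

def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    scaled = {t: s / temperature for t, s in scores.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {t: math.exp(v) / total for t, v in scaled.items()}

for T in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, T)
    print(T, {t: round(p, 3) for t, p in probs.items()})
# At T=0.2, "tree" wins almost every time; at T=2.0, the distribution
# flattens and "spaceship" becomes a plausible next token.
```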
But AI creativity isn’t just a function of temperature. Fine-tuning models opens up new creative pathways. Leker uses fine-tuning to instill creativity — or other desirable attributes — in his models. He interacts with LLMs and provides examples of his desired outputs to particular prompts. Over time, this updates the weights of the AI’s internal model, effectively changing the way it thinks.
“If I fine-tune a model, I can set the temperature to zero, and it will still produce very creative outputs,” Leker says. “However, the output produced by a particular prompt will be the same every time.”
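As one concrete illustration of the prompt-and-desired-output pairs Leker describes, fine-tuning services such as OpenAI’s accept example conversations in a JSONL chat format. The pairs below are hypothetical stand-ins, not Leker’s actual data.

```python
# A sketch of packaging fine-tuning examples in the JSONL chat format
# accepted by OpenAI's fine-tuning API. All example content is hypothetical.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Describe a tree."},
        {"role": "assistant", "content": "A slow green firework, frozen mid-burst."},
    ]},
    {"messages": [
        {"role": "user", "content": "Describe the ocean."},
        {"role": "assistant", "content": "The planet breathing in slow motion."},
    ]},
]

with open("training_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```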
Increasing the temperature value only provides the model with a bit more creative freedom when selecting a response. It’s similar to someone thinking through their answer to a question rather than choosing the first words that pop into their head — no matter how hard they try, they may never make connections that would be intuitive and obvious to someone else.
"Whether through temperature, fine-tuning or prompt engineering — or a combination of all three — GenAI’s creativity can be controlled. "
Whether through temperature, fine-tuning or prompt engineering — or a combination of all three — GenAI’s creativity can be controlled. But as with any tool, maximizing GenAI’s potential requires using it correctly.
GenAI has reached the point where, if we’re encountering hallucinations, it may be because we’re not asking the right questions.
Prescriptive prompts can make a GenAI system’s outputs more reliable by limiting the data it uses to produce an answer, says Philip Magnuszewski, chief innovation officer and applied AI strategist at Infused Innovations, Inc.
“These AI models have been trained using much of the information that has been produced by humanity across its history,” says Magnuszewski. “This is more data than any individual has been privy to or could comprehend at this scale.”
A tightly defined prompt signals to an AI system that most of that data is noise, making it easier to pick out the signal.
But other times, Magnuszewski says, a deeper reservoir of knowledge can fuel more creativity.
“When brainstorming, I deliberately use less restrictive prompting to make things more open-ended, since I’m really interested in what it will come back with,” he says. “For example, when asked about the potential effects of GenAI on the world, it started describing how economic models of society may evolve based on how we use the technology.”
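Here is a hedged sketch of what those two prompting styles might look like in code, using the OpenAI Python SDK; the model name, prompts and temperature values are illustrative assumptions, not details from the article.

```python
# Contrasting a prescriptive prompt with an open-ended one
# (assumes `pip install openai` and an OPENAI_API_KEY in the environment).
from openai import OpenAI

client = OpenAI()

# Prescriptive: constrain the format, scope and style of the answer.
factual = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    temperature=0.2,      # low temperature favors predictable output
    messages=[{
        "role": "user",
        "content": "In exactly three bullet points, define an AI hallucination "
                   "using only widely accepted terminology.",
    }],
)

# Open-ended: leave room for unexpected connections.
brainstorm = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=1.0,      # a wider candidate pool for brainstorming
    messages=[{
        "role": "user",
        "content": "What are some potential effects of GenAI on the world?",
    }],
)

print(factual.choices[0].message.content)
print(brainstorm.choices[0].message.content)
```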
For individuals and enterprises wanting to get the most out of GenAI, Leker and Magnuszewski suggest first considering a few key questions.
GenAI is an amalgam, formed by compressing the majority of human knowledge and expression into a “simple” model. When we interact with an LLM chatbot, the output is defined by our prompt, down to our particular choice of words, and by the model’s understanding of the world, which is based on human writing and art.
If we don’t like its opinions or think it’s uncreative, what does that say about us?