Here’s the thing about AI: we’re putting a lot of effort into making sure it doesn’t do silly things. In fact, most of the work around AI seems to go toward controlling it.
A “hallucination” is when an LLM generates text that sounds correct, but isn’t. LLMs are eager to generate text, and they will always do so unless told otherwise.
Almost hilariously, the most cost-effective way to reduce hallucinations might be to simply add something like this to the prompt:
If you do not have the correct information, respond that you cannot answer.
LLMs need to be told that it’s okay not to answer (or, ironically, they need to be told how to answer when they shouldn’t answer…)
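To make that concrete, here’s what baking that instruction into every request might look like. This is my sketch, not the book’s; the message format is just the generic system/user chat shape that most APIs accept, and the names are my own.

```python
# A minimal sketch of wrapping every request with the abstention instruction.
# Send `messages` to whatever chat client you use; nothing here is vendor-specific.

ABSTAIN_INSTRUCTION = (
    "If you do not have the correct information, respond that you cannot answer."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Prepend a system message that makes 'I don't know' an acceptable answer."""
    return [
        {"role": "system", "content": ABSTAIN_INSTRUCTION},
        {"role": "user", "content": user_prompt},
    ]

print(build_messages("Who won the 2031 World Cup?"))
```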
Beyond that, this book goes deep into “pipelines” of processes and guardrails that prevent LLMs from hallucinating. The options presented in this book are at the extreme, but that’s kind of the point – you need to decide what your particular situation requires.
For a research project I’m doing, I created a diagram of the major concepts here.
A list of things you can do (I’ve added my own rough code sketches for most of these after the list) –
Redirect the prompt away from AI entirely. If someone inputs “What is the current temperature,” just redirect that to a service that provides it. Same for stock quotes. Same for simple math.
Modify the prompt with safety guidelines and instructions (as noted above)
Use Retrieval Augmented Generation (RAG) to supply correct and updated information with the prompt. This gets a little weird, because you’re essentially providing the answer via simple search and relying on the LLM to phrase it. The lines between an LLM and a search engine get blurry.
When using RAG, “chunking” size matters. You should experiment to see what chunk size and how much overlap give the best results for your content domain.
Use multiple LLMs to generate the answer, then use another LLM to compare the results. How different are they?
Force the result into a specific output structure (e.g., a JSON schema), which can then be parsed for further verification.
Use various forms of post-answer verification. The author provides several options I didn’t know existed, but they seem to boil down to parsing out facts and nouns and verifying them.
Instead of answering, have the LLM form a function call that you execute to get the answer in a deterministic way.
Have a “human in the loop” who reviews the answer.
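To make these stick, here are my rough sketches. None of this is the book’s code; the service names and data are made up. First, redirecting prompts away from the LLM entirely. A real system would use an intent classifier, but keyword checks show the shape:

```python
# A toy sketch of routing a prompt away from the LLM entirely.
# The *_service functions are hypothetical stand-ins for deterministic backends.
import re

def weather_service(prompt: str) -> str:
    return "72°F and sunny (from the weather API)"      # stand-in

def stock_service(prompt: str) -> str:
    return "AAPL: $212.34 (from the market-data API)"   # stand-in

def ask_llm(prompt: str) -> str:
    return "(sent to the LLM)"                           # stand-in fallback

def route(prompt: str) -> str:
    lowered = prompt.lower()
    if "temperature" in lowered or "weather" in lowered:
        return weather_service(prompt)
    if "stock" in lowered or "quote" in lowered:
        return stock_service(prompt)
    if re.fullmatch(r"[\d+\-*/(). ]+", prompt.strip()):
        return str(eval(prompt))   # simple math; fine for a toy, never in production
    return ask_llm(prompt)

print(route("What is the current temperature?"))
print(route("2 + 2 * 10"))
```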
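For RAG, the essential move is retrieving relevant text and pasting it into the prompt. This sketch uses naive word overlap where a real pipeline would use embeddings and a vector store:

```python
# A minimal RAG sketch: retrieve the most relevant chunks and put them into the
# prompt so the model answers from supplied text (and is told to abstain otherwise).

def overlap_score(chunk: str, question: str) -> int:
    return len(set(question.lower().split()) & set(chunk.lower().split()))

def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: overlap_score(c, question), reverse=True)[:k]

def build_rag_prompt(chunks: list[str], question: str) -> str:
    context = "\n\n".join(retrieve(chunks, question))
    return (
        "Answer using ONLY the context below. If the context does not contain "
        "the answer, say you cannot answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

docs = [
    "The refund window is 30 days from the date of purchase.",
    "Shipping is free on orders over $50.",
    "Support hours are 9am-5pm Eastern, Monday through Friday.",
]
print(build_rag_prompt(docs, "How long do I have to request a refund?"))
```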
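Chunk size and overlap are just parameters, which is why the book says to experiment with them. Here’s a minimal chunker, assuming word-based splitting:

```python
# Fixed-size word chunks with a configurable overlap. The right numbers depend
# on your content domain, so treat both parameters as things to experiment with.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word chunks of `chunk_size`, each sharing `overlap` words
    with the previous chunk so ideas aren't cut off mid-thought."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

# Tiny example: 10-word chunks with 3 words of overlap.
sample = " ".join(f"word{i}" for i in range(25))
for chunk in chunk_text(sample, chunk_size=10, overlap=3):
    print(chunk)
```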
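For cross-checking multiple models, the book suggests using an LLM as the comparer; here a plain string-similarity ratio stands in for it, just to show where the check sits:

```python
# Ask several models the same question, then measure how much their answers diverge.
from difflib import SequenceMatcher

def min_agreement(answers: list[str]) -> float:
    """Lowest pairwise similarity (0..1) across all answers."""
    scores = [
        SequenceMatcher(None, a, b).ratio()
        for i, a in enumerate(answers)
        for b in answers[i + 1 :]
    ]
    return min(scores) if scores else 1.0

answers = [
    "The Treaty of Westphalia was signed in 1648.",
    "It was signed in 1648, ending the Thirty Years' War.",
    "The treaty was signed in 1658.",
]
score = min_agreement(answers)
print(f"minimum pairwise agreement: {score:.2f}")
if score < 0.8:
    print("Answers diverge; escalate to a judge model or a human.")
```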
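Forcing structured output only helps if you actually validate it. A sketch using the third-party jsonschema package (the schema itself is my invention, not the book’s):

```python
# Demand JSON that matches a schema, then validate it before trusting it.
# Requires: pip install jsonschema
import json
from jsonschema import validate, ValidationError

ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "sources": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["answer", "confidence", "sources"],
}

raw = '{"answer": "1648", "confidence": 0.9, "sources": ["Treaty of Westphalia article"]}'

try:
    parsed = json.loads(raw)          # the model output must at least be valid JSON
    validate(instance=parsed, schema=ANSWER_SCHEMA)
    print("Structured answer accepted:", parsed["answer"])
except (json.JSONDecodeError, ValidationError) as err:
    print("Rejected model output:", err)
```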
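Post-answer verification is the one I understand least; the book apparently covers several techniques. As a crude stand-in, here’s the “parse out facts and nouns and verify them” idea reduced to a regex and a lookup (the trusted store is hypothetical):

```python
# Pull candidate facts out of the answer and check each against a trusted store.
import re

TRUSTED_FACTS = {"Westphalia", "1648"}   # stand-in for a real knowledge source

def extract_claims(answer: str) -> set[str]:
    """Grab capitalized words and 3-4 digit numbers as a crude proxy for
    'facts and nouns'."""
    return set(re.findall(r"\b(?:[A-Z][a-z]+|\d{3,4})\b", answer))

def unverified(answer: str) -> set[str]:
    return {claim for claim in extract_claims(answer) if claim not in TRUSTED_FACTS}

# Flags '1649' (plus noise like 'The') as unverified; crude, but it shows the shape.
print(unverified("The Peace of Westphalia was signed in 1649."))
```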
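And the function-call pattern: the model proposes a call, and your code executes it deterministically. The registry and services below are hypothetical:

```python
# Instead of answering, the model emits something like
# {"name": "current_temperature", "arguments": {"city": "Chicago"}},
# and your code runs the real function.
import json

def current_temperature(city: str) -> str:
    return f"Temperature in {city}: 72°F (from the weather API)"   # stand-in

def stock_quote(symbol: str) -> str:
    return f"{symbol}: $212.34 (from the market-data API)"         # stand-in

REGISTRY = {"current_temperature": current_temperature, "stock_quote": stock_quote}

def execute_call(model_output: str) -> str:
    call = json.loads(model_output)                 # the model's proposed call
    fn = REGISTRY.get(call["name"])
    if fn is None:
        raise ValueError(f"Model requested an unknown function: {call['name']}")
    return fn(**call["arguments"])                  # run it deterministically

print(execute_call('{"name": "current_temperature", "arguments": {"city": "Chicago"}}'))
```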
The book is comprehensive. I’m not sure I’ll remember everything (and there are some very brief code snippets that didn’t do much except “prove” there are code solutions), but it emphasized to me the limits of LLMs and the lengths we have to go to in order to control them.
The author apparently has considerable experience with high-stakes LLM usage, particularly medical and legal advice.
Book Info
Author: Darryl Jeffery
Year:
Pages: 158
Acquired:
I have read this book. According to my records, I completed it on December 22, 2025.