Here’s the bizarre thing about RAG: it flips what you thought you understood about AI on its head, and nudges you toward a more accurate understanding.
LLMs are static things. They are trained at a point in time, and cannot be “updated” in the sense you’re probably thinking. An LLM cannot be incrementally trained. If you want to give it new information, you have to re-train it, and training an LLM takes a long time and is expensive. It can cost hundreds of millions of dollars to build and train an LLM (see this).
So, how do you get an LLM to –
Respond (“know”) about events that occurred after it was trained?
Respond (“know”) about your particular business, especially internal information that could not have been in its training data?
You do this with Retrieval Augmented Generation (RAG). This is a method by which you take the user’s prompt, search for information that might answer it, then pass that information into the LLM along with the prompt.
To be clear: RAG really has nothing to do with the LLM. The user’s prompt is used to find information – the user searches the RAG database without knowing it, then all that information is fed into the LLM. By the time the LLM starts its work, the RAG part is all done.
(See this diagram that I put together from this book. Note where the RAG parts are and where they intersect with the main flow that starts with Boromir and ends with Ron Burgundy.)
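To make that concrete, here’s a toy sketch of the flow (my own, not the book’s): a hard-coded three-document “database,” naive keyword-overlap retrieval, and a prompt builder. Everything in it is invented for illustration; a real system would use an embedding index and an actual LLM API.

```python
# A toy retrieve-then-generate flow. All names and documents here are
# invented for illustration; real systems use a vector index and an LLM API.
DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    "The Q3 all-hands meeting is scheduled for October 14.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    # Rank documents by how many words they share with the query.
    words = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:top_k]

def build_prompt(query: str) -> str:
    # Augmentation: stitch whatever retrieval found into the prompt.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Only now would the LLM get involved; the RAG part is already done.
print(build_prompt("What is the refund policy?"))
```

Notice that the LLM only shows up at the very end, and only as a consumer of the augmented prompt. Everything before that is plain old search.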
This seemed weird to me. If you can find the information in the RAG database, why not just give it to the user?
Well, you can (this book even says that you should in some cases), but LLMs are very good at generating text. They can sort through all sorts of inputs and generate something intelligent from them. So you’re giving the LLM the answer along with the question, and saying, do what you’re good at: make this answer…pretty.
And this helps frame what LLMs really do: they generatively predict text. That’s it. They don’t “know” anything, they just forecast and select from a bunch of probabilities. (A friend called LLMs “spicy autocomplete.”)
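If you want to see how little “knowing” is involved, here’s the entire trick in miniature. The probabilities are made up, but this is the shape of it: a distribution over candidate next tokens, and a sample drawn from it.

```python
import random

# Invented probabilities for the token after "The capital of France is".
# The model doesn't "know" the answer; it samples from a distribution.
next_token_probs = {"Paris": 0.90, "Lyon": 0.05, "London": 0.03, "banana": 0.02}

tokens = list(next_token_probs)
weights = list(next_token_probs.values())
print(random.choices(tokens, weights=weights, k=1)[0])  # usually "Paris"
```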
RAG is fundamentally about search and information retrieval – tokenization and vector storage and such. This book did a nice job of explaining the “RAG pipeline,” but I couldn’t help but think, “This is just the same search stuff we’ve been doing for years.” There are some idiosyncrasies – chunking and overlap, for example – but mostly, you’re pushing content into a search index and producing results. You’re just not formatting or presenting the results: you’re leaning on the LLM for that.
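Chunking with overlap is the one wrinkle worth a quick sketch. The idea is that consecutive chunks share some text, so a sentence that straddles a chunk boundary still appears whole in at least one chunk. This is a minimal character-based version of my own; real pipelines typically chunk by tokens or sentences, and these sizes are arbitrary.

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Advance by (size - overlap) so each chunk shares `overlap`
    # characters with the one before it.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "word " * 400  # stand-in for a long document (2,000 characters)
pieces = chunk(doc)
print(len(pieces), "chunks of up to", len(pieces[0]), "characters")
```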
Weirdness of RAG aside, this was a good book. Like most Manning titles, it was clear and went to the depth it needed to.
Book Info
Author: Abhinav Kimothi
Year:
Pages: 444
Acquired:
I have read this book. According to my records, I completed it on December 23, 2025.