AI Engineering

RAG in the Real World: Retrieval That Doesn't Hallucinate

Retrieval-augmented generation is simple to start and brutal to get right. A practical look at chunking, ranking, and the failure modes nobody warns you about.

Aarav Mehta · 9 min read

Retrieval-augmented generation has a reputation for being easy: embed your documents, search them, stuff the results into a prompt, done. That version works in a weekend and breaks in production. The hard part of RAG isn't running the search; it's making sure the model gets the right context, in the right shape, at the right time.

Garbage retrieval, garbage answers

An LLM can only reason over what you hand it. If retrieval surfaces the wrong passages, the model will faithfully build a wrong answer on top of them — and sound completely confident doing it. Most "the AI hallucinated" complaints are actually retrieval failures wearing a disguise. Fix retrieval and a surprising share of hallucinations disappear.

Chunking is a design decision

How you split documents determines what's retrievable. Chunks that are too big drown the signal in noise; chunks that are too small lose the context that makes a passage meaningful. Respect the structure of the source by splitting on sections and semantic boundaries rather than arbitrary character counts, and keep enough overlap between neighboring chunks that an idea is rarely cut in half.
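A minimal sketch of structure-aware chunking, assuming Markdown-style "#" headings mark sections and blank lines separate paragraphs. The names chunk_document, max_chars, and overlap_paras are hypothetical, and the budget is in characters purely for simplicity; many pipelines budget in tokens instead.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    section: str  # heading the chunk came from, kept as filterable metadata
    text: str


def chunk_document(doc: str, max_chars: int = 800, overlap_paras: int = 1) -> list[Chunk]:
    """Split on headings, then greedily pack whole paragraphs into chunks near max_chars.

    Consecutive chunks in a section share their last `overlap_paras` paragraphs,
    so an idea that straddles a boundary appears in both. A single paragraph
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks: list[Chunk] = []
    # Zero-width split at every Markdown heading line ("# ...", "## ...").
    for section in re.split(r"(?m)^(?=#{1,6} )", doc):
        if not section.strip():
            continue
        lines = section.strip().splitlines()
        m = re.match(r"#{1,6} (.*)", lines[0])
        heading = m.group(1).strip() if m else ""
        body = "\n".join(lines[1:]) if m else section
        # Paragraphs are separated by blank lines.
        paras = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
        current: list[str] = []
        for para in paras:
            # Flush before this paragraph would push the chunk over budget.
            if current and len("\n\n".join(current + [para])) > max_chars:
                chunks.append(Chunk(heading, "\n\n".join(current)))
                # Carry trailing paragraphs forward as overlap ([-0:] would copy everything).
                current = current[-overlap_paras:] if overlap_paras > 0 else []
            current.append(para)
        if current:
            chunks.append(Chunk(heading, "\n\n".join(current)))
    return chunks
```

Overlapping by whole paragraphs rather than a fixed character window keeps the overlap itself meaningful, and keeping the heading on each chunk gives you section metadata to filter on later.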

Search is not one step

Pure vector similarity gets you 70% of the way and then plateaus. The systems that hold up combine approaches: keyword search to catch exact terms and names, vector search for meaning, and a reranking pass to put the genuinely relevant results on top. Retrieval quality is a pipeline, not a single embedding call.

  • Hybrid search: combine keyword and semantic retrieval rather than picking one
  • Rerank the top candidates before they ever reach the model
  • Attach metadata (source, date, section) and filter on it
  • Return citations so every claim is traceable to a chunk
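One way to wire these steps together, sketched under assumptions: the keyword and vector searches have already run and each returned a ranked list of chunk IDs, and the two lists are merged with reciprocal rank fusion (k = 60 is the customary default). The Doc fields, the keep filter, and the term_overlap scorer are hypothetical stand-ins; in practice the reranker would typically be a cross-encoder model, not a term counter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    id: str
    text: str
    source: str
    section: str
    date: str  # ISO format ("2024-03-01"), so string comparison orders by date


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def hybrid_retrieve(
    query: str,
    keyword_ranking: list[str],
    vector_ranking: list[str],
    docs: dict[str, Doc],
    rerank: Callable[[str, str], float],
    keep: Callable[[Doc], bool] = lambda d: True,
    candidates: int = 20,
    top_k: int = 3,
) -> list[dict]:
    fused = rrf_fuse([keyword_ranking, vector_ranking])
    # Best fused candidates first, dropping anything the metadata filter rejects.
    shortlist = [i for i in sorted(fused, key=fused.get, reverse=True) if keep(docs[i])]
    shortlist = shortlist[:candidates]
    # Rerank only the shortlist; the reranker is usually the expensive step.
    reranked = sorted(shortlist, key=lambda i: rerank(query, docs[i].text), reverse=True)
    # Attach a citation so every answer can be traced back to its chunk.
    return [
        {"id": i, "text": docs[i].text,
         "citation": f"{docs[i].source}, {docs[i].section} ({docs[i].date})"}
        for i in reranked[:top_k]
    ]


# Toy stand-in for a real reranker: counts query terms present in the chunk.
def term_overlap(query: str, text: str) -> float:
    return float(len(set(query.lower().split()) & set(text.lower().split())))
```

For example, passing keep=lambda d: d.date >= "2023-01-01" drops older chunks before the reranker ever sees them. Rank fusion is used here because keyword and vector scores live on different scales; fusing ranks sidesteps score normalization.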

The failure modes nobody mentions

Real corpora are messy. Documents go stale, duplicates pile up, and the most-cited page is often outdated. Without freshness signals, your RAG system will happily quote a policy that changed last year. Build in recency weighting, deduplication, and a way to retire old content — otherwise retrieval quietly rots while the demo still looks fine.
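Recency weighting, deduplication, and retirement can be sketched as below, assuming each chunk carries an "updated" date and a retrieval score in [0, 1]. Hashing normalized text catches only exact duplicates and is a deliberately simple stand-in for near-duplicate detection; the 180-day half-life and the alpha blend are illustrative values, not recommendations.

```python
import hashlib
import re
from datetime import date


def content_key(text: str) -> str:
    """Hash of whitespace- and case-normalized text, so reformatted copies collide."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_keep_newest(docs: list[dict]) -> list[dict]:
    """Among duplicates (after normalization), keep only the most recently updated copy."""
    newest: dict[str, dict] = {}
    for d in docs:
        key = content_key(d["text"])
        if key not in newest or d["updated"] > newest[key]["updated"]:
            newest[key] = d
    return list(newest.values())


def recency_weight(updated: date, today: date, half_life_days: float = 180.0) -> float:
    """Exponential decay: a chunk's weight halves every half_life_days."""
    age_days = max((today - updated).days, 0)
    return 0.5 ** (age_days / half_life_days)


def rank_with_freshness(
    docs: list[dict],
    relevance: dict[str, float],
    today: date,
    alpha: float = 0.3,
    half_life_days: float = 180.0,
    retired: frozenset[str] = frozenset(),
) -> list[dict]:
    """Rank live chunks by relevance, discounted by age.

    docs: dicts with "id", "text", "updated" (a datetime.date).
    relevance: chunk id -> retrieval score in [0, 1].
    alpha: how much of the score recency can take away (0 ignores dates).
    retired: ids explicitly withdrawn from the index.
    """
    # Retired chunks are removed before dedup, as if they were never indexed.
    live = dedupe_keep_newest([d for d in docs if d["id"] not in retired])

    def score(d: dict) -> float:
        w = recency_weight(d["updated"], today, half_life_days)
        return relevance[d["id"]] * ((1 - alpha) + alpha * w)

    return sorted(live, key=score, reverse=True)
```

With alpha at 0.3, age can cost a chunk at most 30% of its relevance, so a clearly better but older passage can still win; raise alpha for corpora where stale answers are costly.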

RAG doesn't fail loudly. It fails by confidently retrieving yesterday's truth — which is why evaluation matters more than the model you pick.

Measure, then improve

You can't improve retrieval you don't measure. Build a set of representative questions, each paired with the passages that should be retrieved for it, and track whether those passages actually surface. Once retrieval is measurable, every change becomes an experiment with a clear answer instead of a vibe, and that's the foundation we insist on before any RAG system goes live.
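A minimal evaluation harness, assuming a hand-built list of (question, expected chunk IDs) pairs and a retrieve function that wraps whatever pipeline is under test. It reports recall@k, mean reciprocal rank, and the questions whose expected passages never surfaced; evaluate_retrieval and its parameters are hypothetical names.

```python
from typing import Callable


def evaluate_retrieval(
    eval_set: list[tuple[str, set[str]]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 5,
) -> dict:
    """Score a retriever against hand-labeled questions.

    eval_set: (question, ids of the chunks that should be retrieved) pairs.
    retrieve: the pipeline under test; returns ranked chunk ids for a question.
    """
    recall_sum = 0.0
    rr_sum = 0.0
    misses: list[str] = []
    for question, relevant in eval_set:
        ranked = retrieve(question, k)[:k]
        found = relevant.intersection(ranked)
        # Recall@k: share of the expected chunks that made the top k.
        recall_sum += len(found) / len(relevant)
        # Reciprocal rank of the first expected chunk (0 if none surfaced).
        first = next((pos for pos, cid in enumerate(ranked, start=1) if cid in relevant), None)
        rr_sum += 1.0 / first if first is not None else 0.0
        if not found:
            misses.append(question)
    n = len(eval_set)
    return {"recall@k": recall_sum / n, "mrr": rr_sum / n, "misses": misses}
```

Running it before and after each change (a new chunk size, a reranker, freshness weighting) turns the comparison into two numbers, and the misses list points straight at the questions worth debugging.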

RAG · Retrieval · Vector Search · LLM
Aarav Mehta · AI Engineering Lead · Atyuttama