Why most RAG implementations fail

Fri, 24 Apr 2026 00:00:00 +0000

The gap between a RAG proof-of-concept and a production system that actually works is wider than most teams expect.

In my experience, the failures tend to cluster around the same handful of mistakes.

The retrieval unit is wrong

Fixed-size chunking is the default because it’s easy: split every 500 tokens, done. But documents don’t respect token boundaries. A chunk that cuts a table in half, or separates a heading from its body, produces retrievals that are syntactically present but semantically useless: the matching text comes back, but without the rows or the heading that give it meaning.
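To make the alternative concrete, here is a minimal sketch of chunking on structural boundaries instead of token counts. It rests on several assumptions that are mine, not part of any particular library: the input is Markdown-like text, blocks are separated by blank lines, a heading is any block starting with `#`, and whitespace-separated words stand in for tokens. The function names (`split_blocks`, `chunk_by_structure`) are hypothetical.

```python
from typing import List


def split_blocks(text: str) -> List[str]:
    """Split text into blocks (paragraphs, tables, headings) on blank lines."""
    blocks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def is_heading(block: str) -> bool:
    # Assumption: Markdown-style headings. Adapt this to your document format.
    return block.lstrip().startswith("#")


def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. Swap in a real tokenizer if needed.
    return len(text.split())


def chunk_by_structure(text: str, max_tokens: int = 500) -> List[str]:
    """Pack whole blocks into chunks without cutting a block or orphaning a heading."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for block in split_blocks(text):
        n = count_tokens(block)
        # Never flush a chunk that holds only a heading: it must keep its body.
        only_heading = len(current) == 1 and is_heading(current[0])
        over_budget = size + n > max_tokens
        if current and not only_heading and (is_heading(block) or over_budget):
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(block)
        size += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = (
    "# Pricing\n\n"
    "| plan | price |\n| free | 0 |\n| pro | 20 |\n\n"
    "# Limits\n\n"
    "Free accounts are rate limited."
)
for chunk in chunk_by_structure(sample, max_tokens=12):
    print(repr(chunk))
```

The trade-off in this sketch is that `max_tokens` becomes a soft budget: a block larger than the budget is kept whole rather than cut, so a long table can produce an oversized chunk. Whether that is acceptable depends on your embedding model's input limit.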
