Introduction
Most RAG tutorials end at the demo. You embed some documents, run a similarity search, pass the results to an LLM, and it answers questions. That works fine for a proof of concept. Getting it to work reliably on real enterprise data (messy PDFs, mixed-format tables, policy documents written in 2003) is a different problem entirely.
Core Concepts
RAG has two failure modes that matter in production. The first is retrieval failure: the right chunks are not returned, so the LLM either hallucinates or says it does not know. The second is generation failure: the right chunks are returned but the LLM ignores them or misinterprets them.
Most teams focus exclusively on the generation side (swapping models, tweaking prompts) while ignoring retrieval. In my experience, 70% of RAG failures are retrieval failures.
Implementation
Chunking strategy is the most underrated lever in RAG. Fixed-size chunking (split every 512 tokens) is fast to implement and consistently poor in production. Use semantic chunking instead: split on logical boundaries like headings, paragraphs, and section breaks. For structured documents, extract tables separately and index them with their context.
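As a minimal sketch of the semantic approach, assuming markdown-style headings mark section boundaries and approximating tokens as whitespace-separated words (a real pipeline would use the model's tokenizer):

```python
import re

def semantic_chunks(text, max_tokens=512):
    """Split on headings and blank lines, then pack sections up to a budget.

    A section larger than the budget is kept whole; a production version
    would recurse into paragraphs or sentences to split it further.
    """
    # Section boundaries: a newline followed by a heading, or a blank line.
    sections = [s.strip()
                for s in re.split(r"\n(?=#{1,6} )|\n\s*\n", text)
                if s.strip()]
    chunks, current, count = [], [], 0
    for sec in sections:
        n = len(sec.split())  # crude token estimate
        if current and count + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(sec)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

The point is that chunk boundaries land where the author put them, so a retrieved chunk carries a coherent unit of meaning rather than a 512-token window cut mid-sentence.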
Hybrid retrieval (dense embeddings + sparse BM25) consistently outperforms pure vector search on enterprise data. The reason is that enterprise documents contain specific identifiers (product codes, policy numbers, names) that embeddings wash out but keyword search handles perfectly. Most vector stores support hybrid retrieval, either natively or by fusing a separate BM25 index; use it.
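When your store does not fuse the two result lists for you, reciprocal rank fusion (RRF) is a simple, score-free way to combine them; a minimal sketch, where each ranked list holds doc ids from one retriever, best first:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked result lists: score(d) = sum over lists of 1/(k + rank).

    Works on ranks only, so dense cosine scores and sparse BM25 scores
    never need to be normalized against each other. k=60 is the value
    commonly used in the RRF literature.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that both retrievers rank highly floats to the top, while an identifier-heavy query that embeddings miss still surfaces through its BM25 ranking.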
Observability is non-negotiable. Log every query, the chunks retrieved, their similarity scores, and the final answer. This gives you the data to identify the failure patterns and improve systematically.
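In practice this can be one JSON line per query; a sketch (the helper name and field schema here are my own, not a standard):

```python
import json
import time

def log_rag_event(query, retrieved, answer, logfile=None):
    """Record one RAG interaction: query, retrieved chunks + scores, answer.

    `retrieved` is a list of (chunk_id, score) pairs from the retriever.
    Returns the event dict; appends it as a JSON line if a file is given.
    """
    event = {
        "ts": time.time(),
        "query": query,
        "chunks": [{"id": cid, "score": score} for cid, score in retrieved],
        "answer": answer,
    }
    if logfile is not None:
        logfile.write(json.dumps(event) + "\n")
    return event
```

JSON lines are trivial to grep and to load into a dataframe later, which is exactly what you want when hunting for recurring low-score retrievals.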
Key Takeaways
- Fix retrieval before optimizing generation. Measure retrieval recall on a golden dataset.
- Use semantic chunking and preserve document structure (tables, headers, and metadata).
- Add hybrid retrieval (dense + sparse) for any domain with specific identifiers or jargon.
- Log everything. You cannot improve what you cannot measure.
Conclusion
Enterprise RAG is a data engineering problem as much as it is an AI problem. The teams that succeed are the ones that treat document ingestion, chunking, and retrieval as first-class engineering concerns, not as setup steps to rush through before getting to the "real" LLM work.


