Enterprise RAG pipelines operate under constraints that fundamentally differ from prototypes: strict data governance, auditability requirements, multi-source retrieval across heterogeneous document stores, and latency SLAs that must hold at the 99th percentile. These best practices address the engineering challenges of building RAG systems that enterprise security and compliance teams will actually approve.
Document Ingestion Architecture
Implement a Multi-Stage Ingestion Pipeline
Enterprise document corpora include PDFs, Word documents, HTML pages, Confluence wikis, Slack threads, and proprietary formats. Build the ingestion pipeline as a series of idempotent stages:
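A minimal sketch of such a pipeline, with stages as composable functions and idempotency keyed on a content hash (the stage list, `Document` fields, and the blank-line chunker are illustrative assumptions, not a fixed API):

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Document:
    source_path: str
    raw_bytes: bytes
    text: str = ""
    chunks: list = field(default_factory=list)
    doc_id: str = ""

def stage_fingerprint(doc):
    # Content-hash the raw bytes so re-ingesting an unchanged
    # document is a no-op downstream.
    doc.doc_id = hashlib.sha256(doc.raw_bytes).hexdigest()[:16]
    return doc

def stage_extract_text(doc):
    # Real systems dispatch on format (PDF, DOCX, HTML, ...);
    # plain UTF-8 is assumed here for illustration.
    doc.text = doc.raw_bytes.decode("utf-8", errors="replace")
    return doc

def stage_chunk(doc):
    # Placeholder chunker: split on blank lines.
    doc.chunks = [c.strip() for c in doc.text.split("\n\n") if c.strip()]
    return doc

PIPELINE = [stage_fingerprint, stage_extract_text, stage_chunk]

def ingest(doc, seen):
    for stage in PIPELINE:
        doc = stage(doc)
        if doc.doc_id and doc.doc_id in seen:
            return None  # idempotent: this exact content was already ingested
    seen.add(doc.doc_id)
    return doc
```

Because each stage takes and returns a `Document`, stages can be re-run, reordered, or retried after a failure without corrupting state.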
Version Your Embeddings
When you change embedding models (e.g., moving from text-embedding-ada-002 to text-embedding-3-large), all existing vectors become incompatible. Maintain embedding version metadata:
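One way to track this, sketched with illustrative field names: record the model, vector width, and a migration tag per collection, and reject queries embedded under a different version rather than silently mixing incompatible vectors.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingVersion:
    model: str        # e.g. "text-embedding-3-large"
    dimensions: int   # vector width; a mismatch breaks similarity search
    version: str      # your own migration tag, e.g. "2024-06"

# Collection names and tags here are invented examples.
COLLECTIONS = {
    "docs_v1": EmbeddingVersion("text-embedding-ada-002", 1536, "2023-01"),
    "docs_v2": EmbeddingVersion("text-embedding-3-large", 3072, "2024-06"),
}

def check_compatible(collection, query_version):
    # Refuse to search a collection with a query embedded
    # under a different model/version.
    return COLLECTIONS[collection] == query_version
```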
Run the old and new collections in parallel during migration, routing queries to the new collection once re-indexing is complete. Never delete the old collection until you have verified retrieval quality on the new one.
Chunking Strategies That Work at Scale
Semantic Chunking Over Fixed-Size
Fixed-size chunking (e.g., 500 tokens with a 50-token overlap) is the most common approach, but it produces poor results on structured enterprise documents where section boundaries carry meaning:
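A minimal section-aware chunker, as a sketch (splitting on Markdown-style headers stands in for whatever structure your documents expose; character limits stand in for token limits):

```python
import re

def semantic_chunks(text, max_chars=2000):
    # Split before each header line so a section header always stays
    # attached to its own body, never cut mid-chunk.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
        else:
            # Fall back to paragraph splits only inside oversized sections.
            chunks.extend(p.strip() for p in section.split("\n\n") if p.strip())
    return chunks
```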
Include Parent-Child Context
Each chunk should carry enough context to be understood independently. Attach the parent section header and document title:
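A sketch of one way to do this (field names are assumptions): embed the contextualized text, but keep the raw chunk for citation display.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_title: str
    section: str
    text: str

    def for_embedding(self):
        # Prefix parent context so "Flights over $500 require VP approval"
        # is retrievable for queries about expense policy, not just flights.
        return f"{self.doc_title} > {self.section}\n\n{self.text}"

chunk = Chunk("Expense Policy 2024", "Travel Reimbursement",
              "Flights over $500 require VP approval.")
```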
Retrieval Architecture
Hybrid Search: Dense + Sparse
Pure vector search misses exact keyword matches. Pure keyword search misses semantic similarity. Combine both:
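Reciprocal Rank Fusion (RRF) is one common way to merge the two ranked lists without having to normalize incomparable score scales; `k=60` is the conventional default from the RRF literature. The inputs here are ranked lists of document IDs from each retriever.

```python
def rrf_fuse(dense, sparse, k=60):
    # Each list contributes 1/(k + rank) per document; documents found
    # by both retrievers accumulate score from both lists.
    scores = {}
    for results in (dense, sparse):
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```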
Implement Query Expansion
Enterprise queries are often ambiguous or use domain-specific terminology. Expand queries before retrieval:
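A minimal expansion sketch using a hand-maintained glossary of domain acronyms (the glossary entries below are invented examples); real systems often add LLM-generated paraphrases on top of this:

```python
# Invented example glossary; in practice this is curated per organization.
GLOSSARY = {
    "sso": "single sign-on",
    "dlp": "data loss prevention",
    "pto": "paid time off",
}

def expand_query(query):
    # Return the original query plus one variant per matched acronym;
    # retrieve with all variants and fuse the results.
    variants = [query]
    lowered = query.lower()
    for acronym, expansion in GLOSSARY.items():
        if acronym in lowered.split():
            variants.append(lowered.replace(acronym, expansion))
    return variants
```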
Access Control and Data Governance
Document-Level Permissions
Enterprise RAG systems must respect existing access control. Filter retrieval results based on the requesting user's permissions:
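A post-retrieval filtering sketch (the group-based ACL model is an assumption; map it onto your identity provider): each chunk carries the ACL of its source document, and anything the requesting user cannot read is dropped before the prompt is assembled.

```python
from dataclasses import dataclass

@dataclass
class RetrievedChunk:
    doc_id: str
    text: str
    allowed_groups: frozenset

def filter_by_permission(results, user_groups):
    # Keep a chunk only if the user shares at least one group
    # with the source document's ACL.
    return [r for r in results if r.allowed_groups & user_groups]
```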
Audit Logging
Every RAG interaction must be logged for compliance:
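A sketch of a structured audit record, emitted as one JSON line per interaction so the log stays greppable and ingestible by SIEM tooling. The field set is an assumption; align it with your compliance team's actual requirements.

```python
import json
import time
import uuid

def audit_record(user_id, query, retrieved_ids, response_id):
    # One JSON line per RAG interaction; ship to append-only storage.
    return json.dumps({
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "query": query,              # consider PII redaction before logging
        "retrieved_doc_ids": retrieved_ids,
        "response_id": response_id,
    })
```

Logging the retrieved document IDs alongside the response ID is what makes an answer auditable later: you can reconstruct exactly which sources the model saw.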
Evaluation and Monitoring
Automated Retrieval Quality Metrics
Track retrieval quality continuously, not just during development:
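The two standard metrics mentioned in the checklist below, over a labeled evaluation set where each query has a known set of relevant document IDs:

```python
def mrr(ranked, relevant):
    # Mean Reciprocal Rank contribution for one query:
    # 1/position of the first relevant result, else 0.
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0

def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k results that are relevant.
    return sum(1 for d in ranked[:k] if d in relevant) / k
```

Average these over the full evaluation set on every deploy; a drop after a chunking or embedding change is a regression signal, not noise.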
Checklist
- Multi-format document parser (PDF, Word, HTML, Markdown)
- Semantic chunking with section-aware boundaries
- Embedding versioning with migration support
- Hybrid search (dense + sparse retrieval)
- Document-level access control filtering
- Audit logging for all RAG interactions
- Query expansion for ambiguous queries
- Retrieval quality monitoring with MRR and precision metrics
- Hallucination detection in generated responses
- Source citation in every generated response
- Rate limiting per user/department
- PII detection and redaction in ingested documents
Anti-Patterns to Avoid
Single embedding model dependency: Lock-in to one embedding provider makes migration painful. Abstract the embedding interface and maintain compatibility metadata per vector collection.
Ignoring chunk boundaries in responses: The LLM should cite specific chunks in its response. Without source attribution, enterprise users cannot verify the answer against the original document.
Over-retrieving without re-ranking: Retrieving 50 chunks and stuffing them into the prompt wastes tokens and degrades response quality. Retrieve broadly, re-rank with a cross-encoder, then pass the top 5-8 most relevant chunks to the LLM.
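The retrieve-broadly-then-re-rank pattern, sketched with a placeholder scorer (in practice `score_fn` would be a cross-encoder scoring each query-chunk pair jointly; the term-overlap scorer below is a toy stand-in, not a real model):

```python
def rerank(query, candidates, score_fn, top_n=8):
    # Score every candidate against the query, keep only the top_n
    # for the prompt.
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:top_n]

def overlap_score(query, chunk):
    # Toy scorer for illustration only: fraction of query terms in the chunk.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)
```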
Skipping permission filtering for performance: Filtering after retrieval is slower but always reflects current permissions. Pre-filtering by baking user permissions into the indexed metadata is faster, but it creates a security gap whenever permissions change and the index has not yet been re-synced.
Conclusion
Enterprise RAG pipelines require engineering rigor beyond what research prototypes demonstrate. The retrieval component is only as good as the ingestion pipeline feeding it — semantic chunking, embedding versioning, and multi-format parsing determine the ceiling of retrieval quality. Access control and audit logging are not optional features but prerequisites for enterprise deployment.
Invest in evaluation infrastructure from the start. Automated retrieval quality metrics (MRR, precision@k) and LLM-judged response relevance scores provide the feedback loop needed to iterate on chunking strategies, embedding models, and retrieval parameters. Without measurement, RAG pipeline improvements are guesswork.