Back to Journal
AI Architecture

AI Guardrails & Safety at Scale: Lessons from Production

Real-world lessons from implementing AI Guardrails & Safety in production, including architecture decisions, measurable results, and honest retrospectives.

Muneer Puthiya Purayil 14 min read

The Challenge

Business Context

We built a document intelligence platform for mid-market financial services firms — think regional banks and insurance companies processing contracts, compliance documents, and customer correspondence through LLM-based extraction and summarization pipelines. At peak, 40,000 documents per day, each triggering between 3 and 12 LLM calls depending on document type and complexity.

The business constraint was simple: our customers were regulated entities. If our system exposed one customer's document content to another customer's LLM context, or generated a summary that contradicted the actual document in a legally significant way, we would lose the customer and potentially face regulatory action alongside them. Guardrails weren't a feature — they were the product.

We had 6 months from initial deployment to demonstrate enough control over AI output quality and safety to satisfy two enterprise customer security audits. This is the honest account of what we built, what broke, and what we'd change.

Technical Constraints

The stack we inherited: Python FastAPI services on AWS ECS, PostgreSQL (RDS) for job state, S3 for document storage, and an early integration with OpenAI's GPT-4 (before gpt-4o existed). The team was 4 engineers total — no dedicated ML or AI safety specialist.

Non-negotiable constraints:

  • Latency SLA: Document processing must complete within 90 seconds end-to-end for 95th percentile
  • Multi-tenant isolation: Zero tolerance for cross-tenant data leakage. Each financial firm's documents are logically isolated.
  • Audit trail: Every LLM call must be logged in immutable, queryable storage for compliance audits
  • Availability: 99.9% uptime. Guardrail failures cannot take down the processing pipeline.

What we didn't have: a dedicated vector database (we used PostgreSQL pgvector), a dedicated safety classifier service, or any prior ML infrastructure.

Scale Requirements

Month 1: 500 documents/day. Month 3: 8,000 documents/day. Month 6: 40,000 documents/day. We had to build a system that worked at Month 1 scale without requiring a rewrite at Month 6 scale.

The growth curve was the main architectural challenge. A guardrail that adds 500ms synchronously is fine at 500 requests/day. At 40,000 requests/day distributed across business hours, that same synchronous check becomes a bottleneck that requires either optimization or a fundamental redesign.

Architecture Decision

Options Evaluated

We evaluated three guardrail architectures before settling on our final approach:

Option A: Fully synchronous, in-process guardrails. Every check runs in the FastAPI request handler before LLM calls. Simple, easy to reason about, no additional infrastructure. Problem: adding multiple checks at 200-500ms each would blow our 90-second SLA at scale.

Option B: Dedicated guardrail microservice. All checks go through a separate service with its own scaling. Clean separation of concerns, independently scalable. Problem: at Month 1 scale, this is two engineers maintaining infrastructure instead of building features. And we'd be adding an additional network hop to every document processing call.

Option C: Layered async pipeline with synchronous blocking layer. Synchronous checks only for the highest-risk categories (PII leakage, cross-tenant data isolation). All other checks run async post-processing with a separate human review queue for flagged items.

We chose Option C.

Decision Criteria

The key insight was that not all guardrail failures have the same consequence:

Failure typeConsequenceRequired: block or flag?
Cross-tenant data leakImmediate regulatory/legal incidentBlock synchronously
PII in outputCompliance violation, customer complaintBlock synchronously
Hallucination in summaryPoor product quality, customer complaintFlag async, human review
Inconsistency across documentsProduct quality issueFlag async, weekly review
Prompt injection attemptSecurity incidentBlock synchronously

Only three categories required synchronous blocking. Everything else could be async with a review queue. This let us keep the synchronous blocking layer thin and fast (<100ms total) while still catching quality issues post-hoc.

Final Architecture

1Document Upload → S3
2
3 Processing Queue (SQS)
4
5 [ECS Worker]
6
7 [Synchronous Guardrail Layer]80-100ms
8 - Tenant isolation check
9 - Input PII detection
10 - Prompt injection detection
11
12 LLM Calls (GPT-4) ← 10-45 seconds
13
14 [Synchronous Output Check]50-80ms
15 - Output PII scan
16 - Cross-tenant reference scan
17 - Length/format validation
18
19 Result → PostgreSQL (job state)
20
21 [Async Quality Checks] (separate Lambda)
22 - Hallucination detection
23 - Consistency check across document fields
24 - Confidence scoring
25
26 Human Review Queue (if flagged)
27 

Immutable audit log: every LLM call logged to CloudWatch Logs with a 7-year retention policy (regulatory requirement), indexed in OpenSearch for queries.

Implementation

Phase 1: Foundation

The synchronous guardrail layer. This had to be fast and reliable because it ran on every document, in the critical path.

python
1# guardrails/synchronous.py
2import hashlib
3import re
4import time
5from dataclasses import dataclass
6from enum import Enum
7import boto3
8import structlog
9 
10log = structlog.get_logger()
11 
12class GuardrailDecision(Enum):
13 ALLOW = "allow"
14 BLOCK = "block"
15 REDACT = "redact"
16 
17@dataclass
18class GuardrailResult:
19 decision: GuardrailDecision
20 reason: str | None
21 redacted_content: str | None
22 latency_ms: int
23 checks_run: list[str]
24 
25class SynchronousGuardrailLayer:
26 def __init__(self, tenant_id: str, comprehend_client):
27 self.tenant_id = tenant_id
28 self.comprehend = comprehend_client
29 
30 async def check_input(self, text: str, document_id: str) -> GuardrailResult:
31 start = time.monotonic()
32 checks_run = []
33 
34 # 1. Tenant isolation — does this document reference data from another tenant?
35 checks_run.append("tenant_isolation")
36 if self._contains_cross_tenant_reference(text):
37 return self._result(
38 GuardrailDecision.BLOCK, "cross_tenant_reference",
39 start, checks_run
40 )
41 
42 # 2. Prompt injection
43 checks_run.append("prompt_injection")
44 if self._contains_injection_pattern(text):
45 return self._result(
46 GuardrailDecision.BLOCK, "prompt_injection",
47 start, checks_run
48 )
49 
50 # 3. PII detection via AWS Comprehend
51 checks_run.append("pii_detection")
52 redacted, pii_found = await self._detect_and_redact_pii(text)
53 if pii_found:
54 # Don't block — redact and continue. Log the redaction.
55 log.info("pii_redacted", document_id=document_id, tenant_id=self.tenant_id)
56 return GuardrailResult(
57 decision=GuardrailDecision.REDACT,
58 reason="pii_detected",
59 redacted_content=redacted,
60 latency_ms=int((time.monotonic() - start) * 1000),
61 checks_run=checks_run,
62 )
63 
64 return self._result(GuardrailDecision.ALLOW, None, start, checks_run)
65 
66 def _contains_cross_tenant_reference(self, text: str) -> bool:
67 # Each tenant's documents are tagged with a tenant-specific prefix
68 # in our internal references. Check that this document doesn't
69 # reference another tenant's prefix.
70 # Implementation: regex match on tenant ID patterns in the extracted text
71 other_tenant_pattern = r'TEN-(?!{})([A-Z0-9]{{8}})'.format(
72 re.escape(self.tenant_id)
73 )
74 return bool(re.search(other_tenant_pattern, text))
75 
76 def _contains_injection_pattern(self, text: str) -> bool:
77 patterns = [
78 r'ignore\s+(all\s+)?previous\s+instructions',
79 r'you\s+are\s+now\s+a',
80 r'disregard\s+your\s+(system\s+)?prompt',
81 r'act\s+as\s+(if\s+you\s+are|a)',
82 r'<\s*system\s*>', # XML injection attempt
83 ]
84 text_lower = text.lower()
85 return any(re.search(p, text_lower) for p in patterns)
86 
87 async def _detect_and_redact_pii(self, text: str) -> tuple[str, bool]:
88 response = self.comprehend.detect_pii_entities(Text=text[:4900], LanguageCode='en')
89 entities = response.get('Entities', [])
90 pii_types_to_redact = {'SSN', 'CREDIT_DEBIT_NUMBER', 'BANK_ACCOUNT_NUMBER', 'PASSWORD'}
91 to_redact = [e for e in entities if e['Type'] in pii_types_to_redact and e['Score'] > 0.9]
92 
93 if not to_redact:
94 return text, False
95 
96 # Redact in reverse order to preserve character positions
97 result = text
98 for entity in sorted(to_redact, key=lambda e: e['BeginOffset'], reverse=True):
99 result = (
100 result[:entity['BeginOffset']] +
101 f"[REDACTED_{entity['Type']}]" +
102 result[entity['EndOffset']:]
103 )
104 return result, True
105 
106 def _result(self, decision, reason, start, checks_run):
107 return GuardrailResult(
108 decision=decision,
109 reason=reason,
110 redacted_content=None,
111 latency_ms=int((time.monotonic() - start) * 1000),
112 checks_run=checks_run,
113 )
114 

Phase 2: Core Features

The async quality pipeline ran as a Lambda function triggered after the synchronous processing completed. Its job was hallucination detection and consistency checking:

python
1# guardrails/async_quality.py
2import json
3from openai import OpenAI
4 
5client = OpenAI()
6 
7HALLUCINATION_CHECK_PROMPT = """You are a document verification specialist.
8 
9You will be given:
101. The original document text (SOURCE)
112. An AI-generated summary or extraction (OUTPUT)
12 
13Your job: identify any claims in OUTPUT that are not supported by SOURCE,
14or that contradict SOURCE.
15 
16Respond with JSON:
17{
18 "has_unsupported_claims": boolean,
19 "unsupported_claims": [
20 {"claim": "...", "issue": "not found in source | contradicts source"}
21 ],
22 "confidence": 0.0 to 1.0
23}
24 
25Be precise. Financial documents require exact accuracy."""
26 
27async def check_hallucination(
28 source_text: str,
29 llm_output: str,
30 document_id: str,
31) -> dict:
32 response = client.chat.completions.create(
33 model="gpt-4o", # Use a capable model for verification
34 messages=[
35 {"role": "system", "content": HALLUCINATION_CHECK_PROMPT},
36 {"role": "user", "content": f"SOURCE:\n{source_text[:6000]}\n\nOUTPUT:\n{llm_output}"},
37 ],
38 temperature=0,
39 response_format={"type": "json_object"},
40 )
41 
42 result = json.loads(response.choices[0].message.content)
43 
44 if result.get("has_unsupported_claims") and result.get("confidence", 0) > 0.8:
45 # Flag for human review — don't automatically block (too many false positives)
46 await add_to_review_queue(document_id, result["unsupported_claims"])
47 
48 return result
49 

Phase 3: Optimization

By month 4, with 20,000 documents/day, we had three concrete performance problems:

Problem 1: AWS Comprehend PII detection was adding 150-300ms per call. At scale, this was significant.

Fix: Cache Comprehend results by a hash of the input text segment. Documents in financial services are often templated — the same boilerplate appears across thousands of documents. Cache hit rate reached 34% within a week.

python
1import hashlib
2import json
3import redis.asyncio as aioredis
4 
5redis = aioredis.from_url("redis://...")
6 
7async def cached_pii_check(text: str) -> tuple[str, bool]:
8 cache_key = f"pii:{hashlib.sha256(text.encode()).hexdigest()}"
9 cached = await redis.get(cache_key)
10 if cached:
11 data = json.loads(cached)
12 return data["redacted"], data["found"]
13 
14 redacted, found = await _detect_and_redact_pii_uncached(text)
15 await redis.setex(cache_key, 3600, json.dumps({"redacted": redacted, "found": found}))
16 return redacted, found
17 

Problem 2: Hallucination check via LLM was expensive — $0.04 per document when checking every output. At 40,000 documents/day, that's $1,600/day on quality checks alone.

Fix: Risk-tiered checking. High-risk document types (contracts, compliance filings) checked every output. Low-risk types (routine correspondence) sampled at 10%. Reduced hallucination check costs by 78% with no detectable change in quality incident rate.

Problem 3: The human review queue was backlogging. Reviewers couldn't keep up.

Fix: Added a confidence-based routing layer. Items with hallucination confidence > 0.95 were auto-rejected (re-processed). Items with confidence 0.7-0.95 went to human review. Items below 0.7 were auto-approved (classifier uncertainty, not actual hallucination).

Need a second opinion on your AI systems architecture?

I run free 30-minute strategy calls for engineering teams tackling this exact problem.

Book a Free Call

Results & Metrics

Performance Gains

After 6 months of iteration:

  • Synchronous guardrail latency: 82ms median (down from 340ms at launch, before caching)
  • End-to-end document processing: 38 seconds median, 71 seconds p95 (well within the 90s SLA)
  • Cross-tenant data leak incidents: 0 in production
  • Prompt injection attempts blocked: 847 over 6 months (0.003% of total requests — lower than expected)

The hallucination detection system flagged 2.3% of all document outputs for human review. Of those flagged, 61% were confirmed as having at least one unsupported claim. That's a meaningful catch rate for a financial services context.

Cost Impact

ComponentMonth 1 costMonth 6 costNotes
AWS Comprehend PII$12$1,840Volume-driven, caching helped
Hallucination checks (LLM)$31$1,620Risk-tiered reduced from $6,200 est.
Redis cache$0$18034% cache hit rate offset Comprehend costs
CloudWatch Logs (audit)$8$890Regulatory requirement, non-negotiable

Total guardrail cost at 40,000 docs/day: ~$4,530/month, approximately 18% of total infrastructure costs. We considered this acceptable for the risk management function it provided.

Developer Productivity

Unexpected finding: the guardrail system accelerated development of new AI features, not slowed it. Because we had a standardized interface for adding guardrails, engineers could ship new LLM features with confidence that the safety layer handled the common risks. The pre-launch checklist went from "fill this out before the security team will approve" to "oh right, I should check these 6 things" — a sign that it had become part of the workflow rather than overhead.

Lessons Learned

What Worked

Risk tiering was the right call. Treating hallucination as "flag and review" rather than "block synchronously" was the decision that made the system viable. 100% synchronous blocking of uncertain outputs would have required the review queue to handle thousands of items per day — economically and operationally impossible for a 4-person team.

AWS Comprehend for PII was the right trade-off. We evaluated building our own PII classifier. At startup scale, Comprehend's accuracy (92% precision, 88% recall on our financial document corpus) was sufficient and saved 6 weeks of ML work. We would revisit this at 3x scale when the cost becomes a real line item.

Immutable audit logs paid dividends. One enterprise customer requested a compliance audit 3 months after contract signing. We pulled a complete, tamper-evident log of every LLM call for their document portfolio in 4 hours. That capability closed a deal with a second customer who was watching the audit closely.

What Surprised Us

Prompt injection attempts were rarer than expected. We anticipated this as the primary attack vector. In practice, the injection patterns we blocked came mostly from poorly formatted customer documents that happened to contain phrases like "ignore previous" in legal disclaimers — false positives, not attacks. We tuned the patterns down significantly after the first month.

Hallucination was correlated with document quality, not model behavior. Our hallucination detector flagged outputs on documents that were scanned PDFs with poor OCR quality. The model was doing its best with ambiguous source text. We added an input quality check (OCR confidence score) that caught these before LLM processing, reducing flagged hallucinations by 40%.

Cache invalidation for PII patterns was an edge case we missed. When we updated our PII detection patterns, stale cache entries using old patterns persisted for an hour. For a one-hour window, documents processed from cache used the outdated detection logic. We added cache versioning and a pattern change procedure.

Key Takeaways

  1. Ship the synchronous blocking layer first. It's the only one that prevents incidents. Everything else is quality assurance.

  2. Risk-tier your checks. Not all safety failures have the same consequence. Reserve synchronous blocking for the ones that would be immediately harmful.

  3. Build the audit log before you need it. It will be asked for at the worst possible time (a customer audit, an incident investigation). The cost of building it retroactively under time pressure is high.

  4. Measure false positive rates weekly. Guardrails that over-block are a product quality problem, not just a safety nuance. Every blocked legitimate document erodes customer trust.

What We'd Do Differently

Architecture Changes

Introduce a dedicated guardrail service earlier. We kept guardrail logic in the main processing service for the first 4 months. By the time we extracted it, we had enough interdependencies that the extraction took 2 weeks instead of 2 days. A shared Python package with a stable interface, deployed to a separate service, would have given us independent scaling from month 2.

Use a purpose-built LLM observation platform. We logged everything to CloudWatch, which worked fine for compliance but made debugging difficult. Platforms like Langfuse or Helicone would have given us cost breakdowns by document type, latency percentiles by guardrail, and per-tenant usage — all of which we ended up building manually in Grafana.

Build the confidence-based auto-routing from day 1. We built the human review queue and manually watched it grow for two months before we added automated routing. We should have designed the routing logic first and added human review as the fallback tier, not the primary tier.

Process Improvements

Establish a "guardrail regression" test suite before launch. When we changed PII patterns or injection detection heuristics, we had no automated way to check that existing safe inputs were still passing. We caught two regressions via customer complaints before we built regression tests. This should have been standard from the first deploy.

Create a shared "known-bad" document corpus for testing. We accumulated adversarial examples over 6 months organically (from real incidents and injection attempts). Having this corpus from month 1 — even with synthetic data — would have given us a more rigorous baseline for guardrail quality.

Conclusion

Building production-grade AI guardrails for regulated industries is fundamentally an exercise in risk stratification. The layered architecture — synchronous blocking for high-severity failures like cross-tenant leakage and PII exposure, async flagging for quality issues like hallucination and inconsistency — let us keep the critical path fast while still catching the long tail of quality problems. The 80-100ms synchronous layer was the key design constraint that made the system viable at 40,000 documents per day without requiring a dedicated guardrail microservice.

The most valuable takeaway is that guardrail systems are living infrastructure, not set-and-forget rules. Our false positive rates shifted significantly as document types changed and LLM model versions updated. Invest early in observability and feedback loops — immutable audit logs, dashboards tracking block rates by category, and a human review queue that feeds corrections back into your classifiers. Start with the simplest check that addresses your highest-risk failure mode, measure its real-world performance, and add complexity only when the data justifies it.

FAQ

Need expert help?

Building with agentic AI?

I help teams ship production-grade systems. From architecture review to hands-on builds.

Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.

Engage

Start a
Conversation.

For teams building at scale: SaaS platforms, agentic AI systems, and enterprise mobile infrastructure. Scope and fit are evaluated before any engagement begins.

Limited availability · Q3 / Q4 2026