What is AI Guardrails & Safety and why does it matter?

AI Guardrails & Safety is a critical architectural pattern for modern software systems. It matters because it directly impacts scalability, maintainability, and team velocity in production environments.

How does AWS compare for AI Guardrails & Safety?

AWS provides a mature set of managed services that reduce the implementation overhead for enterprise AI safety: Comprehend for PII and content classification, Bedrock Guardrails for managed LLM safety policies, CloudWatch for immutable audit logging, and Lambda for serverless async quality checks. The tradeoff is vendor lock-in and per-call pricing that becomes significant at scale.

What are common mistakes with AI Guardrails & Safety?

Common mistakes include premature optimization, insufficient observability, ignoring failure modes, and over-engineering the initial implementation. Start simple and iterate based on production data.

AI Guardrails & Safety at Scale: Lessons from Production

The Challenge

Business Context

We built a document intelligence platform for mid-market financial services firms — think regional banks and insurance companies processing contracts, compliance documents, and customer correspondence through LLM-based extraction and summarization pipelines. At peak, 40,000 documents per day, each triggering between 3 and 12 LLM calls depending on document type and complexity.

The business constraint was simple: our customers were regulated entities. If our system exposed one customer's document content to another customer's LLM context, or generated a summary that contradicted the actual document in a legally significant way, we would lose the customer and potentially face regulatory action alongside them. Guardrails weren't a feature — they were the product.

We had 6 months from initial deployment to demonstrate enough control over AI output quality and safety to satisfy two enterprise customer security audits. This is the honest account of what we built, what broke, and what we'd change.

Technical Constraints

The stack we inherited: Python FastAPI services on AWS ECS, PostgreSQL (RDS) for job state, S3 for document storage, and an early integration with OpenAI's GPT-4 (before gpt-4o existed). The team was 4 engineers total — no dedicated ML or AI safety specialist.

Non-negotiable constraints:

Latency SLA: Document processing must complete within 90 seconds end-to-end for 95th percentile
Multi-tenant isolation: Zero tolerance for cross-tenant data leakage. Each financial firm's documents are logically isolated.
Audit trail: Every LLM call must be logged in immutable, queryable storage for compliance audits
Availability: 99.9% uptime. Guardrail failures cannot take down the processing pipeline.

What we didn't have: a dedicated vector database (we used PostgreSQL pgvector), a dedicated safety classifier service, or any prior ML infrastructure.

Scale Requirements

Month 1: 500 documents/day. Month 3: 8,000 documents/day. Month 6: 40,000 documents/day. We had to build a system that worked at Month 1 scale without requiring a rewrite at Month 6 scale.

The growth curve was the main architectural challenge. A guardrail that adds 500ms synchronously is fine at 500 requests/day. At 40,000 requests/day distributed across business hours, that same synchronous check becomes a bottleneck that requires either optimization or a fundamental redesign.

Architecture Decision

Options Evaluated

We evaluated three guardrail architectures before settling on our final approach:

Option A: Fully synchronous, in-process guardrails. Every check runs in the FastAPI request handler before LLM calls. Simple, easy to reason about, no additional infrastructure. Problem: adding multiple checks at 200-500ms each would blow our 90-second SLA at scale.

Option B: Dedicated guardrail microservice. All checks go through a separate service with its own scaling. Clean separation of concerns, independently scalable. Problem: at Month 1 scale, this is two engineers maintaining infrastructure instead of building features. And we'd be adding an additional network hop to every document processing call.

Option C: Layered async pipeline with synchronous blocking layer. Synchronous checks only for the highest-risk categories (PII leakage, cross-tenant data isolation). All other checks run async post-processing with a separate human review queue for flagged items.

We chose Option C.

Decision Criteria

The key insight was that not all guardrail failures have the same consequence:

Failure type	Consequence	Required: block or flag?
Cross-tenant data leak	Immediate regulatory/legal incident	Block synchronously
PII in output	Compliance violation, customer complaint	Block synchronously
Hallucination in summary	Poor product quality, customer complaint	Flag async, human review
Inconsistency across documents	Product quality issue	Flag async, weekly review
Prompt injection attempt	Security incident	Block synchronously

Only three categories required synchronous blocking. Everything else could be async with a review queue. This let us keep the synchronous blocking layer thin and fast (<100ms total) while still catching quality issues post-hoc.

Final Architecture

1Document Upload → S3

2 ↓

3 Processing Queue (SQS)

4 ↓

5 [ECS Worker]

6 ↓

7 [Synchronous Guardrail Layer] ← 80-100ms

8 - Tenant isolation check

9 - Input PII detection

10 - Prompt injection detection

11 ↓

12 LLM Calls (GPT-4) ← 10-45 seconds

13 ↓

14 [Synchronous Output Check] ← 50-80ms

15 - Output PII scan

16 - Cross-tenant reference scan

17 - Length/format validation

18 ↓

19 Result → PostgreSQL (job state)

20 ↓

21 [Async Quality Checks] (separate Lambda)

22 - Hallucination detection

23 - Consistency check across document fields

24 - Confidence scoring

25 ↓

26 Human Review Queue (if flagged)

Immutable audit log: every LLM call logged to CloudWatch Logs with a 7-year retention policy (regulatory requirement), indexed in OpenSearch for queries.

Implementation

Phase 1: Foundation

The synchronous guardrail layer. This had to be fast and reliable because it ran on every document, in the critical path.

python

1# guardrails/synchronous.py

2import hashlib

3import re

4import time

5from dataclasses import dataclass

6from enum import Enum

7import boto3

8import structlog

10log = structlog.get_logger()

12class GuardrailDecision(Enum):

13 ALLOW = "allow"

14 BLOCK = "block"

15 REDACT = "redact"

17@dataclass

18class GuardrailResult:

19 decision: GuardrailDecision

20 reason: str | None

21 redacted_content: str | None

22 latency_ms: int

23 checks_run: list[str]

25class SynchronousGuardrailLayer:

26 def __init__(self, tenant_id: str, comprehend_client):

27 self.tenant_id = tenant_id

28 self.comprehend = comprehend_client

30 async def check_input(self, text: str, document_id: str) -> GuardrailResult:

31 start = time.monotonic()

32 checks_run = []

34 # 1. Tenant isolation — does this document reference data from another tenant?

35 checks_run.append("tenant_isolation")

36 if self._contains_cross_tenant_reference(text):

37 return self._result(

38 GuardrailDecision.BLOCK, "cross_tenant_reference",

39 start, checks_run

40 )

42 # 2. Prompt injection

43 checks_run.append("prompt_injection")

44 if self._contains_injection_pattern(text):

45 return self._result(

46 GuardrailDecision.BLOCK, "prompt_injection",

47 start, checks_run

48 )

50 # 3. PII detection via AWS Comprehend

51 checks_run.append("pii_detection")

52 redacted, pii_found = await self._detect_and_redact_pii(text)

53 if pii_found:

54 # Don't block — redact and continue. Log the redaction.

55 log.info("pii_redacted", document_id=document_id, tenant_id=self.tenant_id)

56 return GuardrailResult(

57 decision=GuardrailDecision.REDACT,

58 reason="pii_detected",

59 redacted_content=redacted,

60 latency_ms=int((time.monotonic() - start) * 1000),

61 checks_run=checks_run,

62 )

64 return self._result(GuardrailDecision.ALLOW, None, start, checks_run)

66 def _contains_cross_tenant_reference(self, text: str) -> bool:

67 # Each tenant's documents are tagged with a tenant-specific prefix

68 # in our internal references. Check that this document doesn't

69 # reference another tenant's prefix.

70 # Implementation: regex match on tenant ID patterns in the extracted text

71 other_tenant_pattern = r'TEN-(?!{})([A-Z0-9]{{8}})'.format(

72 re.escape(self.tenant_id)

73 )

74 return bool(re.search(other_tenant_pattern, text))

76 def _contains_injection_pattern(self, text: str) -> bool:

77 patterns = [

78 r'ignore\s+(all\s+)?previous\s+instructions',

79 r'you\s+are\s+now\s+a',

80 r'disregard\s+your\s+(system\s+)?prompt',

81 r'act\s+as\s+(if\s+you\s+are|a)',

82 r'<\s*system\s*>', # XML injection attempt

83 ]

84 text_lower = text.lower()

85 return any(re.search(p, text_lower) for p in patterns)

87 async def _detect_and_redact_pii(self, text: str) -> tuple[str, bool]:

88 response = self.comprehend.detect_pii_entities(Text=text[:4900], LanguageCode='en')

89 entities = response.get('Entities', [])

90 pii_types_to_redact = {'SSN', 'CREDIT_DEBIT_NUMBER', 'BANK_ACCOUNT_NUMBER', 'PASSWORD'}

91 to_redact = [e for e in entities if e['Type'] in pii_types_to_redact and e['Score'] > 0.9]

93 if not to_redact:

94 return text, False

96 # Redact in reverse order to preserve character positions

97 result = text

98 for entity in sorted(to_redact, key=lambda e: e['BeginOffset'], reverse=True):

99 result = (

100 result[:entity['BeginOffset']] +

101 f"[REDACTED_{entity['Type']}]" +

102 result[entity['EndOffset']:]

103 )

104 return result, True

105

106 def _result(self, decision, reason, start, checks_run):

107 return GuardrailResult(

108 decision=decision,

109 reason=reason,

110 redacted_content=None,

111 latency_ms=int((time.monotonic() - start) * 1000),

112 checks_run=checks_run,

113 )

114

Phase 2: Core Features

The async quality pipeline ran as a Lambda function triggered after the synchronous processing completed. Its job was hallucination detection and consistency checking:

python

1# guardrails/async_quality.py

2import json

3from openai import OpenAI

5client = OpenAI()

7HALLUCINATION_CHECK_PROMPT = """You are a document verification specialist.

9You will be given:

101. The original document text (SOURCE)

112. An AI-generated summary or extraction (OUTPUT)

13Your job: identify any claims in OUTPUT that are not supported by SOURCE,

14or that contradict SOURCE.

16Respond with JSON:

17{

18 "has_unsupported_claims": boolean,

19 "unsupported_claims": [

20 {"claim": "...", "issue": "not found in source | contradicts source"}

21 ],

22 "confidence": 0.0 to 1.0

23}

25Be precise. Financial documents require exact accuracy."""

27async def check_hallucination(

28 source_text: str,

29 llm_output: str,

30 document_id: str,

31) -> dict:

32 response = client.chat.completions.create(

33 model="gpt-4o", # Use a capable model for verification

34 messages=[

35 {"role": "system", "content": HALLUCINATION_CHECK_PROMPT},

36 {"role": "user", "content": f"SOURCE:\n{source_text[:6000]}\n\nOUTPUT:\n{llm_output}"},

37 ],

38 temperature=0,

39 response_format={"type": "json_object"},

40 )

42 result = json.loads(response.choices[0].message.content)

44 if result.get("has_unsupported_claims") and result.get("confidence", 0) > 0.8:

45 # Flag for human review — don't automatically block (too many false positives)

46 await add_to_review_queue(document_id, result["unsupported_claims"])

48 return result

Phase 3: Optimization

By month 4, with 20,000 documents/day, we had three concrete performance problems:

Problem 1: AWS Comprehend PII detection was adding 150-300ms per call. At scale, this was significant.

Fix: Cache Comprehend results by a hash of the input text segment. Documents in financial services are often templated — the same boilerplate appears across thousands of documents. Cache hit rate reached 34% within a week.

python

1import hashlib

2import json

3import redis.asyncio as aioredis

5redis = aioredis.from_url("redis://...")

7async def cached_pii_check(text: str) -> tuple[str, bool]:

8 cache_key = f"pii:{hashlib.sha256(text.encode()).hexdigest()}"

9 cached = await redis.get(cache_key)

10 if cached:

11 data = json.loads(cached)

12 return data["redacted"], data["found"]

14 redacted, found = await _detect_and_redact_pii_uncached(text)

15 await redis.setex(cache_key, 3600, json.dumps({"redacted": redacted, "found": found}))

16 return redacted, found

Problem 2: Hallucination check via LLM was expensive — $0.04 per document when checking every output. At 40,000 documents/day, that's $1,600/day on quality checks alone.

Fix: Risk-tiered checking. High-risk document types (contracts, compliance filings) checked every output. Low-risk types (routine correspondence) sampled at 10%. Reduced hallucination check costs by 78% with no detectable change in quality incident rate.

Problem 3: The human review queue was backlogging. Reviewers couldn't keep up.

Fix: Added a confidence-based routing layer. Items with hallucination confidence > 0.95 were auto-rejected (re-processed). Items with confidence 0.7-0.95 went to human review. Items below 0.7 were auto-approved (classifier uncertainty, not actual hallucination).

Need a second opinion on your AI systems architecture?

I run free 30-minute strategy calls for engineering teams tackling this exact problem.

Book a Free Call

Results & Metrics

Performance Gains

After 6 months of iteration:

Synchronous guardrail latency: 82ms median (down from 340ms at launch, before caching)
End-to-end document processing: 38 seconds median, 71 seconds p95 (well within the 90s SLA)
Cross-tenant data leak incidents: 0 in production
Prompt injection attempts blocked: 847 over 6 months (0.003% of total requests — lower than expected)

The hallucination detection system flagged 2.3% of all document outputs for human review. Of those flagged, 61% were confirmed as having at least one unsupported claim. That's a meaningful catch rate for a financial services context.

Cost Impact

Component	Month 1 cost	Month 6 cost	Notes
AWS Comprehend PII	$12	$1,840	Volume-driven, caching helped
Hallucination checks (LLM)	$31	$1,620	Risk-tiered reduced from $6,200 est.
Redis cache	$0	$180	34% cache hit rate offset Comprehend costs
CloudWatch Logs (audit)	$8	$890	Regulatory requirement, non-negotiable

Total guardrail cost at 40,000 docs/day: ~$4,530/month, approximately 18% of total infrastructure costs. We considered this acceptable for the risk management function it provided.

Developer Productivity

Unexpected finding: the guardrail system accelerated development of new AI features, not slowed it. Because we had a standardized interface for adding guardrails, engineers could ship new LLM features with confidence that the safety layer handled the common risks. The pre-launch checklist went from "fill this out before the security team will approve" to "oh right, I should check these 6 things" — a sign that it had become part of the workflow rather than overhead.

Lessons Learned

What Worked

Risk tiering was the right call. Treating hallucination as "flag and review" rather than "block synchronously" was the decision that made the system viable. 100% synchronous blocking of uncertain outputs would have required the review queue to handle thousands of items per day — economically and operationally impossible for a 4-person team.

AWS Comprehend for PII was the right trade-off. We evaluated building our own PII classifier. At startup scale, Comprehend's accuracy (92% precision, 88% recall on our financial document corpus) was sufficient and saved 6 weeks of ML work. We would revisit this at 3x scale when the cost becomes a real line item.

Immutable audit logs paid dividends. One enterprise customer requested a compliance audit 3 months after contract signing. We pulled a complete, tamper-evident log of every LLM call for their document portfolio in 4 hours. That capability closed a deal with a second customer who was watching the audit closely.

What Surprised Us

Prompt injection attempts were rarer than expected. We anticipated this as the primary attack vector. In practice, the injection patterns we blocked came mostly from poorly formatted customer documents that happened to contain phrases like "ignore previous" in legal disclaimers — false positives, not attacks. We tuned the patterns down significantly after the first month.

Hallucination was correlated with document quality, not model behavior. Our hallucination detector flagged outputs on documents that were scanned PDFs with poor OCR quality. The model was doing its best with ambiguous source text. We added an input quality check (OCR confidence score) that caught these before LLM processing, reducing flagged hallucinations by 40%.

Cache invalidation for PII patterns was an edge case we missed. When we updated our PII detection patterns, stale cache entries using old patterns persisted for an hour. For a one-hour window, documents processed from cache used the outdated detection logic. We added cache versioning and a pattern change procedure.

Key Takeaways

Ship the synchronous blocking layer first. It's the only one that prevents incidents. Everything else is quality assurance.
Risk-tier your checks. Not all safety failures have the same consequence. Reserve synchronous blocking for the ones that would be immediately harmful.
Build the audit log before you need it. It will be asked for at the worst possible time (a customer audit, an incident investigation). The cost of building it retroactively under time pressure is high.
Measure false positive rates weekly. Guardrails that over-block are a product quality problem, not just a safety nuance. Every blocked legitimate document erodes customer trust.

What We'd Do Differently

Architecture Changes

Introduce a dedicated guardrail service earlier. We kept guardrail logic in the main processing service for the first 4 months. By the time we extracted it, we had enough interdependencies that the extraction took 2 weeks instead of 2 days. A shared Python package with a stable interface, deployed to a separate service, would have given us independent scaling from month 2.

Use a purpose-built LLM observation platform. We logged everything to CloudWatch, which worked fine for compliance but made debugging difficult. Platforms like Langfuse or Helicone would have given us cost breakdowns by document type, latency percentiles by guardrail, and per-tenant usage — all of which we ended up building manually in Grafana.

Build the confidence-based auto-routing from day 1. We built the human review queue and manually watched it grow for two months before we added automated routing. We should have designed the routing logic first and added human review as the fallback tier, not the primary tier.

Process Improvements

Establish a "guardrail regression" test suite before launch. When we changed PII patterns or injection detection heuristics, we had no automated way to check that existing safe inputs were still passing. We caught two regressions via customer complaints before we built regression tests. This should have been standard from the first deploy.

Create a shared "known-bad" document corpus for testing. We accumulated adversarial examples over 6 months organically (from real incidents and injection attempts). Having this corpus from month 1 — even with synthetic data — would have given us a more rigorous baseline for guardrail quality.

Conclusion

Building production-grade AI guardrails for regulated industries is fundamentally an exercise in risk stratification. The layered architecture — synchronous blocking for high-severity failures like cross-tenant leakage and PII exposure, async flagging for quality issues like hallucination and inconsistency — let us keep the critical path fast while still catching the long tail of quality problems. The 80-100ms synchronous layer was the key design constraint that made the system viable at 40,000 documents per day without requiring a dedicated guardrail microservice.

The most valuable takeaway is that guardrail systems are living infrastructure, not set-and-forget rules. Our false positive rates shifted significantly as document types changed and LLM model versions updated. Invest early in observability and feedback loops — immutable audit logs, dashboards tracking block rates by category, and a human review queue that feeds corrections back into your classifiers. Start with the simplest check that addresses your highest-risk failure mode, measure its real-world performance, and add complexity only when the data justifies it.

FAQ

Need expert help?

Building with agentic AI?

I help teams ship production-grade systems. From architecture review to hands-on builds.

Book a Free Call Send a Brief

ai-safety guardrails responsible-ai llm aws case-study

Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.

View Portfolio Book a Call

← Previous

AI Guardrails & Safety at Scale: Lessons from Production

The Challenge

Business Context

Technical Constraints

Scale Requirements

Architecture Decision

Options Evaluated

Decision Criteria

Final Architecture

Implementation

Phase 1: Foundation

Phase 2: Core Features

Phase 3: Optimization

Results & Metrics

Performance Gains

Cost Impact

Developer Productivity

Lessons Learned

What Worked

What Surprised Us

Key Takeaways

What We'd Do Differently

Architecture Changes

Process Improvements

Conclusion

FAQ

Building with agentic AI?

AI Guardrails & Safety Best Practices for Enterprise Teams

AI Guardrails & Safety Best Practices for Startup Teams

Complete Guide to AI Guardrails & Safety with Python

Complete Guide to Multi-Tenant Architecture with Typescript

AI Guardrails & Safety Best Practices for Enterprise Teams

Start a
Conversation.

The Challenge

Business Context

Technical Constraints

Scale Requirements

Architecture Decision

Options Evaluated

Decision Criteria

Final Architecture

Implementation

Phase 1: Foundation

Phase 2: Core Features

Phase 3: Optimization

Results & Metrics

Performance Gains

Cost Impact

Developer Productivity

Lessons Learned

What Worked

What Surprised Us

Key Takeaways

What We'd Do Differently

Architecture Changes

Process Improvements

Conclusion

FAQ

Building with agentic AI?

AI Guardrails & Safety Best Practices for Enterprise Teams

AI Guardrails & Safety Best Practices for Startup Teams

Complete Guide to AI Guardrails & Safety with Python

Complete Guide to Multi-Tenant Architecture with Typescript

AI Guardrails & Safety Best Practices for Enterprise Teams

Start aConversation.

Start a
Conversation.