Complete Guide to AI Guardrails & Safety with Typescript
A comprehensive guide to implementing AI Guardrails & Safety using Typescript, covering architecture, code examples, and production-ready patterns.
Muneer Puthiya Purayil 13 min read
Introduction
Why This Matters
LLMs deployed in production applications are probabilistic systems with no intrinsic safety boundary. Left unguarded, they hallucinate confidently, leak data from context windows, comply with adversarial prompt injections, and generate harmful outputs at scale. The financial and reputational consequences of these failures are asymmetric: a single prompt-injection incident that exfiltrates customer records costs 10–100× more to remediate than building layered guardrails from day one.
TypeScript is increasingly the implementation language of choice for guardrail systems in frontend-adjacent and full-stack teams. Its type system catches entire categories of runtime errors at compile time, its async/await model maps naturally to concurrent guardrail execution, and its ecosystem—Zod for schema validation, OpenAI SDK, Anthropic SDK—provides everything needed to build production-grade safety infrastructure.
Who This Is For
This guide targets TypeScript engineers who are moving LLM features from prototype to production: Node.js backend developers, Next.js full-stack engineers, and platform teams building shared AI infrastructure. You should be comfortable with Promise.all, generics, and discriminated unions. No ML background required.
What You Will Learn
The taxonomy of LLM safety failures and the guardrail class that addresses each
Type-safe guardrail abstractions using TypeScript discriminated unions and Zod schemas
Async-concurrent input and output guardrail pipelines
Prompt injection detection with pattern matching and LLM-as-judge
PII detection and redaction using the Presidio REST API from TypeScript
Performance profiling: keeping total guardrail overhead under 100ms p99
Testing with Vitest: unit, integration, and adversarial red-team suites
Core Concepts
Key Terminology
Guardrail: A programmatic check applied to LLM input or output that enforces a safety or content policy. Can be deterministic (regex, schema validation) or probabilistic (LLM classifier, embedding similarity).
Prompt injection: An attack where user-controlled text overrides system instructions. Direct injection: user crafts the attack. Indirect injection: a retrieved document (RAG, web fetch) contains adversarial instructions the LLM executes.
Jailbreak: Techniques to elicit policy-violating outputs—persona exploits ("DAN"), encoding tricks (Base64, leetspeak), hypothetical framings ("for a novel I'm writing...").
PII leakage: The model reproducing personally identifiable data from its training set or from context (RAG chunks, conversation history). Guardrails address this with NER-based detection and redaction.
Hallucination: Factually incorrect outputs generated with high fluency. Addressed by grounding checks and citation validation, not content policy filters.
Input guardrail: Applied before the prompt reaches the LLM. Must be fast (target <30ms). Blocks or sanitizes before incurring API costs.
Output guardrail: Applied to the LLM response before it reaches the user. More expensive but catches model-generated violations.
Mental Models
Model your guardrail system as a series of quality gates in a CI pipeline:
Lint check (regex/pattern): Instant, high-recall, catches obvious violations before they cost an API call.
Unit test (schema validation): Structural checks on input/output shape. Did the model return valid JSON? Are required fields present?
Integration test (semantic classifier): A secondary LLM call that evaluates the primary LLM's output against your content policy.
E2E test (human-in-the-loop): For high-stakes domains, route flagged responses to a review queue.
Each gate has a cost (latency, money). Run gates concurrently where possible. Short-circuit at the first block.
Foundational Principles
Type safety is a guardrail: Use Zod schemas to validate LLM JSON output. A schema mismatch is caught at the boundary, not deep in your business logic where the error is harder to diagnose.
Fail closed by default: When a guardrail errors or returns a confidence below threshold, block the response. Log the ambiguous case. Tune thresholds with real production data over time.
Concurrent by default: TypeScript's Promise.all is zero-cost for concurrent I/O. Always run independent guardrails in parallel; sequential execution multiplies latency unnecessarily.
Observability is a first-class citizen: Emit structured logs with guardrail name, decision, confidence, latency, and a truncated content hash. You cannot tune what you cannot measure, and you need an audit trail for SOC 2 and EU AI Act compliance.
Architecture Overview
High-Level Design
1User Request (string)
2 │
3 ▼
4┌──────────────────────────┐
5│ Input Guardrail Layer │
6│ • Regex patterncheck │
7│ • PII detection/redact │
8│ • Injection classifier │
9└───────────┬──────────────┘
10 │ PASS (sanitized input)
11 ▼
12┌──────────────────────────┐
13│ LLM API Call │
14│ (OpenAI / Anthropic) │
15└───────────┬──────────────┘
16 │ raw response
17 ▼
18┌──────────────────────────┐
19│ Output Guardrail Layer │
20│ • Toxicity classifier │
21│ • Schema validation │
22│ • PII scrubbing │
23└───────────┬──────────────┘
24 │ PASS (safe response)
25 ▼
26User Response
27
Component Breakdown
GuardrailPipeline: The orchestrator. Runs input guardrails concurrently via Promise.all, calls the LLM, then runs output guardrails concurrently. Returns GuardrailResult (pass) or BlockResult (blocked with a safe refusal message).
BaseGuardrail: Abstract class with a single check(ctx: GuardrailContext): Promise<GuardrailDecision> method. Guardrails are stateless singletons.
GuardrailContext: Immutable value object carrying the message, conversation history, system prompt, user metadata (tier, auth status), and optionally the LLM response (for output guardrails).
PolicyRouter: Selects the guardrail profile based on request metadata. Anonymous public users get the full stack; internal admin users on an authenticated endpoint get a lightweight profile.
The primary latency drivers in a TypeScript guardrail stack:
Guardrail
p50
p99
Notes
Regex pattern matching
<1ms
<1ms
Zero I/O, synchronous
Presidio PII (REST)
8ms
25ms
Sidecar HTTP roundtrip
Semantic injection (gpt-4o-mini)
120ms
280ms
Network + LLM inference
Zod schema validation
<1ms
<1ms
Pure CPU
Run all input guardrails concurrently with Promise.allSettled. Three guardrails with p50 latencies of 1ms, 10ms, and 120ms resolve in ~120ms total, not 131ms sequentially.
Short-circuit: pattern matching runs synchronously before any async guardrail. If it blocks, you've spent <1ms and avoided a 120ms LLM call.
Memory Management
TypeScript Node.js processes don't carry ML model weights in-process (unlike the Python detoxify approach). Keep memory overhead low by:
Reuse HTTP clients: Instantiate OpenAI and fetch clients once at module load, not per-request.
Bound your cache: Enforce MAX_CACHE_SIZE on the in-process decision cache to prevent unbounded growth.
Stream large responses: For long LLM outputs, run the output guardrail on chunks rather than buffering the full response before checking.
For multi-process deployments (PM2 cluster, multiple pods), replace the in-memory cache with a shared Redis cache using ioredis:
4 --body '{"message":"What is the capital of France?"}' \
5 http://localhost:3000/chat
6
Target: p99 total request latency < 500ms under 200 RPS with 50 concurrent connections. If the semantic injection guardrail is your bottleneck, introduce an async queue that processes LLM classification out-of-band for non-interactive contexts (batch jobs, webhooks).
18const result = await guardrail.check({ message: sample });
19expect(result.type).toBe('block');
20 });
21 }
22});
23
Add this to your CI pipeline (vitest run --coverage) and set a coverage threshold:
json
1// vitest.config.ts
2{
3"coverage":{
4"thresholds":{"lines":80,"functions":85}
5}
6}
7
Conclusion
TypeScript's type system is a natural fit for guardrail infrastructure. Discriminated unions model the pass/block/sanitize decision cleanly, Zod schemas validate LLM JSON output at the boundary, and Promise.all provides zero-cost concurrent execution of independent checks. The result is a guardrail pipeline where type errors are caught at compile time, not in production when a toxicity classifier returns an unexpected shape.
The practical path forward: implement the BaseGuardrail abstract class and GuardrailPipeline orchestrator first, then build out individual guardrails in priority order — prompt injection detection and PII redaction address the highest-liability failure modes and should ship before toxicity or hallucination checks. Use the PolicyRouter to apply different guardrail profiles based on request context, run comprehensive Vitest suites including adversarial inputs, and emit structured logs for every guardrail decision. The observability data you collect in the first month of production will inform every tuning decision that follows.
FAQ
Need expert help?
Building with agentic AI?
I help teams ship production-grade systems. From architecture review to hands-on builds.
For teams building at scale: SaaS platforms, agentic AI systems, and enterprise mobile infrastructure. Scope and fit are evaluated before any engagement begins.