Prerequisites
Required Knowledge
This tutorial targets engineers who know TypeScript and have built REST APIs before. NestJS is a structured, opinionated framework — understanding its module/provider/controller triad is essential before layering agentic patterns on top. You should be comfortable with:
- TypeScript 5.x — decorators, generics, async/await, type narrowing
- NestJS fundamentals — modules, providers, dependency injection, guards, interceptors
- OpenAI tool calling — how the model requests function invocations and processes results
- Redis — basic key-value operations; we use it for workflow state persistence
- Bull/BullMQ — the queue library with official NestJS integration, used here for background job processing
Development Environment
Node.js 20 LTS or later. NestJS applications are CPU-bound during TypeScript compilation but I/O-bound at runtime — Node 20's improved async performance matters at scale.
Redis 7+ running locally:
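One simple way to run Redis 7 locally is via Docker (the container name and port mapping below are illustrative):

```shell
docker run -d --name agentic-redis -p 6379:6379 redis:7-alpine

# Verify it responds
docker exec agentic-redis redis-cli ping
```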
Install the NestJS CLI globally:
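```shell
npm install -g @nestjs/cli
nest --version
```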
Set required environment variables:
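`OPENAI_API_KEY` is required by the OpenAI SDK; the Redis variable names are an assumption about how this project's configuration is read:

```shell
export OPENAI_API_KEY="sk-..."   # your OpenAI key
export REDIS_HOST="localhost"    # assumption: host/port consumed by the app config
export REDIS_PORT="6379"
```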
Dependencies
The core packages we'll install:
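A hedged sketch of the install command — the BullMQ, OpenAI, class-validator, and Zod packages are named elsewhere in this tutorial; `ioredis` and `@nestjs/config` are assumptions about the Redis client and config layer:

```shell
npm install @nestjs/bullmq bullmq ioredis openai \
  class-validator class-transformer zod @nestjs/config
```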
Project Setup
Initialize Project
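Scaffold the project with the CLI (the project name is illustrative):

```shell
nest new agentic-api
cd agentic-api
```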
Choose npm when prompted. The CLI generates a standard NestJS structure. We'll extend it:
Configure Build Tools
NestJS uses SWC for fast compilation in development. Enable it in nest-cli.json:
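A minimal `nest-cli.json` with the SWC builder enabled; `typeCheck` keeps `tsc` type errors visible even though SWC itself skips type checking:

```json
{
  "collection": "@nestjs/schematics",
  "sourceRoot": "src",
  "compilerOptions": {
    "builder": "swc",
    "typeCheck": true
  }
}
```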
Strict TypeScript settings catch agentic bugs at compile time. Update tsconfig.json:
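A sketch of the relevant `compilerOptions` to merge into the generated `tsconfig.json` (the specific flags beyond `strict` are recommendations, not requirements):

```json
{
  "compilerOptions": {
    "strict": true,
    "strictNullChecks": true,
    "noImplicitAny": true,
    "noUncheckedIndexedAccess": true,
    "forceConsistentCasingInFileNames": true
  }
}
```

`noUncheckedIndexedAccess` is particularly useful for agentic code: tool-call arguments parsed from LLM output are indexed dynamically, and this flag forces you to handle the missing-key case.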
Add Dependencies
Configure the app module to wire everything together:
Core Implementation
Building the Foundation
The state store handles workflow persistence in Redis. Each workflow run is a JSON-serialized entity with a typed status enum:
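A sketch of the store, written against a minimal Redis-like interface so it can be faked in tests; the entity shape and `workflow:` key prefix are assumptions:

```typescript
// workflow-state.store.ts
export enum WorkflowStatus {
  Pending = 'pending',
  Running = 'running',
  Completed = 'completed',
  Failed = 'failed',
}

export interface WorkflowRun {
  id: string;
  status: WorkflowStatus;
  error?: string;
  updatedAt: string;
}

// The subset of the ioredis client we rely on
interface RedisLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
}

export class WorkflowStateStore {
  constructor(private readonly redis: RedisLike) {}

  private key(id: string): string {
    return `workflow:${id}`;
  }

  async save(run: WorkflowRun): Promise<void> {
    await this.redis.set(this.key(run.id), JSON.stringify(run));
  }

  async find(id: string): Promise<WorkflowRun | null> {
    const raw = await this.redis.get(this.key(id));
    return raw ? (JSON.parse(raw) as WorkflowRun) : null;
  }

  async setStatus(id: string, status: WorkflowStatus, error?: string): Promise<void> {
    const run = await this.find(id);
    if (!run) throw new Error(`workflow ${id} not found`);
    await this.save({ ...run, status, error, updatedAt: new Date().toISOString() });
  }
}
```

In the NestJS app this class would be an `@Injectable()` provider receiving the real ioredis client; keeping the Redis surface to `get`/`set` makes crash-recovery logic trivially unit-testable with an in-memory Map.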
Adding Business Logic
The AgentService implements the tool-calling loop. It's framework-agnostic — it takes an OpenAI client, a system prompt, and a tool registry, and iterates until the model finishes or the iteration limit is hit:
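A sketch of the loop. The `ChatClient` interface below is a simplified stand-in for the OpenAI chat-completions API (not its real signature), which keeps the service testable with a fake client:

```typescript
// agent.service.ts — the client interface is an assumption, not the real OpenAI SDK shape
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded, as the model returns it
}

interface ModelTurn {
  content: string | null;
  toolCalls: ToolCall[];
}

interface ChatClient {
  complete(messages: unknown[]): Promise<ModelTurn>;
}

export class AgentService {
  constructor(
    private readonly client: ChatClient,
    private readonly systemPrompt: string,
    private readonly tools: Map<string, ToolHandler>,
    private readonly maxIterations = 10,
  ) {}

  async run(userInput: string): Promise<string> {
    const messages: unknown[] = [
      { role: 'system', content: this.systemPrompt },
      { role: 'user', content: userInput },
    ];

    for (let i = 0; i < this.maxIterations; i++) {
      const turn = await this.client.complete(messages);

      // No tool calls means the model has produced its final answer
      if (turn.toolCalls.length === 0) return turn.content ?? '';

      messages.push({ role: 'assistant', content: turn.content, tool_calls: turn.toolCalls });
      for (const call of turn.toolCalls) {
        const handler = this.tools.get(call.name);
        const result = handler
          ? await handler(JSON.parse(call.arguments))
          : `unknown tool: ${call.name}`;
        // Each tool result is echoed back under the originating call id
        messages.push({ role: 'tool', tool_call_id: call.id, content: result });
      }
    }
    throw new Error(`agent exceeded ${this.maxIterations} iterations`);
  }
}
```

The iteration cap is the critical safety property: a model that keeps requesting tools can otherwise loop indefinitely and burn tokens.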
Connecting Services
The WorkflowProcessor is a BullMQ job processor — it runs in the background and calls the agent service. This decouples HTTP request handling from the LLM work, which can take 30-120 seconds:
Adding Features
Feature 1: Core Capability — HTTP API
The controller provides the public API surface. Workflow creation is non-blocking — it enqueues a job and returns the ID immediately:
Feature 2: Extensions — Multi-Agent Orchestration
For complex tasks, decompose work across specialized agents running in parallel using Promise.allSettled:
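A framework-agnostic sketch of the fan-out; the agent names and result shape are assumptions:

```typescript
// orchestrator.ts
interface SubAgent {
  name: string;
  run(task: string): Promise<string>;
}

export interface SubTaskResult {
  agent: string;
  ok: boolean;
  output: string;
}

// Fan a task out to specialized agents in parallel. Promise.allSettled
// means one agent failing does not abort the others.
export async function orchestrate(agents: SubAgent[], task: string): Promise<SubTaskResult[]> {
  const settled = await Promise.allSettled(agents.map((a) => a.run(task)));
  return settled.map((res, i) => ({
    agent: agents[i].name,
    ok: res.status === 'fulfilled',
    output: res.status === 'fulfilled' ? res.value : String(res.reason),
  }));
}
```

A supervising agent (or plain code) can then merge the fulfilled results and decide whether the rejected sub-tasks are worth retrying.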
Feature 3: Polish — Validation and Guards
Use class-validator DTOs to fail fast on bad input before any LLM calls are made:
Error Handling
Error Classification
NestJS exception filters let you handle agent-specific errors uniformly across all controllers:
Recovery Strategies
BullMQ handles job retries with exponential backoff — configure it per job type:
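A sketch of the retry options passed at enqueue time; the attempt count and delay are illustrative:

```typescript
import { Queue } from 'bullmq';

const queue = new Queue('workflows', {
  connection: { host: 'localhost', port: 6379 },
});

// attempts counts the first run; exponential backoff doubles the delay
// between tries: 5s, then 10s, then 20s
await queue.add(
  'run',
  { workflowId: 'abc', input: 'summarize the report' },
  { attempts: 3, backoff: { type: 'exponential', delay: 5_000 } },
);
```

Retry only errors that are actually transient (rate limits, timeouts); a malformed-input failure will fail identically on every attempt.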
User-Facing Messages
Register the exception filter globally in main.ts:
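A sketch of the bootstrap, assuming an `AgentErrorFilter` class (the name is illustrative) and a global `ValidationPipe`:

```typescript
// main.ts
import { NestFactory } from '@nestjs/core';
import { ValidationPipe } from '@nestjs/common';
import { AppModule } from './app.module';
import { AgentErrorFilter } from './agent-error.filter';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.useGlobalPipes(new ValidationPipe({ whitelist: true, transform: true }));
  app.useGlobalFilters(new AgentErrorFilter()); // agent errors handled uniformly
  await app.listen(3000);
}
bootstrap();
```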
For workflow-level errors visible via the status endpoint, normalize error messages before storing them:
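A sketch of the normalizer; the pattern-to-message mapping is illustrative:

```typescript
// Map raw errors to short, user-safe messages before persisting them
// alongside the workflow state
export function normalizeWorkflowError(err: unknown): string {
  const message = err instanceof Error ? err.message : String(err);
  if (/rate.?limit/i.test(message)) {
    return 'The AI provider is rate-limiting requests. Please retry shortly.';
  }
  if (/timeout/i.test(message)) {
    return 'The workflow timed out and has been queued for retry.';
  }
  if (/budget/i.test(message)) {
    return 'The workflow exceeded its token budget.';
  }
  // Never leak stack traces or provider internals to end users
  return 'The workflow failed due to an internal error.';
}
```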
Deployment
Environment Configuration
Dockerfile:
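A sketch of a multi-stage build (image tags and output paths assume the default NestJS build layout):

```dockerfile
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
CMD ["node", "dist/main.js"]
```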
CI/CD Pipeline
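A minimal GitHub Actions sketch; the script names assume the defaults the NestJS CLI generates, and the Redis service container lets integration tests hit a real instance:

```yaml
# .github/workflows/ci.yml — illustrative pipeline
name: ci
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports: ['6379:6379']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: npm run build
```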
Monitoring Setup
NestJS works well with @willsoto/nestjs-prometheus for metrics exposure:
Key metrics to expose at /metrics:
- agent_iterations_total — counter by agent type
- llm_request_duration_seconds — histogram by model
- workflow_status_total — counter by workflow type and status
- tool_call_errors_total — counter by tool and error type
Next Steps
Recommended Extensions
NestJS interceptors for request logging. Add a global LoggingInterceptor that logs every workflow request with correlation IDs. When a user reports an issue, you can trace the exact LLM calls that happened.
Configurable tool registry. Instead of hardcoding tools per processor, inject a ToolRegistry service that maps tool names to handlers. New tools register themselves via @Injectable() — no processor changes needed.
Workflow replay. Store the full message history alongside workflow state. If a workflow fails, replay from the last successful checkpoint rather than starting over — critical for 10+ step workflows.
Budget enforcement. Track OpenAI token usage per workflow using response.usage. Abort with AgentError('budget_exceeded', false) when a workflow exceeds its token budget.
Dead letter queue. Configure BullMQ's defaultJobOptions.removeOnFail: false and a separate queue consumer for failed jobs. Alert engineering when the DLQ depth exceeds a threshold.
Further Reading
- NestJS Queues documentation — official BullMQ integration guide
- BullMQ documentation — job queues, flow producers, rate limiters
- OpenAI Assistants API — managed thread and tool state; complements rather than replaces custom orchestration
- LangGraph.js — graph-based agent orchestration with typed state channels
Conclusion
NestJS provides the structural guardrails that agentic AI systems need at scale: dependency injection for clean separation between LLM clients, tool implementations, and workflow orchestration; BullMQ integration for queue-backed async execution; and the module system for organizing agent logic into independently testable units. The decorator-driven architecture means concerns like logging, validation, and error handling wrap agent workflows without polluting business logic.
The implementation path covered here — typed DTOs for workflow state, a tool registry pattern for extensible tool management, BullMQ processors for long-running workflows, and event emitters for step-level observability — gives you a production-ready foundation without over-engineering. Use class-validator on all incoming requests, Zod on all LLM outputs, and Redis-backed state for crash recovery. Run your workflow processor as a separate NestJS application from your API server so agent load does not compete with request handling. The patterns here scale from a single workflow to a multi-agent platform because NestJS's module system was designed for exactly this kind of incremental complexity growth.