AI Architecture

How to Build Agentic AI Workflows Using NestJS

Step-by-step tutorial for building agentic AI workflows with NestJS, from project setup through deployment.

Muneer Puthiya Purayil · 17 min read

Prerequisites

Required Knowledge

This tutorial targets engineers who know TypeScript and have built REST APIs before. NestJS is a structured, opinionated framework — understanding its module/provider/controller triad is essential before layering agentic patterns on top. You should be comfortable with:

  • TypeScript 5.x — decorators, generics, async/await, type narrowing
  • NestJS fundamentals — modules, providers, dependency injection, guards, interceptors
  • OpenAI tool calling — how the model requests function invocations and processes results
  • Redis — basic key-value operations; we use it for workflow state persistence
  • Bull — the Redis-backed queue library we use via @nestjs/bull for background job processing (BullMQ is its successor; the patterns carry over)
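If the tool-calling handshake is hazy, this sketch shows the round trip in miniature. The shapes are simplified from the OpenAI Chat Completions API, and `buildToolResultMessage` is an illustrative helper, not part of the SDK:

```typescript
// Simplified shapes — the real SDK types carry more fields
interface ToolCall {
  id: string;
  function: { name: string; arguments: string }; // arguments is a JSON string
}

// After executing a requested tool, the result goes back to the model
// as a role:'tool' message keyed by tool_call_id
function buildToolResultMessage(call: ToolCall, result: unknown) {
  return {
    role: 'tool' as const,
    tool_call_id: call.id,
    content: JSON.stringify(result),
  };
}

const call: ToolCall = {
  id: 'call_1',
  function: { name: 'search_web', arguments: '{"query":"nestjs"}' },
};
const args = JSON.parse(call.function.arguments) as { query: string };
const reply = buildToolResultMessage(call, { results: [`stub for ${args.query}`] });
```

The loop we build in this tutorial is exactly this exchange, repeated until the model stops requesting tools.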

Development Environment

Node.js 20 LTS or later. NestJS applications are CPU-bound during TypeScript compilation but IO-bound at runtime — Node 20's improved async performance matters at scale.

bash
node --version  # v20.x.x or later
npm --version   # 10.x

Redis 7+ running locally:

bash
docker run -d -p 6379:6379 redis:7-alpine

Install the NestJS CLI globally:

bash
npm install -g @nestjs/cli

Set required environment variables:

bash
export OPENAI_API_KEY="sk-..."
export REDIS_HOST="localhost"
export REDIS_PORT="6379"

Dependencies

The core packages we'll install:

@nestjs/core ^10.4           # Framework core
@nestjs/common ^10.4         # Decorators, guards, pipes
@nestjs/bull ^10.2           # Bull integration for job queues
@nestjs/config ^3.2          # Environment variable management
openai ^4                    # OpenAI Node SDK with tool calling
bull ^4.16                   # Underlying queue library
ioredis ^5.4                 # Redis client
class-validator ^0.14        # Request validation
class-transformer ^0.5       # DTO transformation
@nestjs/event-emitter ^2.0   # Internal event bus for workflow steps
ulid ^2.3                    # Sortable IDs for workflow runs

Project Setup

Initialize Project

bash
nest new agentic-api
cd agentic-api

Choose npm when prompted. The CLI generates a standard NestJS structure. We'll extend it:

src/
├── app.module.ts              # Root module — imports all feature modules
├── main.ts                    # Bootstrap with validation pipe
├── config/
│   └── configuration.ts       # Typed config factory
├── agent/
│   ├── agent.module.ts
│   ├── agent.service.ts       # Core LLM invocation with tool loop
│   └── tools/
│       ├── tool.registry.ts
│       └── web-search.tool.ts
├── workflow/
│   ├── workflow.module.ts
│   ├── workflow.service.ts    # State management + orchestration
│   ├── workflow.controller.ts # HTTP endpoints
│   ├── workflow.processor.ts  # Bull job processor
│   ├── dto/
│   │   ├── create-workflow.dto.ts
│   │   └── workflow-response.dto.ts
│   └── entities/
│       └── workflow.entity.ts
└── state/
    ├── state.module.ts
    └── state.store.ts         # Redis-backed state persistence

Configure Build Tools

The NestJS CLI can use SWC for fast compilation in development. Enable it in nest-cli.json:

json
{
  "$schema": "https://json.schemastore.org/nest-cli",
  "collection": "@nestjs/schematics",
  "sourceRoot": "src",
  "compilerOptions": {
    "builder": {
      "type": "swc",
      "options": {
        "swcOptions": {
          "sourceMaps": true
        }
      }
    },
    "deleteOutDir": true
  }
}

Strict TypeScript settings catch agentic bugs at compile time. Update tsconfig.json:

json
1{
2 "compilerOptions": {
3 "strict": true,
4 "strictNullChecks": true,
5 "noUncheckedIndexedAccess": true,
6 "target": "ES2022",
7 "experimentalDecorators": true,
8 "emitDecoratorMetadata": true
9 }
10}
11 
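As a concrete example of what `noUncheckedIndexedAccess` buys you: indexing into an array yields `T | undefined`, so code that grabs `choices[0]` from an LLM response must handle the empty case. The types below are illustrative stand-ins, not the SDK's:

```typescript
// Minimal stand-in for a completion response choice
interface Choice { message: { content: string | null } }

function firstContent(choices: Choice[]): string | null {
  const choice = choices[0]; // typed Choice | undefined under the flag
  if (!choice) return null;  // the compiler insists on this guard
  return choice.message.content;
}
```

Without the flag, `choices[0].message` compiles fine and throws at runtime the first time a provider returns an empty `choices` array.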

Add Dependencies

bash
npm install @nestjs/bull @nestjs/config @nestjs/event-emitter \
  openai bull ioredis class-validator class-transformer ulid

npm install -D @types/bull

Configure the app module to wire everything together:

typescript
// src/app.module.ts
import { Module } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import { BullModule } from '@nestjs/bull';
import { EventEmitterModule } from '@nestjs/event-emitter';
import configuration from './config/configuration';
import { AgentModule } from './agent/agent.module';
import { WorkflowModule } from './workflow/workflow.module';
import { StateModule } from './state/state.module';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
      load: [configuration],
    }),
    BullModule.forRootAsync({
      imports: [ConfigModule],
      useFactory: (config: ConfigService) => ({
        redis: {
          host: config.get<string>('redis.host'),
          port: config.get<number>('redis.port'),
        },
      }),
      inject: [ConfigService],
    }),
    EventEmitterModule.forRoot(),
    StateModule,
    AgentModule,
    WorkflowModule,
  ],
})
export class AppModule {}

Core Implementation

Building the Foundation

The state store handles workflow persistence in Redis. Each workflow run is a JSON-serialized entity with a typed status enum:

typescript
// src/workflow/entities/workflow.entity.ts
export enum WorkflowStatus {
  PENDING = 'pending',
  RUNNING = 'running',
  AWAITING_INPUT = 'awaiting_input',
  COMPLETED = 'completed',
  FAILED = 'failed',
}

export interface WorkflowStep {
  type: string;
  timestamp: string;
  data: Record<string, unknown>;
}

export class WorkflowEntity {
  // Definite-assignment assertions: these are set via Object.assign below,
  // which strict mode cannot track
  workflowId!: string;
  workflowType!: string;
  inputData!: Record<string, unknown>;
  status: WorkflowStatus;
  steps: WorkflowStep[];
  output: Record<string, unknown> | null;
  error: string | null;
  createdAt: string;
  updatedAt: string;

  constructor(params: {
    workflowId: string;
    workflowType: string;
    inputData: Record<string, unknown>;
  }) {
    Object.assign(this, params);
    this.status = WorkflowStatus.PENDING;
    this.steps = [];
    this.output = null;
    this.error = null;
    const now = new Date().toISOString();
    this.createdAt = now;
    this.updatedAt = now;
  }
}
typescript
// src/state/state.store.ts
import { Injectable, Logger, OnModuleDestroy } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import Redis from 'ioredis';
import { monotonicFactory } from 'ulid';
import { WorkflowEntity } from '../workflow/entities/workflow.entity';

const ulid = monotonicFactory();

@Injectable()
export class StateStore implements OnModuleDestroy {
  private readonly logger = new Logger(StateStore.name);
  private readonly redis: Redis;
  private readonly ttl: number;

  constructor(config: ConfigService) {
    // @nestjs/bull does not expose its Redis connection, so the store
    // maintains its own ioredis client built from the same config
    this.redis = new Redis({
      host: config.get<string>('redis.host', 'localhost'),
      port: config.get<number>('redis.port', 6379),
    });
    this.ttl = config.get<number>('workflow.ttlSeconds', 86400);
  }

  onModuleDestroy(): void {
    this.redis.disconnect();
  }

  private key(workflowId: string): string {
    return `workflow:${workflowId}`;
  }

  async create(
    workflowType: string,
    inputData: Record<string, unknown>,
  ): Promise<WorkflowEntity> {
    const workflowId = ulid();
    const entity = new WorkflowEntity({ workflowId, workflowType, inputData });
    await this.save(entity);
    return entity;
  }

  async save(entity: WorkflowEntity): Promise<void> {
    entity.updatedAt = new Date().toISOString();
    await this.redis.setex(this.key(entity.workflowId), this.ttl, JSON.stringify(entity));
  }

  async get(workflowId: string): Promise<WorkflowEntity | null> {
    const raw = await this.redis.get(this.key(workflowId));
    if (!raw) return null;
    return JSON.parse(raw) as WorkflowEntity;
  }

  async appendStep(workflowId: string, step: WorkflowEntity['steps'][number]): Promise<void> {
    const entity = await this.get(workflowId);
    if (!entity) {
      this.logger.warn(`appendStep: workflow ${workflowId} not found`);
      return;
    }
    entity.steps.push({ ...step, timestamp: new Date().toISOString() });
    await this.save(entity);
  }
}

Adding Business Logic

The AgentService implements the tool-calling loop. Given a system prompt, a user message, and a set of tool definitions, it iterates — calling the model, executing any requested tools, and feeding results back — until the model produces a final answer or the iteration limit is hit:

typescript
// src/agent/agent.service.ts
import { Injectable, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import OpenAI from 'openai';

type ChatCompletionMessageParam = OpenAI.Chat.ChatCompletionMessageParam;

export interface ToolDefinition {
  schema: OpenAI.Chat.ChatCompletionTool;
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

export interface AgentRunResult {
  content: string | null;
  toolCallLog: Array<{
    tool: string;
    args: Record<string, unknown>;
    result: unknown;
    timestamp: string;
  }>;
  iterations: number;
  finishReason: string;
}

@Injectable()
export class AgentService {
  private readonly logger = new Logger(AgentService.name);
  private readonly client: OpenAI;
  private readonly model: string;
  private readonly maxIterations: number;

  constructor(private readonly config: ConfigService) {
    this.client = new OpenAI({
      apiKey: config.getOrThrow<string>('openai.apiKey'),
    });
    this.model = config.get<string>('openai.model', 'gpt-4o');
    this.maxIterations = config.get<number>('agent.maxIterations', 10);
  }

  async run(params: {
    systemPrompt: string;
    userMessage: string;
    tools: ToolDefinition[];
    context?: Record<string, unknown>;
    maxIterations?: number;
  }): Promise<AgentRunResult> {
    const { systemPrompt, userMessage, tools, context, maxIterations } = params;
    const limit = maxIterations ?? this.maxIterations;
    const toolSchemas = tools.map((t) => t.schema);
    const toolMap = new Map(tools.map((t) => [t.schema.function.name, t.handler]));

    const messages: ChatCompletionMessageParam[] = [
      { role: 'system', content: systemPrompt },
    ];

    if (context && Object.keys(context).length > 0) {
      messages.push({
        role: 'system',
        content: `Context:\n${JSON.stringify(context, null, 2)}`,
      });
    }

    messages.push({ role: 'user', content: userMessage });

    const toolCallLog: AgentRunResult['toolCallLog'] = [];
    let iterations = 0;

    while (iterations < limit) {
      iterations++;

      const response = await this.client.chat.completions.create({
        model: this.model,
        messages,
        tools: toolSchemas.length > 0 ? toolSchemas : undefined,
        tool_choice: toolSchemas.length > 0 ? 'auto' : undefined,
        temperature: 0.1,
      });

      const choice = response.choices[0];
      if (!choice) throw new Error('OpenAI returned no completion choices');
      const message = choice.message;
      messages.push(message as ChatCompletionMessageParam);

      if (!message.tool_calls || message.tool_calls.length === 0) {
        return {
          content: message.content,
          toolCallLog,
          iterations,
          finishReason: choice.finish_reason,
        };
      }

      for (const toolCall of message.tool_calls) {
        const fnName = toolCall.function.name;
        let fnArgs: Record<string, unknown> = {};
        try {
          // Arguments arrive as a JSON string and may be malformed
          fnArgs = JSON.parse(toolCall.function.arguments) as Record<string, unknown>;
        } catch {
          // Keep fnArgs empty; the handler (or error result) surfaces the problem
        }
        const handler = toolMap.get(fnName);

        let result: unknown;
        if (!handler) {
          result = { error: `Unknown tool: ${fnName}` };
        } else {
          try {
            result = await handler(fnArgs);
          } catch (err) {
            result = { error: String(err), tool: fnName };
          }
        }

        this.logger.debug(`Tool call: ${fnName}`, { args: fnArgs, result });
        toolCallLog.push({
          tool: fnName,
          args: fnArgs,
          result,
          timestamp: new Date().toISOString(),
        });

        messages.push({
          role: 'tool',
          tool_call_id: toolCall.id,
          content: JSON.stringify(result),
        });
      }
    }

    return {
      content: 'Maximum iterations reached.',
      toolCallLog,
      iterations,
      finishReason: 'max_iterations',
    };
  }
}

Connecting Services

The WorkflowProcessor is a Bull job processor — it runs in the background and calls the agent service. This decouples HTTP request handling from the LLM work, which can take 30-120 seconds:

typescript
// src/workflow/workflow.processor.ts
import { Process, Processor } from '@nestjs/bull';
import { Logger } from '@nestjs/common';
import { EventEmitter2 } from '@nestjs/event-emitter';
import { Job } from 'bull';
import { AgentService, ToolDefinition } from '../agent/agent.service';
import { StateStore } from '../state/state.store';
import { WorkflowStatus } from './entities/workflow.entity';

export interface WorkflowJobData {
  workflowId: string;
  workflowType: string;
  inputData: Record<string, unknown>;
}

const RESEARCH_SYSTEM_PROMPT = `You are a rigorous research agent. Your job is to:
1. Search for authoritative information on the given topic
2. Synthesize findings into a structured, factual report
3. Cite your sources

Do not speculate beyond what sources confirm. Finish when you have a comprehensive answer.`;

@Processor('workflows')
export class WorkflowProcessor {
  private readonly logger = new Logger(WorkflowProcessor.name);

  constructor(
    private readonly agentService: AgentService,
    private readonly stateStore: StateStore,
    private readonly eventEmitter: EventEmitter2,
  ) {}

  @Process('research')
  async handleResearch(job: Job<WorkflowJobData>): Promise<void> {
    const { workflowId, inputData } = job.data;
    const topic = inputData.topic as string;

    this.logger.log(`Starting research workflow ${workflowId} for: ${topic}`);

    const state = await this.stateStore.get(workflowId);
    if (!state) throw new Error(`Workflow ${workflowId} not found in state store`);

    state.status = WorkflowStatus.RUNNING;
    await this.stateStore.save(state);

    // Tool definitions — in production, inject a ToolRegistry
    const tools: ToolDefinition[] = [
      {
        schema: {
          type: 'function',
          function: {
            name: 'search_web',
            description: 'Search the web for information on a topic',
            parameters: {
              type: 'object',
              properties: {
                query: { type: 'string', description: 'Search query' },
              },
              required: ['query'],
            },
          },
        },
        handler: async (args) => {
          // Integrate with Brave, Exa, or SerpAPI here
          return { results: [`Stub results for: ${args.query}`] };
        },
      },
    ];

    try {
      const result = await this.agentService.run({
        systemPrompt: RESEARCH_SYSTEM_PROMPT,
        userMessage: `Research this topic thoroughly: ${topic}`,
        tools,
        maxIterations: 8,
      });

      const freshState = await this.stateStore.get(workflowId);
      if (!freshState) return;

      freshState.status = WorkflowStatus.COMPLETED;
      freshState.output = {
        report: result.content,
        toolCallCount: result.toolCallLog.length,
        iterations: result.iterations,
      };
      await this.stateStore.save(freshState);

      this.eventEmitter.emit('workflow.completed', { workflowId });
      this.logger.log(`Workflow ${workflowId} completed in ${result.iterations} iterations`);
    } catch (error) {
      const failedState = await this.stateStore.get(workflowId);
      if (failedState) {
        failedState.status = WorkflowStatus.FAILED;
        failedState.error = String(error);
        await this.stateStore.save(failedState);
      }
      this.eventEmitter.emit('workflow.failed', { workflowId, error: String(error) });
      throw error; // Re-throw so Bull marks the job as failed
    }
  }
}

Adding Features

Feature 1: Core Capability — HTTP API

The controller provides the public API surface. Workflow creation is non-blocking — it enqueues a job and returns the ID immediately:

typescript
// src/workflow/workflow.controller.ts
import {
  Body, Controller, Get, Param, Post, Sse, MessageEvent,
  NotFoundException, Logger,
} from '@nestjs/common';
import { Observable, interval, switchMap, takeWhile, map } from 'rxjs';
import { WorkflowService } from './workflow.service';
import { CreateWorkflowDto } from './dto/create-workflow.dto';
import { WorkflowStatus } from './entities/workflow.entity';

@Controller('api/v1/workflows')
export class WorkflowController {
  private readonly logger = new Logger(WorkflowController.name);

  constructor(private readonly workflowService: WorkflowService) {}

  @Post('research')
  async createResearch(@Body() dto: CreateWorkflowDto) {
    const workflowId = await this.workflowService.startResearch(dto.topic, dto.depth);
    return { workflowId, status: 'pending' };
  }

  @Get(':id')
  async getWorkflow(@Param('id') id: string) {
    const workflow = await this.workflowService.getWorkflow(id);
    if (!workflow) throw new NotFoundException(`Workflow ${id} not found`);
    return workflow;
  }

  @Sse(':id/stream')
  streamWorkflow(@Param('id') id: string): Observable<MessageEvent> {
    return interval(1000).pipe(
      switchMap(async () => {
        const state = await this.workflowService.getWorkflow(id);
        if (!state) throw new NotFoundException();
        return state;
      }),
      takeWhile(
        (state) =>
          state.status !== WorkflowStatus.COMPLETED &&
          state.status !== WorkflowStatus.FAILED,
        true, // emit the terminal state before completing
      ),
      map((state) => ({
        data: {
          workflowId: state.workflowId,
          status: state.status,
          stepsCount: state.steps.length,
          output: state.output,
          error: state.error,
        },
      })),
    );
  }
}

Feature 2: Extensions — Multi-Agent Orchestration

For complex tasks, decompose work across specialized agents running in parallel using Promise.allSettled:

typescript
// src/workflow/workflow.service.ts (orchestration methods)
async startOrchestration(goal: string): Promise<string> {
  const state = await this.stateStore.create('orchestrator', { goal });
  const workflowId = state.workflowId;

  // Run async — don't await
  this.runOrchestration(workflowId, goal).catch((err) => {
    this.logger.error(`Orchestration ${workflowId} failed`, err);
  });

  return workflowId;
}

private async runOrchestration(workflowId: string, goal: string): Promise<void> {
  // Step 1: Plan
  const planResult = await this.agentService.run({
    systemPrompt:
      'You are a task planner. Break the goal into 2-4 parallel research subtasks. ' +
      'Respond with a JSON array: [{"id": "1", "task": "..."}]',
    userMessage: `Plan subtasks for: ${goal}`,
    tools: [],
  });

  const jsonMatch = planResult.content?.match(/\[[\s\S]*\]/);
  const subtasks: Array<{ id: string; task: string }> = jsonMatch
    ? JSON.parse(jsonMatch[0])
    : [{ id: '1', task: goal }];

  // Step 2: Execute subtasks in parallel
  const results = await Promise.allSettled(
    subtasks.map(async (subtask) => {
      const result = await this.agentService.run({
        systemPrompt: 'You are a research specialist. Gather detailed information.',
        userMessage: subtask.task,
        tools: this.getResearchTools(),
      });
      await this.stateStore.appendStep(workflowId, {
        type: 'subtask_complete',
        data: { subtaskId: subtask.id, iterations: result.iterations },
        timestamp: new Date().toISOString(),
      });
      return { id: subtask.id, content: result.content };
    }),
  );

  // Step 3: Synthesize
  const subtaskOutputs = results
    .filter(
      (r): r is PromiseFulfilledResult<{ id: string; content: string | null }> =>
        r.status === 'fulfilled',
    )
    .map((r) => `Subtask ${r.value.id}:\n${r.value.content}`)
    .join('\n\n---\n\n');

  const synthesis = await this.agentService.run({
    systemPrompt: 'Synthesize research into a final comprehensive response.',
    userMessage: `Goal: ${goal}\n\nResearch outputs:\n${subtaskOutputs}`,
    tools: [],
  });

  const finalState = await this.stateStore.get(workflowId);
  if (!finalState) return;
  finalState.status = WorkflowStatus.COMPLETED;
  finalState.output = { result: synthesis.content, subtaskCount: subtasks.length };
  await this.stateStore.save(finalState);
}

private getResearchTools(): ToolDefinition[] {
  return [/* inject from ToolRegistry */];
}
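Extracting the planner's JSON with a regex and parsing it blindly is fragile — one malformed reply crashes the run. A small guard (a hypothetical helper, no extra dependency) keeps the fallback path explicit:

```typescript
interface Subtask { id: string; task: string }

// Validates the planner's output; falls back to a single subtask on any failure
function parseSubtasks(raw: string | null, fallbackTask: string): Subtask[] {
  const fallback: Subtask[] = [{ id: '1', task: fallbackTask }];
  const match = raw?.match(/\[[\s\S]*\]/);
  if (!match) return fallback;
  try {
    const parsed: unknown = JSON.parse(match[0]);
    if (
      Array.isArray(parsed) &&
      parsed.length > 0 &&
      parsed.every(
        (s): s is Subtask =>
          typeof s === 'object' && s !== null &&
          typeof (s as Subtask).id === 'string' &&
          typeof (s as Subtask).task === 'string',
      )
    ) {
      return parsed;
    }
  } catch {
    // Malformed JSON — use the fallback
  }
  return fallback;
}
```

Libraries like Zod do the same job declaratively; the point is that LLM output is untrusted input and should be validated like any other.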

Feature 3: Polish — Validation and Guards

Use class-validator DTOs to fail fast on bad input before any LLM calls are made:

typescript
// src/workflow/dto/create-workflow.dto.ts
import { IsString, IsNotEmpty, IsIn, MaxLength, MinLength } from 'class-validator';

export class CreateWorkflowDto {
  @IsString()
  @IsNotEmpty()
  @MinLength(3)
  @MaxLength(500)
  topic!: string;

  @IsIn(['standard', 'deep'])
  depth: 'standard' | 'deep' = 'standard';
}
typescript
// src/main.ts
import { NestFactory } from '@nestjs/core';
import { ValidationPipe, Logger } from '@nestjs/common';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  app.useGlobalPipes(
    new ValidationPipe({
      whitelist: true, // Strip unknown properties
      forbidNonWhitelisted: true,
      transform: true, // Auto-transform to DTO types
      transformOptions: { enableImplicitConversion: true },
    }),
  );

  const port = process.env.PORT ?? 3000;
  await app.listen(port);
  Logger.log(`Agentic API running on port ${port}`);
}
bootstrap();


Error Handling

Error Classification

NestJS exception filters let you handle agent-specific errors uniformly across all controllers:

typescript
// src/agent/agent.exception.ts
export class AgentError extends Error {
  constructor(
    message: string,
    public readonly errorType: 'rate_limit' | 'timeout' | 'loop_detected' | 'max_iterations' | 'fatal',
    public readonly retryable: boolean = false,
  ) {
    super(message);
    this.name = 'AgentError';
  }
}
typescript
// src/agent/agent-exception.filter.ts
import { ExceptionFilter, Catch, ArgumentsHost, Logger } from '@nestjs/common';
import { Response } from 'express';
import { AgentError } from './agent.exception';

// Exported so workflow code can reuse the same user-facing messages
export const ERROR_MESSAGES: Record<string, string> = {
  rate_limit: 'The AI service is temporarily busy. Please retry in 30 seconds.',
  timeout: 'The agent timed out. Try a simpler request.',
  loop_detected: 'The agent got stuck. Try rephrasing your request.',
  max_iterations: 'Task too complex for automatic completion. Break it into smaller steps.',
  fatal: 'An unexpected error occurred.',
};

@Catch(AgentError)
export class AgentExceptionFilter implements ExceptionFilter {
  private readonly logger = new Logger(AgentExceptionFilter.name);

  catch(exception: AgentError, host: ArgumentsHost) {
    const ctx = host.switchToHttp();
    const response = ctx.getResponse<Response>();

    this.logger.error(`AgentError [${exception.errorType}]: ${exception.message}`);

    response.status(exception.retryable ? 503 : 422).json({
      error: exception.errorType,
      message: ERROR_MESSAGES[exception.errorType] ?? ERROR_MESSAGES.fatal,
      retryable: exception.retryable,
    });
  }
}
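Raw failures from the SDK or the network arrive as generic `Error`s; classifying them before wrapping them in `AgentError` keeps retry decisions in one place. The matching rules below are illustrative — tune them to the errors your provider actually emits:

```typescript
type AgentErrorType = 'rate_limit' | 'timeout' | 'loop_detected' | 'max_iterations' | 'fatal';

// Hypothetical classifier mapping a raw error to an error type plus retryability
function classify(err: unknown): { errorType: AgentErrorType; retryable: boolean } {
  const msg = err instanceof Error ? err.message : String(err);
  if (/rate.?limit|\b429\b/i.test(msg)) return { errorType: 'rate_limit', retryable: true };
  if (/timed?.?out|ETIMEDOUT/i.test(msg)) return { errorType: 'timeout', retryable: true };
  return { errorType: 'fatal', retryable: false };
}
```

The classifier's output feeds straight into `new AgentError(msg, errorType, retryable)`, so the filter and the queue retry logic agree on what is worth retrying.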

Recovery Strategies

Bull handles job retries with exponential backoff — configure it per job type when enqueuing:

typescript
// src/workflow/workflow.service.ts
import { Injectable } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bull';
import { Queue } from 'bull';
import { StateStore } from '../state/state.store';

@Injectable()
export class WorkflowService {
  constructor(
    @InjectQueue('workflows') private readonly workflowQueue: Queue,
    private readonly stateStore: StateStore,
  ) {}

  async startResearch(topic: string, depth = 'standard'): Promise<string> {
    const state = await this.stateStore.create('research', { topic, depth });

    await this.workflowQueue.add(
      'research',
      {
        workflowId: state.workflowId,
        workflowType: 'research',
        inputData: { topic, depth },
      },
      {
        attempts: 3,
        backoff: { type: 'exponential', delay: 5000 },
        removeOnComplete: 100, // Keep the last 100 completed jobs
        removeOnFail: 50,
      },
    );

    return state.workflowId;
  }
}
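For intuition on what `backoff: { type: 'exponential', delay: 5000 }` produces, the sketch below assumes a simple doubling schedule — Bull's exact formula may differ slightly, so treat the numbers as approximate:

```typescript
// Approximate retry delay schedule for exponential backoff with a base delay,
// assuming the delay doubles on each successive attempt (illustrative helper)
function backoffDelays(attempts: number, baseMs: number): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

const delays = backoffDelays(3, 5000); // roughly: wait 5s, then 10s, then 20s
```

Three attempts with a 5-second base means a transient OpenAI outage gets about 35 seconds of total grace before the job is marked failed.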

User-Facing Messages

Register the exception filter globally in main.ts:

typescript
import { AgentExceptionFilter } from './agent/agent-exception.filter';

// In bootstrap():
app.useGlobalFilters(new AgentExceptionFilter());

For workflow-level errors visible via the status endpoint, normalize error messages before storing them:

typescript
// In WorkflowProcessor — AgentError and ERROR_MESSAGES come from the agent module
function normalizeError(err: unknown): string {
  if (err instanceof AgentError) {
    return ERROR_MESSAGES[err.errorType] ?? 'An error occurred.';
  }
  if (err instanceof Error && err.message.includes('rate limit')) {
    return ERROR_MESSAGES.rate_limit;
  }
  return ERROR_MESSAGES.fatal;
}

Deployment

Environment Configuration

typescript
// src/config/configuration.ts
export default () => ({
  openai: {
    apiKey: process.env.OPENAI_API_KEY,
    model: process.env.OPENAI_MODEL ?? 'gpt-4o',
  },
  redis: {
    host: process.env.REDIS_HOST ?? 'localhost',
    port: parseInt(process.env.REDIS_PORT ?? '6379', 10),
  },
  agent: {
    maxIterations: parseInt(process.env.MAX_AGENT_ITERATIONS ?? '10', 10),
    timeoutSeconds: parseInt(process.env.AGENT_TIMEOUT_SECONDS ?? '120', 10),
  },
  workflow: {
    ttlSeconds: parseInt(process.env.WORKFLOW_TTL_SECONDS ?? '86400', 10),
  },
});
bash
# .env.production
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
REDIS_HOST=your-redis-host
REDIS_PORT=6379
MAX_AGENT_ITERATIONS=10
AGENT_TIMEOUT_SECONDS=120
WORKFLOW_TTL_SECONDS=86400
PORT=3000

Dockerfile:

dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/main.js"]

CI/CD Pipeline

yaml
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports: ['6379:6379']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run test:e2e
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          REDIS_HOST: localhost

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
      - name: Push and deploy
        run: |
          echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
          # Trigger your deployment mechanism (Railway, Fly.io, ECS, etc.)
39

Monitoring Setup

NestJS works well with @willsoto/nestjs-prometheus for metrics exposure:

typescript
// Instrument AgentService — toolCallsTotal (Counter) and toolCallDuration
// (Histogram) are injected via the Prometheus module's providers
private async callWithMetrics(fn: () => Promise<unknown>, toolName: string) {
  const start = Date.now();
  try {
    const result = await fn();
    this.toolCallsTotal.inc({ tool: toolName, status: 'success' });
    return result;
  } catch (err) {
    this.toolCallsTotal.inc({ tool: toolName, status: 'error' });
    throw err;
  } finally {
    this.toolCallDuration.observe({ tool: toolName }, (Date.now() - start) / 1000);
  }
}

Key metrics to expose at /metrics:

  • agent_iterations_total — counter by agent type
  • llm_request_duration_seconds — histogram by model
  • workflow_status_total — counter by type and status
  • tool_call_errors_total — counter by tool and error type
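Registering those metrics with @willsoto/nestjs-prometheus could look roughly like the module below — a wiring sketch using the library's provider factories as I understand them, with metric names taken from the list above; verify the exact API against the library's docs:

```typescript
// metrics.module.ts (sketch — assumes @willsoto/nestjs-prometheus is installed)
import { Module } from '@nestjs/common';
import {
  PrometheusModule,
  makeCounterProvider,
  makeHistogramProvider,
} from '@willsoto/nestjs-prometheus';

@Module({
  imports: [PrometheusModule.register()], // exposes GET /metrics
  providers: [
    makeCounterProvider({
      name: 'workflow_status_total',
      help: 'Workflow terminal states',
      labelNames: ['type', 'status'],
    }),
    makeHistogramProvider({
      name: 'llm_request_duration_seconds',
      help: 'LLM request latency',
      labelNames: ['model'],
    }),
  ],
})
export class MetricsModule {}
```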

Next Steps

NestJS interceptors for request logging. Add a global LoggingInterceptor that logs every workflow request with correlation IDs. When a user reports an issue, you can trace the exact LLM calls that happened.

Configurable tool registry. Instead of hardcoding tools per processor, inject a ToolRegistry service that maps tool names to handlers. New tools register themselves via @Injectable() — no processor changes needed.

Workflow replay. Store the full message history alongside workflow state. If a workflow fails, replay from the last successful checkpoint rather than starting over — critical for 10+ step workflows.

Budget enforcement. Track OpenAI token usage per workflow using response.usage. Abort with AgentError('budget_exceeded', false) when a workflow exceeds its token budget.
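A per-workflow budget can be as simple as an accumulator checked after every completion — a sketch, with `TokenBudget` as a hypothetical helper fed from `response.usage.total_tokens`:

```typescript
// Accumulates token usage per workflow and throws once the budget is exhausted
class TokenBudget {
  private used = 0;
  constructor(private readonly limit: number) {}

  // Call after each completion with the response's total token count
  record(totalTokens: number): void {
    this.used += totalTokens;
    if (this.used > this.limit) {
      throw new Error(`budget_exceeded: ${this.used}/${this.limit} tokens`);
    }
  }

  get remaining(): number {
    return Math.max(0, this.limit - this.used);
  }
}

const budget = new TokenBudget(10_000);
budget.record(4_000); // remaining: 6000
```

In the agent loop, the thrown error would be wrapped in an AgentError and surfaced through the exception filter like any other failure.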

Dead letter queue. Configure Bull's defaultJobOptions.removeOnFail: false and a separate queue consumer for failed jobs. Alert engineering when the DLQ depth exceeds a threshold.


Conclusion

NestJS provides the structural guardrails that agentic AI systems need at scale: dependency injection for clean separation between LLM clients, tool implementations, and workflow orchestration; BullMQ integration for queue-backed async execution; and the module system for organizing agent logic into independently testable units. The decorator-driven architecture means concerns like logging, validation, and error handling wrap agent workflows without polluting business logic.

The implementation path covered here — typed DTOs for workflow state, a tool registry pattern for extensible tool management, Bull processors for long-running workflows, and event emitters for step-level observability — gives you a production-ready foundation without over-engineering. Use class-validator on all incoming requests, schema validation (such as Zod) on all LLM outputs, and Redis-backed state for crash recovery. Run your workflow processor as a separate NestJS application from your API server so agent load does not compete with request handling. The patterns here scale from a single workflow to a multi-agent platform because NestJS's module system was designed for exactly this kind of incremental complexity growth.


Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
