Prerequisites
Required Knowledge
This tutorial targets engineers who know TypeScript and have built REST APIs before. NestJS is a structured, opinionated framework — understanding its module/provider/controller triad is essential before layering agentic patterns on top. You should be comfortable with:
- TypeScript 5.x — decorators, generics, async/await, type narrowing
- NestJS fundamentals — modules, providers, dependency injection, guards, interceptors
- OpenAI tool calling — how the model requests function invocations and processes results
- Redis — basic key-value operations; we use it for workflow state persistence
- Bull/BullMQ — the queue library with official NestJS integration, used here for background job processing
Development Environment
Node.js 20 LTS or later. NestJS applications are CPU-bound during TypeScript compilation but I/O-bound at runtime — Node 20's improved async performance matters at scale.
Redis 7+ running locally:
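One simple way to run Redis 7 locally is via Docker (the container name and port mapping below are illustrative):

```shell
docker run -d --name agentic-redis -p 6379:6379 redis:7-alpine

# Verify it responds
docker exec agentic-redis redis-cli ping
```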
Install the NestJS CLI globally:
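```shell
npm install -g @nestjs/cli
nest --version
```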
Set required environment variables:
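`OPENAI_API_KEY` is required by the OpenAI SDK; the Redis variable names are an assumption about how this project's configuration is read:

```shell
export OPENAI_API_KEY="sk-..."   # your OpenAI key
export REDIS_HOST="localhost"    # assumption: host/port consumed by the app config
export REDIS_PORT="6379"
```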
Dependencies
The core packages we'll install:
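A hedged sketch of the install command — the BullMQ, OpenAI, class-validator, and Zod packages are named elsewhere in this tutorial; `ioredis` and `@nestjs/config` are assumptions about the Redis client and config layer:

```shell
npm install @nestjs/bullmq bullmq ioredis openai \
  class-validator class-transformer zod @nestjs/config
```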
Project Setup
Initialize Project
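Scaffold the project with the CLI (the project name is illustrative):

```shell
nest new agentic-api
cd agentic-api
```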
Choose npm when prompted. The CLI generates a standard NestJS structure. We'll extend it:
Configure Build Tools
NestJS uses SWC for fast compilation in development. Enable it in nest-cli.json:
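A minimal `nest-cli.json` with the SWC builder enabled; `typeCheck` keeps `tsc` type errors visible even though SWC itself skips type checking:

```json
{
  "collection": "@nestjs/schematics",
  "sourceRoot": "src",
  "compilerOptions": {
    "builder": "swc",
    "typeCheck": true
  }
}
```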
Strict TypeScript settings catch agentic bugs at compile time. Update tsconfig.json:
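A sketch of the relevant `compilerOptions` to merge into the generated `tsconfig.json` (the specific flags beyond `strict` are recommendations, not requirements):

```json
{
  "compilerOptions": {
    "strict": true,
    "strictNullChecks": true,
    "noImplicitAny": true,
    "noUncheckedIndexedAccess": true,
    "forceConsistentCasingInFileNames": true
  }
}
```

`noUncheckedIndexedAccess` is particularly useful for agentic code: tool-call arguments parsed from LLM output are indexed dynamically, and this flag forces you to handle the missing-key case.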
Add Dependencies
Configure the app module to wire everything together:
Core Implementation
Building the Foundation
The state store handles workflow persistence in Redis. Each workflow run is a JSON-serialized entity with a typed status enum:
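A sketch of the store, written against a minimal Redis-like interface so it can be faked in tests; the entity shape and `workflow:` key prefix are assumptions:

```typescript
// workflow-state.store.ts
export enum WorkflowStatus {
  Pending = 'pending',
  Running = 'running',
  Completed = 'completed',
  Failed = 'failed',
}

export interface WorkflowRun {
  id: string;
  status: WorkflowStatus;
  error?: string;
  updatedAt: string;
}

// The subset of the ioredis client we rely on
interface RedisLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
}

export class WorkflowStateStore {
  constructor(private readonly redis: RedisLike) {}

  private key(id: string): string {
    return `workflow:${id}`;
  }

  async save(run: WorkflowRun): Promise<void> {
    await this.redis.set(this.key(run.id), JSON.stringify(run));
  }

  async find(id: string): Promise<WorkflowRun | null> {
    const raw = await this.redis.get(this.key(id));
    return raw ? (JSON.parse(raw) as WorkflowRun) : null;
  }

  async setStatus(id: string, status: WorkflowStatus, error?: string): Promise<void> {
    const run = await this.find(id);
    if (!run) throw new Error(`workflow ${id} not found`);
    await this.save({ ...run, status, error, updatedAt: new Date().toISOString() });
  }
}
```

In the NestJS app this class would be an `@Injectable()` provider receiving the real ioredis client; keeping the Redis surface to `get`/`set` makes crash-recovery logic trivially unit-testable with an in-memory Map.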
Adding Business Logic
The AgentService implements the tool-calling loop. It's framework-agnostic — it takes an OpenAI client, a system prompt, and a tool registry, and iterates until the model finishes or the iteration limit is hit:
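A sketch of the loop. The `ChatClient` interface below is a simplified stand-in for the OpenAI chat-completions API (not its real signature), which keeps the service testable with a fake client:

```typescript
// agent.service.ts — the client interface is an assumption, not the real OpenAI SDK shape
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded, as the model returns it
}

interface ModelTurn {
  content: string | null;
  toolCalls: ToolCall[];
}

interface ChatClient {
  complete(messages: unknown[]): Promise<ModelTurn>;
}

export class AgentService {
  constructor(
    private readonly client: ChatClient,
    private readonly systemPrompt: string,
    private readonly tools: Map<string, ToolHandler>,
    private readonly maxIterations = 10,
  ) {}

  async run(userInput: string): Promise<string> {
    const messages: unknown[] = [
      { role: 'system', content: this.systemPrompt },
      { role: 'user', content: userInput },
    ];

    for (let i = 0; i < this.maxIterations; i++) {
      const turn = await this.client.complete(messages);

      // No tool calls means the model has produced its final answer
      if (turn.toolCalls.length === 0) return turn.content ?? '';

      messages.push({ role: 'assistant', content: turn.content, tool_calls: turn.toolCalls });
      for (const call of turn.toolCalls) {
        const handler = this.tools.get(call.name);
        const result = handler
          ? await handler(JSON.parse(call.arguments))
          : `unknown tool: ${call.name}`;
        // Each tool result is echoed back under the originating call id
        messages.push({ role: 'tool', tool_call_id: call.id, content: result });
      }
    }
    throw new Error(`agent exceeded ${this.maxIterations} iterations`);
  }
}
```

The iteration cap is the critical safety property: a model that keeps requesting tools can otherwise loop indefinitely and burn tokens.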
Connecting Services
The WorkflowProcessor is a BullMQ job processor — it runs in the background and calls the agent service. This decouples HTTP request handling from the LLM work, which can take 30-120 seconds:
Adding Features
Feature 1: Core Capability — HTTP API
The controller provides the public API surface. Workflow creation is non-blocking — it enqueues a job and returns the ID immediately:
Feature 2: Extensions — Multi-Agent Orchestration
For complex tasks, decompose work across specialized agents running in parallel using Promise.allSettled:
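A framework-agnostic sketch of the fan-out; the agent names and result shape are assumptions:

```typescript
// orchestrator.ts
interface SubAgent {
  name: string;
  run(task: string): Promise<string>;
}

export interface SubTaskResult {
  agent: string;
  ok: boolean;
  output: string;
}

// Fan a task out to specialized agents in parallel. Promise.allSettled
// means one agent failing does not abort the others.
export async function orchestrate(agents: SubAgent[], task: string): Promise<SubTaskResult[]> {
  const settled = await Promise.allSettled(agents.map((a) => a.run(task)));
  return settled.map((res, i) => ({
    agent: agents[i].name,
    ok: res.status === 'fulfilled',
    output: res.status === 'fulfilled' ? res.value : String(res.reason),
  }));
}
```

A supervising agent (or plain code) can then merge the fulfilled results and decide whether the rejected sub-tasks are worth retrying.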
Feature 3: Polish — Validation and Guards
Use class-validator DTOs to fail fast on bad input before any LLM calls are made:
Error Handling
Error Classification
NestJS exception filters let you handle agent-specific errors uniformly across all controllers:
Recovery Strategies
BullMQ handles job retries with exponential backoff — configure it per job type:
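A sketch of the retry options passed at enqueue time; the attempt count and delay are illustrative:

```typescript
import { Queue } from 'bullmq';

const queue = new Queue('workflows', {
  connection: { host: 'localhost', port: 6379 },
});

// attempts counts the first run; exponential backoff doubles the delay
// between tries: 5s, then 10s, then 20s
await queue.add(
  'run',
  { workflowId: 'abc', input: 'summarize the report' },
  { attempts: 3, backoff: { type: 'exponential', delay: 5_000 } },
);
```

Retry only errors that are actually transient (rate limits, timeouts); a malformed-input failure will fail identically on every attempt.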
User-Facing Messages
Register the exception filter globally in main.ts:
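A sketch of the bootstrap, assuming an `AgentErrorFilter` class (the name is illustrative) and a global `ValidationPipe`:

```typescript
// main.ts
import { NestFactory } from '@nestjs/core';
import { ValidationPipe } from '@nestjs/common';
import { AppModule } from './app.module';
import { AgentErrorFilter } from './agent-error.filter';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.useGlobalPipes(new ValidationPipe({ whitelist: true, transform: true }));
  app.useGlobalFilters(new AgentErrorFilter()); // agent errors handled uniformly
  await app.listen(3000);
}
bootstrap();
```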
For workflow-level errors visible via the status endpoint, normalize error messages before storing them:
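A sketch of the normalizer; the pattern-to-message mapping is illustrative:

```typescript
// Map raw errors to short, user-safe messages before persisting them
// alongside the workflow state
export function normalizeWorkflowError(err: unknown): string {
  const message = err instanceof Error ? err.message : String(err);
  if (/rate.?limit/i.test(message)) {
    return 'The AI provider is rate-limiting requests. Please retry shortly.';
  }
  if (/timeout/i.test(message)) {
    return 'The workflow timed out and has been queued for retry.';
  }
  if (/budget/i.test(message)) {
    return 'The workflow exceeded its token budget.';
  }
  // Never leak stack traces or provider internals to end users
  return 'The workflow failed due to an internal error.';
}
```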
Deployment
Environment Configuration
Dockerfile:
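A sketch of a multi-stage build (image tags and output paths assume the default NestJS build layout):

```dockerfile
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
CMD ["node", "dist/main.js"]
```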
CI/CD Pipeline
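A minimal GitHub Actions sketch; the script names assume the defaults the NestJS CLI generates, and the Redis service container lets integration tests hit a real instance:

```yaml
# .github/workflows/ci.yml — illustrative pipeline
name: ci
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports: ['6379:6379']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: npm run build
```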
Monitoring Setup
NestJS works well with @willsoto/nestjs-prometheus for metrics exposure:
Key metrics to expose at /metrics:
- agent_iterations_total — counter by agent type
- llm_request_duration_seconds — histogram by model
- workflow_status_total — counter by workflow type and status
- tool_call_errors_total — counter by tool and error type
Next Steps
Recommended Extensions
NestJS interceptors for request logging. Add a global LoggingInterceptor that logs every workflow request with correlation IDs. When a user reports an issue, you can trace the exact LLM calls that happened.
Configurable tool registry. Instead of hardcoding tools per processor, inject a ToolRegistry service that maps tool names to handlers. New tools register themselves via @Injectable() — no processor changes needed.
Workflow replay. Store the full message history alongside workflow state. If a workflow fails, replay from the last successful checkpoint rather than starting over — critical for 10+ step workflows.
Budget enforcement. Track OpenAI token usage per workflow using response.usage. Abort with AgentError('budget_exceeded', false) when a workflow exceeds its token budget.
Dead letter queue. Configure BullMQ's defaultJobOptions.removeOnFail: false and a separate queue consumer for failed jobs. Alert engineering when the DLQ depth exceeds a threshold.
Further Reading
- NestJS Queues documentation — official BullMQ integration guide
- BullMQ documentation — job queues, flow producers, rate limiters
- OpenAI Assistants API — managed thread and tool state; complements rather than replaces custom orchestration
- LangGraph.js — graph-based agent orchestration with typed state channels
Conclusion
NestJS provides the structural guardrails that agentic AI systems need at scale: dependency injection for clean separation between LLM clients, tool implementations, and workflow orchestration; BullMQ integration for queue-backed async execution; and the module system for organizing agent logic into independently testable units. The decorator-driven architecture means concerns like logging, validation, and error handling wrap agent workflows without polluting business logic.
The implementation path covered here — typed DTOs for workflow state, a tool registry pattern for extensible tool management, BullMQ processors for long-running workflows, and event emitters for step-level observability — gives you a production-ready foundation without over-engineering. Use class-validator on all incoming requests, Zod on all LLM outputs, and Redis-backed state for crash recovery. Run your workflow processor as a separate NestJS application from your API server so agent load does not compete with request handling. The patterns here scale from a single workflow to a multi-agent platform because NestJS's module system was designed for exactly this kind of incremental complexity growth.