
Complete Guide to Agentic AI Workflows with TypeScript

A comprehensive guide to implementing agentic AI workflows in TypeScript, covering architecture, code examples, and production-ready patterns.

Muneer Puthiya Purayil · 12 min read

Introduction

Why This Matters

TypeScript has quietly become the most practical language for production agentic AI systems in web-native and full-stack teams. The LangGraph.js port is now feature-complete with the Python version. The Anthropic and OpenAI SDKs are first-class TypeScript packages with full type definitions. Vercel's AI SDK has matured into a production-grade abstraction for streaming, tool calling, and multi-step agent loops. And the type system — the actual TypeScript type system, not the loose dynamic typing of plain JavaScript — gives you compile-time guarantees about workflow state that Python only provides at runtime with Pydantic.

For teams already building backend services in Node.js, NestJS, or Next.js, adding agentic AI in TypeScript is a smaller architectural leap than switching languages. The deployment infrastructure, the CI/CD pipeline, the monitoring stack — all of it carries over. You add LLM logic; you do not add a new language runtime.

That said, TypeScript's agentic AI ecosystem is still younger than Python's, and the ecosystem fragmentation is higher. This guide cuts through the noise: here is what works in production, here is the code, and here is how to harden it.

Who This Is For

This guide targets TypeScript engineers — backend (Node.js, NestJS, Express), full-stack (Next.js, Remix), or platform — who are adding agentic AI features to existing applications or building new AI-native products. Experience with async/await and generics is assumed. Familiarity with Zod for runtime type validation will help.

If you are a Python-first engineer evaluating whether TypeScript is viable for agentic AI, this guide will give you an honest answer with concrete code to evaluate.

What You Will Learn

  • Why TypeScript's type system is a genuine advantage for agentic workflow state management
  • A complete project structure for a production agentic workflow in TypeScript
  • Typed tool definitions, structured output validation with Zod, and retry logic
  • LangGraph.js for stateful multi-step workflows
  • Vercel AI SDK patterns for streaming agent responses in Next.js
  • Testing strategy: mocking LLM calls, testing tool implementations, integration test patterns

Core Concepts

Key Terminology

Agent: An LLM-powered system that autonomously decides which actions to take. The LLM receives a goal and context; it chooses to either respond directly or invoke a tool.

Tool: A typed function the LLM can call. In TypeScript, tools are defined with a name, description, and Zod schema for parameters. The LLM sees the schema as its calling interface.

Structured output: LLM response constrained to a TypeScript type or Zod schema. Essential for consuming LLM output in application code without string parsing.

Workflow state: A typed object that accumulates information across workflow steps. In LangGraph.js, this is your Annotation type; in custom implementations, it is a plain TypeScript interface.

Trace: A record of every LLM call and tool invocation in a single workflow run. Essential for debugging non-deterministic behavior.

Tool call: The JSON format LLMs use to request a tool invocation. The LLM generates { name: "search_kb", args: { query: "..." } }; your runtime executes the function and returns the result.

Mental Models

Type safety does not eliminate non-determinism. TypeScript gives you compile-time guarantees about the shape of your workflow state. It cannot guarantee what the LLM will put in that shape. A field typed as string will always be a string — but it might be the wrong string. Validation (Zod .parse()) at the LLM output boundary handles this.
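A minimal sketch of what that boundary check does. It is hand-rolled here to stay dependency-free; in practice this is a Zod schema's safeParse, and the Classification shape is purely illustrative:

```typescript
// Hand-rolled stand-in for a Zod schema's safeParse at the LLM boundary.
// The raw value is typed `unknown` — the compiler cannot know what the
// model actually produced, so the check must happen at runtime.
interface Classification {
  label: 'bug' | 'feature' | 'question';
  confidence: number;
}

function parseClassification(raw: unknown): Classification | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const obj = raw as Record<string, unknown>;
  const validLabel =
    obj.label === 'bug' || obj.label === 'feature' || obj.label === 'question';
  const validConfidence =
    typeof obj.confidence === 'number' &&
    obj.confidence >= 0 &&
    obj.confidence <= 1;
  return validLabel && validConfidence
    ? (obj as unknown as Classification)
    : null;
}

// Well-shaped output passes; a wrong-but-still-a-string label is rejected.
const good = parseClassification(JSON.parse('{"label":"bug","confidence":0.9}'));
const bad = parseClassification(JSON.parse('{"label":"urgent","confidence":0.9}'));
```

The shape is what TypeScript can promise; the content is what the runtime check has to verify.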

Tools are contracts, not implementations. When you define a tool for the LLM, you are writing a contract: here is what you can call and what parameters it expects. The LLM calls against this contract; your implementation must honor it. Keep the contract (tool definition) stable; change the implementation freely.

Async is not optional. Every LLM call is an async operation. Every tool that touches an external service is async. TypeScript's async/await and Promise types are not stylistic choices for agentic code — they are requirements. A synchronous workflow step that calls an LLM will block the Node.js event loop.

Foundational Principles

  1. Use Zod at every LLM output boundary. TypeScript types are erased at runtime. The only guarantee about an LLM's JSON output is that it is... JSON. Zod .parse() validates and narrows the type at runtime, giving you the type guarantee TypeScript cannot make across the LLM boundary.

  2. Immutable state updates. Pass state into steps; return new state. Never mutate state objects in place. This makes workflow debugging tractable — each step's input and output is a discrete snapshot.

  3. Typed errors, not thrown strings. Define an AgentError discriminated union type. Return errors as values where possible; only throw for genuinely exceptional conditions (programmer error, system corruption).

  4. Separate tool definitions from tool implementations. The tool definition (name, description, Zod schema) goes next to the LLM call. The implementation (the actual business logic) goes in a testable function. They meet at one line of glue code.
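Principle 3 might look like the following sketch. The variant names and fields are illustrative, not a fixed API:

```typescript
// Illustrative AgentError discriminated union. The `kind` discriminant
// lets the compiler force exhaustive handling at every call site.
type AgentError =
  | { kind: 'llm_timeout'; elapsedMs: number }
  | { kind: 'tool_failed'; toolName: string; message: string }
  | { kind: 'invalid_output'; raw: string }
  | { kind: 'budget_exceeded'; tokensUsed: number; budget: number };

// Errors travel as values, not thrown strings.
type StepResult<T> = { ok: true; value: T } | { ok: false; error: AgentError };

function describeError(error: AgentError): string {
  switch (error.kind) {
    case 'llm_timeout':
      return `LLM call timed out after ${error.elapsedMs}ms`;
    case 'tool_failed':
      return `Tool ${error.toolName} failed: ${error.message}`;
    case 'invalid_output':
      return `Output failed validation: ${error.raw.slice(0, 80)}`;
    case 'budget_exceeded':
      return `Token budget exceeded: ${error.tokensUsed}/${error.budget}`;
  }
}

const result: StepResult<string> = {
  ok: false,
  error: { kind: 'tool_failed', toolName: 'search_kb', message: 'HTTP 503' },
};
```

Adding a new variant to AgentError makes every non-exhaustive switch a compile error, which is exactly the property you want in a workflow with many failure modes.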


Architecture Overview

High-Level Design

┌─────────────────────────────────────────────┐
│ API Layer (NestJS / Next.js)                │
│ POST /api/workflow → returns run_id         │
│ GET /api/workflow/:id → returns status      │
└─────────────────────┬───────────────────────┘
                      │
┌─────────────────────▼───────────────────────┐
│ Workflow Runner (BullMQ)                    │
│ Picks up jobs, enforces timeout/budget      │
│ Publishes result to Redis on completion     │
└─────────────────────┬───────────────────────┘
                      │
┌─────────────────────▼───────────────────────┐
│ Agent + Tools (LangGraph.js)                │
│ StateGraph with typed Annotation            │
│ Tool definitions (Zod schemas)              │
│ LLM client (Anthropic/OpenAI SDK)           │
└─────────────────────┬───────────────────────┘
                      │
┌─────────────────────▼───────────────────────┐
│ State & Observability                       │
│ Typed workflow state, structured logs       │
│ Langfuse for traces, token tracking         │
└─────────────────────────────────────────────┘

Component Breakdown

WorkflowState interface: Immutable typed state object. All workflow data — inputs, tool results, intermediate reasoning, final output — lives here. Passed through LangGraph.js nodes.

Tool registry: A Map<string, ToolDefinition & { execute: (...args) => Promise<string> }>. Separates the LLM-facing definition (Zod schema, description) from the implementation.

LLM client wrapper: Thin class wrapping the provider SDK. Adds retry with exponential backoff, token spend tracking, and trace correlation. Never called directly by business logic.

LangGraph.js StateGraph: The control flow. Nodes are TypeScript functions that transform WorkflowState. Edges are routing functions. The graph is compiled once at startup.

BullMQ worker: For workflows exceeding 10 seconds, a BullMQ worker picks up the job, runs the graph, and writes the result. The API route returns a job ID immediately.

Data Flow

typescript
// TypeScript type-level view of data flow
type WorkflowInput = { userId: string; userInput: string };
type WorkflowResult =
  | { runId: string; output: WorkflowOutput }
  | { runId: string; error: AgentError };

// 1. API receives input, creates run record, enqueues job
async function submitWorkflow(input: WorkflowInput): Promise<string>;

// 2. Worker picks up job, initializes typed state
const initialState: WorkflowState = { runId, ...input, messages: [], tokenSpend: 0 };

// 3. LangGraph executes nodes, each returning new state
type AgentNode = (state: WorkflowState) => Promise<Partial<WorkflowState>>;

// 4. Zod validates final output before state update
const parsed = WorkflowOutputSchema.safeParse(rawOutput);

// 5. Result written to DB; client polls or receives webhook

Implementation Steps

Step 1: Project Setup

bash
mkdir agent-ts && cd agent-ts
npm init -y
npm install \
  @langchain/langgraph \
  @langchain/anthropic \
  @langchain/core \
  @anthropic-ai/sdk \
  openai \
  zod \
  bullmq \
  ioredis \
  langfuse \
  pino \
  dotenv

npm install -D typescript @types/node tsx
npx tsc --init
agent-ts/
├── src/
│   ├── agent/
│   │   ├── state.ts          # WorkflowState type + Annotation
│   │   ├── tools.ts          # Tool definitions + implementations
│   │   ├── llm.ts            # LLM client wrapper
│   │   ├── graph.ts          # LangGraph StateGraph
│   │   └── prompts/
│   │       └── system.md
│   ├── worker/
│   │   └── workflow.worker.ts
│   ├── api/
│   │   └── routes.ts
│   └── index.ts
├── tsconfig.json
└── .env
typescript
// src/agent/state.ts
import { Annotation } from '@langchain/langgraph';
import type { BaseMessage } from '@langchain/core/messages';

export interface WorkflowOutput {
  answer: string;
  confidence: number;
  sources: string[];
}

export const WorkflowAnnotation = Annotation.Root({
  runId: Annotation<string>(),
  userId: Annotation<string>(),
  userInput: Annotation<string>(),
  messages: Annotation<BaseMessage[]>({
    reducer: (existing, update) => existing.concat(update),
    default: () => [],
  }),
  toolResults: Annotation<Record<string, string>>({
    reducer: (existing, update) => ({ ...existing, ...update }),
    default: () => ({}),
  }),
  finalOutput: Annotation<WorkflowOutput | null>({ default: () => null }),
  tokenSpend: Annotation<number>({
    reducer: (existing, update) => existing + update,
    default: () => 0,
  }),
  error: Annotation<string | null>({ default: () => null }),
});

export type WorkflowState = typeof WorkflowAnnotation.State;

Step 2: Core Logic

typescript
// src/agent/tools.ts
import { z } from 'zod';
import { tool } from '@langchain/core/tools';

// Tool definition + implementation in one place
export const searchKnowledgeBase = tool(
  async ({ query }: { query: string }): Promise<string> => {
    const response = await fetch(`${process.env.KB_URL}/search`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, limit: 3 }),
      signal: AbortSignal.timeout(10_000),
    });

    if (!response.ok) {
      return `Search failed: HTTP ${response.status}`;
    }

    const results = await response.json() as Array<{ text: string }>;
    if (results.length === 0) return 'No results found.';
    return results.map(r => r.text).join('\n\n---\n\n');
  },
  {
    name: 'search_knowledge_base',
    description: 'Search the internal knowledge base for relevant information.',
    schema: z.object({
      query: z.string().describe('Natural language search query'),
    }),
  }
);

export const createSupportTicket = tool(
  async ({ title, description, priority }: {
    title: string;
    description: string;
    priority: 'low' | 'medium' | 'high' | 'critical';
  }): Promise<string> => {
    const response = await fetch(`${process.env.TICKETING_URL}/tickets`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ title, description, priority }),
      signal: AbortSignal.timeout(10_000),
    });

    if (!response.ok) {
      return `Ticket creation failed: HTTP ${response.status}`;
    }

    const ticket = await response.json() as { id: string };
    return `Ticket created: ${ticket.id}`;
  },
  {
    name: 'create_support_ticket',
    description: 'Create a support ticket for user-reported issues.',
    schema: z.object({
      title: z.string().max(100).describe('Brief ticket title'),
      description: z.string().describe('Detailed description'),
      priority: z.enum(['low', 'medium', 'high', 'critical']),
    }),
  }
);

export const TOOLS = [searchKnowledgeBase, createSupportTicket];
typescript
// src/agent/llm.ts
import { ChatAnthropic } from '@langchain/anthropic';
import { TOOLS } from './tools';

export function createLLM() {
  const llm = new ChatAnthropic({
    model: process.env.LLM_MODEL ?? 'claude-3-5-sonnet-20241022',
    temperature: 0,
    maxTokens: 4096,
    // Timeout in ms — never rely on default
    timeout: Number(process.env.LLM_TIMEOUT_MS ?? 30_000),
  });

  return llm.bindTools(TOOLS);
}

Step 3: Integration

typescript
// src/agent/graph.ts
import { StateGraph, END } from '@langchain/langgraph';
import { ToolNode } from '@langchain/langgraph/prebuilt';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
import { WorkflowAnnotation, WorkflowState } from './state';
import { createLLM } from './llm';
import { TOOLS } from './tools';
import { readFileSync } from 'fs';
import { join } from 'path';
import { z } from 'zod';

const SYSTEM_PROMPT = readFileSync(
  join(__dirname, 'prompts/system.md'),
  'utf-8'
);

const WorkflowOutputSchema = z.object({
  answer: z.string(),
  confidence: z.number().min(0).max(1),
  sources: z.array(z.string()),
});

const llm = createLLM();

async function agentNode(state: WorkflowState): Promise<Partial<WorkflowState>> {
  // Seed the history with the user's input on the first turn only;
  // later turns already carry it in the accumulated messages.
  const newMessages = state.messages.length === 0
    ? [new HumanMessage(state.userInput)]
    : [];

  const messages = [
    new SystemMessage(SYSTEM_PROMPT),
    ...state.messages,
    ...newMessages,
  ];

  const response = await llm.invoke(messages);
  const tokenSpend = response.usage_metadata?.total_tokens ?? 0;

  return {
    messages: [...newMessages, response],
    tokenSpend,
  };
}

async function outputNode(state: WorkflowState): Promise<Partial<WorkflowState>> {
  // Parse the last assistant message as structured output
  const lastMessage = state.messages[state.messages.length - 1];
  const content = typeof lastMessage.content === 'string'
    ? lastMessage.content
    : JSON.stringify(lastMessage.content);

  try {
    const parsed = WorkflowOutputSchema.parse(JSON.parse(content));
    return { finalOutput: parsed };
  } catch {
    // Fallback: wrap free-text response in the expected structure
    return {
      finalOutput: {
        answer: content,
        confidence: 0.7,
        sources: [],
      },
    };
  }
}

function shouldContinue(state: WorkflowState): 'tools' | 'output' {
  const lastMessage = state.messages[state.messages.length - 1];
  const hasToolCalls = 'tool_calls' in lastMessage &&
    Array.isArray((lastMessage as any).tool_calls) &&
    (lastMessage as any).tool_calls.length > 0;
  return hasToolCalls ? 'tools' : 'output';
}

const toolNode = new ToolNode(TOOLS);

const graph = new StateGraph(WorkflowAnnotation)
  .addNode('agent', agentNode)
  .addNode('tools', toolNode)
  .addNode('output', outputNode)
  .addEdge('__start__', 'agent')
  .addConditionalEdges('agent', shouldContinue)
  .addEdge('tools', 'agent')
  .addEdge('output', END);

export const workflow = graph.compile();


Code Examples

Basic Implementation

A minimal typed agent without a framework — direct SDK usage with tool calling:

typescript
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';

const client = new Anthropic();

const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  keyPhrases: z.array(z.string()),
});

type SentimentResult = z.infer<typeof SentimentSchema>;

async function analyzeSentiment(text: string): Promise<SentimentResult> {
  const response = await client.messages.create({
    model: 'claude-3-5-haiku-20241022',
    max_tokens: 512,
    system: `Analyze the sentiment of the provided text.
Respond with JSON matching exactly:
{"sentiment":"positive|negative|neutral","confidence":0.0-1.0,"keyPhrases":["..."]}
Respond ONLY with the JSON object.`,
    messages: [{ role: 'user', content: text }],
  });

  const raw = response.content[0];
  if (raw.type !== 'text') throw new Error('Unexpected response type');

  return SentimentSchema.parse(JSON.parse(raw.text));
}

Advanced Patterns

Multi-step agent with typed state transitions:

typescript
import { StateGraph, Annotation, END } from '@langchain/langgraph';
import { SystemMessage, HumanMessage } from '@langchain/core/messages';

// Assumes `llm` (e.g. from createLLM) and a `performSearch` helper
// are defined elsewhere in the module.

const ResearchAnnotation = Annotation.Root({
  query: Annotation<string>(),
  searchResults: Annotation<string[]>({
    reducer: (a, b) => [...a, ...b],
    default: () => [],
  }),
  synthesis: Annotation<string | null>({ default: () => null }),
  currentStep: Annotation<'search' | 'synthesize' | 'done'>({
    default: () => 'search',
  }),
});

type ResearchState = typeof ResearchAnnotation.State;

// Each node is a pure async function on typed state
async function searchNode(state: ResearchState): Promise<Partial<ResearchState>> {
  const results = await performSearch(state.query);
  return {
    searchResults: results,
    currentStep: 'synthesize',
  };
}

async function synthesizeNode(state: ResearchState): Promise<Partial<ResearchState>> {
  const response = await llm.invoke([
    new SystemMessage('Synthesize the search results into a clear answer.'),
    new HumanMessage(
      `Query: ${state.query}\n\nResults:\n${state.searchResults.join('\n\n')}`
    ),
  ]);
  return {
    synthesis: response.content as string,
    currentStep: 'done',
  };
}

function routeStep(state: ResearchState): string {
  return state.currentStep === 'done' ? END : state.currentStep;
}

const researchGraph = new StateGraph(ResearchAnnotation)
  .addNode('search', searchNode)
  .addNode('synthesize', synthesizeNode)
  .addEdge('__start__', 'search')
  .addConditionalEdges('search', routeStep)
  .addConditionalEdges('synthesize', routeStep)
  .compile();

Parallel tool execution with typed results:

typescript
import pLimit from 'p-limit';
import { TOOLS } from './tools';

interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ToolResult {
  toolCallId: string;
  content: string;
}

const TOOL_MAP = new Map(TOOLS.map(t => [t.name, t]));
const limit = pLimit(5); // max 5 concurrent tool calls

async function executeToolCallsParallel(toolCalls: ToolCall[]): Promise<ToolResult[]> {
  return Promise.all(
    toolCalls.map(call =>
      limit(async (): Promise<ToolResult> => {
        const tool = TOOL_MAP.get(call.name);
        if (!tool) {
          return { toolCallId: call.id, content: `Unknown tool: ${call.name}` };
        }

        try {
          const result = await (tool as any).invoke(call.args);
          return { toolCallId: call.id, content: String(result) };
        } catch (err) {
          return {
            toolCallId: call.id,
            content: `Tool error: ${err instanceof Error ? err.message : 'unknown'}`,
          };
        }
      })
    )
  );
}

Production Hardening

typescript
// src/worker/workflow.worker.ts
import { Worker, Job } from 'bullmq';
import { workflow } from '../agent/graph';
import { WorkflowState } from '../agent/state';
import { pino } from 'pino';
import { randomUUID } from 'crypto';

const logger = pino({ level: process.env.LOG_LEVEL ?? 'info' });
const MAX_TOKEN_BUDGET = Number(process.env.MAX_TOKEN_BUDGET ?? 50_000);
const WORKFLOW_TIMEOUT_MS = Number(process.env.WORKFLOW_TIMEOUT_MS ?? 120_000);

interface WorkflowJobData {
  userId: string;
  userInput: string;
}

async function processWorkflow(job: Job<WorkflowJobData>): Promise<unknown> {
  const runId = randomUUID();
  const startMs = Date.now();

  logger.info({ runId, jobId: job.id, userId: job.data.userId }, 'workflow_start');

  const initialState: Partial<WorkflowState> = {
    runId,
    userId: job.data.userId,
    userInput: job.data.userInput,
  };

  let result: WorkflowState;

  try {
    result = await Promise.race([
      workflow.invoke(initialState),
      new Promise<never>((_, reject) =>
        setTimeout(() => reject(new Error('WORKFLOW_TIMEOUT')), WORKFLOW_TIMEOUT_MS)
      ),
    ]) as WorkflowState;
  } catch (err) {
    const errorMessage = err instanceof Error ? err.message : 'unknown_error';
    logger.error({ runId, error: errorMessage }, 'workflow_failed');
    throw err; // BullMQ will retry based on job options
  }

  const durationMs = Date.now() - startMs;

  if (result.tokenSpend > MAX_TOKEN_BUDGET) {
    logger.warn(
      { runId, tokenSpend: result.tokenSpend, budget: MAX_TOKEN_BUDGET },
      'token_budget_exceeded'
    );
  }

  logger.info(
    { runId, durationMs, tokenSpend: result.tokenSpend, hasError: !!result.error },
    'workflow_complete'
  );

  return {
    runId,
    output: result.finalOutput,
    error: result.error,
    meta: { durationMs, tokenSpend: result.tokenSpend },
  };
}

const worker = new Worker<WorkflowJobData>(
  'agent-workflows',
  processWorkflow,
  {
    connection: { host: process.env.REDIS_HOST, port: Number(process.env.REDIS_PORT) },
    concurrency: Number(process.env.WORKER_CONCURRENCY ?? 5),
  }
);

worker.on('failed', (job, err) => {
  logger.error({ jobId: job?.id, error: err.message }, 'job_failed');
});

Performance Considerations

Latency Optimization

Streaming for user-facing responses. When displaying agent output in real time, use the Vercel AI SDK's streamText or LangChain.js's streamEvents. Perceived latency drops significantly when users see tokens appear instead of waiting for the complete response.

typescript
// Next.js App Router route with streaming (Vercel AI SDK)
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';
// SYSTEM_PROMPT and searchKnowledgeBase come from the agent module

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: anthropic('claude-3-5-sonnet-20241022'),
    system: SYSTEM_PROMPT,
    messages,
    tools: {
      search: {
        description: 'Search the knowledge base',
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => searchKnowledgeBase.invoke({ query }),
      },
    },
    maxSteps: 10, // max tool call iterations
  });

  return result.toDataStreamResponse();
}

Parallel tool execution. The ToolNode in LangGraph.js executes tool calls sequentially by default. Replace it with a custom parallel implementation using Promise.all when tool calls in a single turn are independent.

Model routing per step. Route simpler intermediate steps (classification, extraction from short text) to claude-3-5-haiku-20241022 or gpt-4o-mini. Reserve Sonnet/GPT-4o for steps requiring deep reasoning. This reduces p50 latency by 40–60% and cost by a similar margin for mixed workloads.
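One way to sketch that routing. The step names and the mapping are illustrative assumptions, not a prescribed scheme:

```typescript
// Illustrative per-step model router. Step names and the haiku/sonnet
// split are assumptions for the sketch; adapt to your workflow's steps.
type StepKind = 'classify' | 'extract' | 'reason' | 'synthesize';

const MODEL_BY_STEP: Record<StepKind, string> = {
  classify: 'claude-3-5-haiku-20241022',    // cheap, fast
  extract: 'claude-3-5-haiku-20241022',     // cheap, fast
  reason: 'claude-3-5-sonnet-20241022',     // deep reasoning
  synthesize: 'claude-3-5-sonnet-20241022', // deep reasoning
};

function modelForStep(step: StepKind): string {
  return MODEL_BY_STEP[step];
}
```

Keeping the mapping in one table makes the cost/latency tradeoff visible and easy to tune per deployment.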

Memory Management

Node.js single-process LLM workers can accumulate memory through context window growth. Key mitigations:

Truncate tool results. Cap the string length of tool results before adding to the message history. A tool that returns a 500KB JSON blob will bloat every subsequent LLM call's context.

typescript
const MAX_TOOL_RESULT_LENGTH = 4000;

function truncateToolResult(result: string): string {
  if (result.length <= MAX_TOOL_RESULT_LENGTH) return result;
  return (
    result.slice(0, MAX_TOOL_RESULT_LENGTH) +
    `\n\n[... truncated ${result.length - MAX_TOOL_RESULT_LENGTH} chars]`
  );
}

Use separate BullMQ workers per concurrency group. Long workflows on the same worker process as short workflows create head-of-line blocking. Separate queues for fast (< 10s) and standard (> 10s) workflows allow independent scaling.
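A sketch of the routing decision for that split. The queue names and the 10-second cutoff are illustrative; the actual enqueue still goes through BullMQ:

```typescript
// Illustrative queue selection for a fast/standard split. The names
// 'agent-fast' / 'agent-standard' and the 10s cutoff are assumptions.
const FAST_CUTOFF_MS = 10_000;

type QueueName = 'agent-fast' | 'agent-standard';

function queueFor(estimatedDurationMs: number): QueueName {
  return estimatedDurationMs < FAST_CUTOFF_MS ? 'agent-fast' : 'agent-standard';
}

// Each queue gets its own Worker process, so a 2-minute research workflow
// never blocks a 3-second classification behind it.
```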

Set Node.js --max-old-space-size explicitly. Default heap size in Node.js is environment-dependent and often lower than needed for concurrent agentic workflows with large contexts. Set --max-old-space-size=4096 (4GB) explicitly for worker processes.

Load Testing

typescript
// k6 load test — agent workflow endpoint
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    steady_load: {
      executor: 'ramping-vus',
      startVUs: 1,
      stages: [
        { duration: '2m', target: 10 },
        { duration: '5m', target: 20 },
        { duration: '2m', target: 0 },
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<5000'], // 95% under 5s (submission, not completion)
    http_req_failed: ['rate<0.01'],    // <1% HTTP errors
  },
};

const TEST_INPUTS = [
  'What is the return policy?',
  'Create a ticket for payment processing failure',
  'How do I upgrade my subscription?',
];

export default function () {
  const payload = JSON.stringify({
    userInput: TEST_INPUTS[Math.floor(Math.random() * TEST_INPUTS.length)],
  });

  const response = http.post(
    `${__ENV.BASE_URL}/api/workflow`,
    payload,
    { headers: { 'Content-Type': 'application/json' } }
  );

  check(response, {
    'status 202': (r) => r.status === 202,
    'has run_id': (r) => JSON.parse(r.body as string).runId !== undefined,
  });

  sleep(1);
}

Testing Strategy

Unit Tests

Test tool implementations independently — these should be fast, deterministic, and have no LLM dependency:

typescript
// src/agent/tools.test.ts (Vitest)
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { searchKnowledgeBase } from './tools';

describe('searchKnowledgeBase', () => {
  beforeEach(() => {
    vi.stubGlobal('fetch', vi.fn());
  });

  it('returns formatted results', async () => {
    vi.mocked(fetch).mockResolvedValueOnce({
      ok: true,
      json: async () => [{ text: 'Policy text here' }, { text: 'More context' }],
    } as Response);

    const result = await searchKnowledgeBase.invoke({ query: 'return policy' });
    expect(result).toContain('Policy text here');
    expect(result).toContain('More context');
  });

  it('handles empty results gracefully', async () => {
    vi.mocked(fetch).mockResolvedValueOnce({
      ok: true,
      json: async () => [],
    } as Response);

    const result = await searchKnowledgeBase.invoke({ query: 'nonexistent topic' });
    expect(result).toBe('No results found.');
  });

  it('handles HTTP errors', async () => {
    vi.mocked(fetch).mockResolvedValueOnce({
      ok: false,
      status: 503,
    } as Response);

    const result = await searchKnowledgeBase.invoke({ query: 'test' });
    expect(result).toMatch(/failed.*503/i);
  });
});

Integration Tests

Test the graph with mocked LLM calls:

typescript
// src/agent/graph.test.ts
import { describe, it, expect, vi, beforeEach } from 'vitest';

// vi.mock calls are hoisted above imports, so the mock function must be
// created with vi.hoisted to be visible inside the factory.
const { mockInvoke } = vi.hoisted(() => ({ mockInvoke: vi.fn() }));

vi.mock('@langchain/anthropic', () => ({
  ChatAnthropic: vi.fn().mockImplementation(() => ({
    bindTools: vi.fn().mockReturnThis(),
    invoke: mockInvoke,
  })),
}));

import { workflow } from './graph';

const MOCK_FINAL_RESPONSE = {
  content: JSON.stringify({
    answer: 'Returns are accepted within 30 days.',
    confidence: 0.95,
    sources: ['knowledge-base-chunk-42'],
  }),
  tool_calls: [],
  usage_metadata: { total_tokens: 350 },
};

describe('workflow graph', () => {
  beforeEach(() => {
    mockInvoke.mockReset();
  });

  it('completes successfully with mocked LLM', async () => {
    mockInvoke.mockResolvedValue(MOCK_FINAL_RESPONSE);

    const result = await workflow.invoke({
      runId: 'test-run-1',
      userId: 'user-123',
      userInput: 'What is the return policy?',
    });

    expect(result.finalOutput).not.toBeNull();
    expect(result.finalOutput?.answer).toContain('30 days');
    expect(result.error).toBeNull();
  });

  it('handles tool calls in the loop', async () => {
    // First call returns a tool call; second returns the final response
    mockInvoke
      .mockResolvedValueOnce({
        content: '',
        tool_calls: [{ id: 'tc1', name: 'search_knowledge_base', args: { query: 'returns' } }],
        usage_metadata: { total_tokens: 200 },
      })
      .mockResolvedValueOnce(MOCK_FINAL_RESPONSE);

    const result = await workflow.invoke({
      runId: 'test-run-2',
      userId: 'user-123',
      userInput: 'What is the return policy?',
    });

    expect(mockInvoke).toHaveBeenCalledTimes(2);
    expect(result.finalOutput).not.toBeNull();
  });
});

End-to-End Validation

For E2E validation, record real LLM interactions as fixtures and replay them:

typescript
// scripts/record-cassette.ts — run once, commit the output
import { workflow } from '../src/agent/graph';
import { writeFileSync } from 'fs';

// Set up an interceptor that records all HTTP calls to the LLM API,
// then replay from the recorded fixture in tests

async function record() {
  const calls: unknown[] = [];
  const originalFetch = global.fetch;

  global.fetch = (async (url: any, init?: any) => {
    const response = await originalFetch(url, init);
    const body = await response.clone().json();
    calls.push({ url, requestBody: init?.body, responseBody: body });
    return response;
  }) as typeof fetch;

  await workflow.invoke({
    runId: 'cassette-1',
    userId: 'test-user',
    userInput: 'What is the cancellation policy?',
  });

  writeFileSync('src/agent/__fixtures__/cancellation-query.json', JSON.stringify(calls, null, 2));
  global.fetch = originalFetch;
}

record().catch(console.error);

Conclusion

TypeScript's genuine advantage for agentic AI is not ecosystem size — Python still leads there — but compile-time safety for workflow state and tool contracts. Zod schemas at LLM output boundaries give you runtime validation that TypeScript's erased types cannot provide. Discriminated unions for agent actions make illegal states unrepresentable. And the async-first nature of Node.js aligns naturally with the I/O-bound reality of LLM orchestration.

The practical path forward: define your workflow state as an immutable typed interface, validate every LLM output with Zod before it touches application logic, separate tool definitions from implementations so both are independently testable, and run long workflows through BullMQ rather than synchronous request-response cycles. Use LangGraph.js when you need stateful multi-step graphs with checkpointing; use the Vercel AI SDK when you need streaming responses in a Next.js context. The TypeScript agentic ecosystem is younger than Python's, but the type system advantage compounds — every bug caught at compile time is a production incident you never have to diagnose.



Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
