AI Architecture

Complete Guide to RAG Pipeline Design with TypeScript

A comprehensive guide to designing and implementing RAG pipelines in TypeScript, covering architecture, code examples, and production-ready patterns.

Muneer Puthiya Purayil · 20 min read

Building a RAG pipeline in TypeScript gives you type safety across the entire retrieval-augmented generation stack — from document ingestion through vector search to LLM response generation. This guide covers implementing a production-ready RAG system using TypeScript, with NestJS for the API layer and popular vector database clients.

Architecture Overview

```typescript
interface RAGConfig {
  embeddingModel: string;
  embeddingDimensions: number;
  chunkSize: number;
  chunkOverlap: number;
  retrievalTopK: number;
  generationModel: string;
  maxGenerationTokens: number;
}

const defaultConfig: RAGConfig = {
  embeddingModel: 'text-embedding-3-small',
  embeddingDimensions: 1536,
  chunkSize: 512,
  chunkOverlap: 50,
  retrievalTopK: 5,
  generationModel: 'claude-sonnet-4-5-20250514',
  maxGenerationTokens: 1024,
};
```

Document Parsing

Handle multiple document formats with a type-safe parser interface:

```typescript
import fs from 'node:fs/promises';
import path from 'node:path';

interface ParsedDocument {
  content: string;
  metadata: {
    source: string;
    title: string;
    format: string;
    wordCount: number;
  };
}

interface DocumentParser {
  parse(filePath: string): Promise<ParsedDocument>;
  supports(extension: string): boolean;
}

class MarkdownParser implements DocumentParser {
  supports(ext: string): boolean {
    return ['.md', '.mdx'].includes(ext);
  }

  async parse(filePath: string): Promise<ParsedDocument> {
    const content = await fs.readFile(filePath, 'utf-8');
    const title = this.extractTitle(content);
    return {
      content,
      metadata: {
        source: path.basename(filePath),
        title,
        format: 'markdown',
        wordCount: content.split(/\s+/).length,
      },
    };
  }

  private extractTitle(content: string): string {
    const match = content.match(/^#\s+(.+)$/m);
    return match?.[1] ?? 'Untitled';
  }
}

class PDFParser implements DocumentParser {
  supports(ext: string): boolean {
    return ext === '.pdf';
  }

  async parse(filePath: string): Promise<ParsedDocument> {
    const pdfParse = await import('pdf-parse');
    const buffer = await fs.readFile(filePath);
    const data = await pdfParse.default(buffer);
    return {
      content: data.text,
      metadata: {
        source: path.basename(filePath),
        title: data.info?.Title ?? path.basename(filePath, '.pdf'),
        format: 'pdf',
        wordCount: data.text.split(/\s+/).length,
      },
    };
  }
}

class ParserRegistry {
  private parsers: DocumentParser[] = [
    new MarkdownParser(),
    new PDFParser(),
  ];

  getParser(filePath: string): DocumentParser {
    const ext = path.extname(filePath).toLowerCase();
    const parser = this.parsers.find(p => p.supports(ext));
    if (!parser) throw new Error(`No parser for format: ${ext}`);
    return parser;
  }
}
```

Type-Safe Chunking

```typescript
interface Chunk {
  id: string;
  text: string;
  metadata: ChunkMetadata;
}

interface ChunkMetadata {
  documentId: string;
  source: string;
  section?: string;
  chunkIndex: number;
  totalChunks: number;
}

class RecursiveChunker {
  private separators = ['\n\n', '\n', '. ', ' '];

  constructor(
    private maxTokens: number = 512,
    private overlap: number = 50,
  ) {}

  chunk(text: string, metadata: Omit<ChunkMetadata, 'chunkIndex' | 'totalChunks'>): Chunk[] {
    const rawChunks = this.applyOverlap(this.split(text, this.separators));
    const total = rawChunks.length;

    return rawChunks.map((text, index) => ({
      id: crypto.randomUUID(),
      text: text.trim(),
      metadata: { ...metadata, chunkIndex: index, totalChunks: total },
    })).filter(c => c.text.length > 0);
  }

  // Prepend the tail of the previous chunk so adjacent chunks share context.
  private applyOverlap(chunks: string[]): string[] {
    if (this.overlap === 0) return chunks;
    return chunks.map((text, i) => {
      if (i === 0) return text;
      const tail = chunks[i - 1].split(/\s+/).slice(-this.overlap).join(' ');
      return `${tail} ${text}`;
    });
  }

  private split(text: string, separators: string[]): string[] {
    if (separators.length === 0) return [text];

    const [sep, ...remaining] = separators;
    const parts = text.split(sep);
    const chunks: string[] = [];
    let current = '';

    for (const part of parts) {
      const candidate = current ? current + sep + part : part;

      if (this.tokenCount(candidate) > this.maxTokens) {
        if (current) chunks.push(current);
        if (this.tokenCount(part) > this.maxTokens) {
          chunks.push(...this.split(part, remaining));
          current = '';
        } else {
          current = part;
        }
      } else {
        current = candidate;
      }
    }

    if (current) chunks.push(current);
    return chunks;
  }

  private tokenCount(text: string): number {
    // Rough heuristic (~1.33 tokens per word); swap in a real tokenizer
    // such as tiktoken for accurate budgeting.
    return Math.ceil(text.split(/\s+/).length * 1.33);
  }
}

class SectionAwareChunker {
  private recursive: RecursiveChunker;

  constructor(maxTokens: number = 512) {
    this.recursive = new RecursiveChunker(maxTokens);
  }

  chunk(text: string, metadata: Omit<ChunkMetadata, 'chunkIndex' | 'totalChunks' | 'section'>): Chunk[] {
    const sections = this.extractSections(text);
    const allChunks: Chunk[] = [];

    for (const section of sections) {
      const prefix = `Section: ${section.header}\n\n`;
      const sectionChunks = this.recursive.chunk(section.content, {
        ...metadata,
        section: section.header,
      });

      for (const chunk of sectionChunks) {
        chunk.text = prefix + chunk.text;
        allChunks.push(chunk);
      }
    }

    return allChunks;
  }

  private extractSections(text: string): Array<{ header: string; content: string }> {
    const sections: Array<{ header: string; content: string }> = [];
    let currentHeader = 'Introduction';
    let currentContent = '';

    for (const line of text.split('\n')) {
      const match = line.match(/^#{1,3}\s+(.+)$/);
      if (match) {
        if (currentContent.trim()) {
          sections.push({ header: currentHeader, content: currentContent.trim() });
        }
        currentHeader = match[1];
        currentContent = '';
      } else {
        currentContent += line + '\n';
      }
    }

    if (currentContent.trim()) {
      sections.push({ header: currentHeader, content: currentContent.trim() });
    }

    return sections;
  }
}
```

Embedding Service

```typescript
import OpenAI from 'openai';

class EmbeddingService {
  private client: OpenAI;

  constructor(private model: string = 'text-embedding-3-small') {
    this.client = new OpenAI(); // reads OPENAI_API_KEY from the environment
  }

  async embed(texts: string[], batchSize: number = 64): Promise<number[][]> {
    const embeddings: number[][] = [];

    for (let i = 0; i < texts.length; i += batchSize) {
      const batch = texts.slice(i, i + batchSize);
      const response = await this.client.embeddings.create({
        model: this.model,
        input: batch,
      });
      embeddings.push(...response.data.map(d => d.embedding));
    }

    return embeddings;
  }

  async embedQuery(query: string): Promise<number[]> {
    const [embedding] = await this.embed([query]);
    return embedding;
  }
}
```
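Embedding APIs rate-limit aggressively under batch ingestion load, so each batch call benefits from retries. Here is a minimal retry wrapper with exponential backoff — a sketch, not part of any SDK; the retry count and delay schedule are illustrative defaults, and the injectable `sleep` parameter exists mainly to make the backoff testable:

```typescript
// Retry an async operation with exponential backoff.
// Illustrative defaults: 3 retries, 500ms base delay, doubling each attempt.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelayMs: number = 500,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break;
      await sleep(baseDelayMs * 2 ** attempt); // 500ms, 1s, 2s, ...
    }
  }
  throw lastError;
}
```

Inside `EmbeddingService.embed`, each batch call could then be wrapped as `await withRetry(() => this.client.embeddings.create({ model: this.model, input: batch }))`.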

Vector Store Integration

Qdrant Client

```typescript
import { QdrantClient } from '@qdrant/js-client-rest';

interface SearchResult {
  id: string;
  text: string;
  score: number;
  metadata: Record<string, unknown>;
}

class QdrantVectorStore {
  private client: QdrantClient;

  constructor(
    private collection: string,
    private dimensions: number,
    url: string,
    apiKey: string,
  ) {
    this.client = new QdrantClient({ url, apiKey });
  }

  async ensureCollection(): Promise<void> {
    const collections = await this.client.getCollections();
    const exists = collections.collections.some(c => c.name === this.collection);

    if (!exists) {
      await this.client.createCollection(this.collection, {
        vectors: { size: this.dimensions, distance: 'Cosine' },
      });
    }
  }

  async upsert(chunks: Chunk[], embeddings: number[][]): Promise<void> {
    const points = chunks.map((chunk, i) => ({
      id: chunk.id,
      vector: embeddings[i],
      payload: { text: chunk.text, ...chunk.metadata },
    }));

    await this.client.upsert(this.collection, { points });
  }

  async search(
    queryEmbedding: number[],
    topK: number = 5,
    filter?: Record<string, unknown>,
  ): Promise<SearchResult[]> {
    const results = await this.client.search(this.collection, {
      vector: queryEmbedding,
      limit: topK,
      filter: filter
        ? {
            must: Object.entries(filter).map(([key, value]) => ({
              key,
              match: { value },
            })),
          }
        : undefined,
    });

    return results.map(r => ({
      id: String(r.id),
      text: String(r.payload?.text ?? ''),
      score: r.score,
      metadata: Object.fromEntries(
        Object.entries(r.payload ?? {}).filter(([k]) => k !== 'text'),
      ),
    }));
  }
}
```


RAG Query Pipeline

```typescript
import Anthropic from '@anthropic-ai/sdk';

interface RAGResponse {
  answer: string;
  sources: Array<{ text: string; score: number; source: string }>;
  model: string;
  tokens: number;
}

class RAGPipeline {
  private anthropic: Anthropic;

  constructor(
    private config: RAGConfig,
    private embedder: EmbeddingService,
    private vectorStore: QdrantVectorStore,
  ) {
    this.anthropic = new Anthropic();
  }

  async query(question: string, filters?: Record<string, unknown>): Promise<RAGResponse> {
    const queryEmbedding = await this.embedder.embedQuery(question);

    const results = await this.vectorStore.search(
      queryEmbedding,
      this.config.retrievalTopK,
      filters,
    );

    if (results.length === 0) {
      return {
        answer: 'I could not find relevant information to answer this question.',
        sources: [],
        model: this.config.generationModel,
        tokens: 0,
      };
    }

    const context = this.buildContext(results);

    const response = await this.anthropic.messages.create({
      model: this.config.generationModel,
      max_tokens: this.config.maxGenerationTokens,
      system: `You are a helpful assistant. Answer the user's question using only the provided context. If the context does not contain enough information, say so. Cite your sources by referencing the document title or section.`,
      messages: [
        { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
      ],
    });

    const answerText = response.content[0].type === 'text' ? response.content[0].text : '';

    return {
      answer: answerText,
      sources: results.map(r => ({
        text: r.text.slice(0, 200),
        score: r.score,
        source: String(r.metadata.source ?? 'Unknown'),
      })),
      model: this.config.generationModel,
      tokens: response.usage.input_tokens + response.usage.output_tokens,
    };
  }

  private buildContext(results: SearchResult[]): string {
    return results
      .map((r, i) => {
        const source = r.metadata.source ?? 'Unknown';
        const section = r.metadata.section ?? '';
        let header = `[Source ${i + 1}: ${source}`;
        if (section) header += ` - ${section}`;
        header += ']';
        return `${header}\n${r.text}`;
      })
      .join('\n\n---\n\n');
  }
}
```

Streaming Responses

```typescript
// Note: bracket access (pipeline['embedder']) sidesteps TypeScript's private
// checks. It works, but in production prefer exposing a public stream()
// method on RAGPipeline itself.
async function* streamQuery(
  pipeline: RAGPipeline,
  question: string,
): AsyncGenerator<string> {
  const queryEmbedding = await pipeline['embedder'].embedQuery(question);
  const results = await pipeline['vectorStore'].search(
    queryEmbedding,
    pipeline['config'].retrievalTopK,
  );

  const context = pipeline['buildContext'](results);

  const stream = await pipeline['anthropic'].messages.stream({
    model: pipeline['config'].generationModel,
    max_tokens: pipeline['config'].maxGenerationTokens,
    system: 'Answer using only the provided context. Cite your sources.',
    messages: [
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
    ],
  });

  for await (const event of stream) {
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      yield event.delta.text;
    }
  }
}
```
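On the client side, each server-sent event carries a `data:` line whose payload is the `{ text }` JSON emitted by the streaming endpoint. Browsers using `EventSource` get this parsing for free; a fetch-based consumer has to do it manually. A minimal parser sketch (the frame shape is an assumption matching the `JSON.stringify({ text: chunk })` convention used in this article):

```typescript
// Extract text deltas from a buffer of raw SSE frames.
// Assumes each event carries JSON of the shape { text: string }.
function parseSseText(buffer: string): string[] {
  const deltas: string[] = [];
  for (const line of buffer.split('\n')) {
    if (!line.startsWith('data:')) continue;
    const payload = line.slice('data:'.length).trim();
    if (!payload) continue;
    try {
      const parsed = JSON.parse(payload) as { text?: string };
      if (typeof parsed.text === 'string') deltas.push(parsed.text);
    } catch {
      // Ignore malformed or partial frames.
    }
  }
  return deltas;
}
```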

NestJS API Integration

```typescript
import { Controller, Post, Body, Query, Sse, MessageEvent } from '@nestjs/common';
import { Observable, from } from 'rxjs';
import { map } from 'rxjs/operators';

class QueryDto {
  question!: string;
  filters?: Record<string, unknown>;
}

@Controller('api/rag')
class RAGController {
  constructor(private ragPipeline: RAGPipeline) {}

  @Post('query')
  async query(@Body() dto: QueryDto): Promise<RAGResponse> {
    return this.ragPipeline.query(dto.question, dto.filters);
  }

  // EventSource can only issue GET requests, so the question
  // arrives as a query parameter rather than a request body.
  @Sse('stream')
  stream(@Query('question') question: string): Observable<MessageEvent> {
    const generator = streamQuery(this.ragPipeline, question);
    return from(generator).pipe(
      map(chunk => ({ data: JSON.stringify({ text: chunk }) })),
    );
  }
}
```

Ingestion Service

```typescript
class IngestionService {
  constructor(
    private parserRegistry: ParserRegistry,
    private chunker: SectionAwareChunker,
    private embedder: EmbeddingService,
    private vectorStore: QdrantVectorStore,
  ) {}

  async ingestFile(filePath: string): Promise<{ chunks: number }> {
    const parser = this.parserRegistry.getParser(filePath);
    const doc = await parser.parse(filePath);

    const chunks = this.chunker.chunk(doc.content, {
      documentId: crypto.randomUUID(),
      source: doc.metadata.source,
    });

    const embeddings = await this.embedder.embed(chunks.map(c => c.text));
    await this.vectorStore.upsert(chunks, embeddings);

    return { chunks: chunks.length };
  }

  async ingestDirectory(dirPath: string): Promise<{ files: number; chunks: number; errors: string[] }> {
    const files = await this.getFiles(dirPath);
    let totalChunks = 0;
    const errors: string[] = [];

    for (const file of files) {
      try {
        const result = await this.ingestFile(file);
        totalChunks += result.chunks;
      } catch (error) {
        errors.push(`${file}: ${(error as Error).message}`);
      }
    }

    return { files: files.length - errors.length, chunks: totalChunks, errors };
  }

  private async getFiles(dirPath: string): Promise<string[]> {
    const entries = await fs.readdir(dirPath, { withFileTypes: true, recursive: true });
    return entries
      .filter(e => e.isFile())
      .map(e => path.join(e.parentPath ?? e.path, e.name));
  }
}
```
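The `ingestDirectory` loop above processes files one at a time, which is slow for large corpora. A small concurrency limiter lets several files parse and embed in parallel without flooding the embedding API. This is a generic sketch, not part of the service above; a pool size around 4 is a reasonable starting point, tuned against your rate limits:

```typescript
// Run async tasks over `items` with at most `limit` in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

Inside `ingestDirectory`, the sequential `for` loop could then become `await mapWithConcurrency(files, 4, file => this.ingestFile(file))`, with per-file error handling moved into the task function.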

Testing

```typescript
import { describe, it, expect, beforeAll } from 'vitest';

describe('RAG Pipeline', () => {
  let pipeline: RAGPipeline;

  beforeAll(async () => {
    const vectorStore = new QdrantVectorStore('test-collection', 1536, 'http://localhost:6333', '');
    await vectorStore.ensureCollection();

    pipeline = new RAGPipeline(defaultConfig, new EmbeddingService(), vectorStore);

    // Ingest test documents
    const ingestion = new IngestionService(
      new ParserRegistry(),
      new SectionAwareChunker(),
      new EmbeddingService(),
      vectorStore,
    );
    await ingestion.ingestFile('./test-fixtures/sample.md');
  });

  it('retrieves relevant chunks for a query', async () => {
    const result = await pipeline.query('What is the deployment process?');
    expect(result.sources.length).toBeGreaterThan(0);
    expect(result.answer).toBeTruthy();
    expect(result.answer).not.toContain('could not find');
  });

  it('returns low-relevance scores for unrelated queries', async () => {
    // Without a score threshold, vector search still returns topK results
    // for any query, so assert low similarity rather than an empty list.
    const result = await pipeline.query('What is the recipe for chocolate cake?');
    for (const source of result.sources) {
      expect(source.score).toBeLessThan(0.5);
    }
  });
});
```

Conclusion

TypeScript's type system provides meaningful safety guarantees across the RAG pipeline. Typed chunk metadata ensures document provenance is maintained through embedding and retrieval. Typed search results prevent accidentally accessing non-existent fields. The LLM client types catch prompt construction errors at compile time.

The architecture presented here is modular — each component (parser, chunker, embedder, vector store, generator) implements a clear interface and can be swapped independently. Start with the defaults and iterate based on retrieval quality metrics from your evaluation dataset.
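For that evaluation loop, recall@k against a hand-labeled set of query/relevant-chunk pairs is a reasonable first metric. A minimal sketch — the `EvalCase` shape and the retriever signature are assumptions for illustration, not part of the pipeline above; in practice the retriever would wrap `embedQuery` plus `vectorStore.search` and return the chunk IDs:

```typescript
interface EvalCase {
  query: string;
  relevantChunkIds: string[]; // hand-labeled ground truth
}

// Fraction of relevant chunks appearing in the top-k retrieved IDs,
// averaged over all eval cases.
async function recallAtK(
  cases: EvalCase[],
  retrieve: (query: string, k: number) => Promise<string[]>,
  k: number = 5,
): Promise<number> {
  let total = 0;
  for (const c of cases) {
    const retrieved = new Set(await retrieve(c.query, k));
    const hits = c.relevantChunkIds.filter(id => retrieved.has(id)).length;
    total += hits / c.relevantChunkIds.length;
  }
  return total / cases.length;
}
```

Tracking this number as you vary chunk size, overlap, and topK gives the feedback loop the defaults above are meant to bootstrap.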

