AI Architecture

Complete Guide to RAG Pipeline Design with TypeScript

A comprehensive guide to designing and implementing RAG pipelines in TypeScript, covering architecture, code examples, and production-ready patterns.

Muneer Puthiya Purayil · 20 min read

Building a RAG pipeline in TypeScript gives you type safety across the entire retrieval-augmented generation stack — from document ingestion through vector search to LLM response generation. This guide covers implementing a production-ready RAG system using TypeScript, with NestJS for the API layer and popular vector database clients.

Architecture Overview

```typescript
interface RAGConfig {
  embeddingModel: string;
  embeddingDimensions: number;
  chunkSize: number;
  chunkOverlap: number;
  retrievalTopK: number;
  generationModel: string;
  maxGenerationTokens: number;
}

const defaultConfig: RAGConfig = {
  embeddingModel: 'text-embedding-3-small',
  embeddingDimensions: 1536,
  chunkSize: 512,
  chunkOverlap: 50,
  retrievalTopK: 5,
  generationModel: 'claude-sonnet-4-5-20250514',
  maxGenerationTokens: 1024,
};
```

Document Parsing

Handle multiple document formats with a type-safe parser interface:

```typescript
import fs from 'node:fs/promises';
import path from 'node:path';

interface ParsedDocument {
  content: string;
  metadata: {
    source: string;
    title: string;
    format: string;
    wordCount: number;
  };
}

interface DocumentParser {
  parse(filePath: string): Promise<ParsedDocument>;
  supports(extension: string): boolean;
}

class MarkdownParser implements DocumentParser {
  supports(ext: string): boolean {
    return ['.md', '.mdx'].includes(ext);
  }

  async parse(filePath: string): Promise<ParsedDocument> {
    const content = await fs.readFile(filePath, 'utf-8');
    const title = this.extractTitle(content);
    return {
      content,
      metadata: {
        source: path.basename(filePath),
        title,
        format: 'markdown',
        wordCount: content.split(/\s+/).length,
      },
    };
  }

  private extractTitle(content: string): string {
    const match = content.match(/^#\s+(.+)$/m);
    return match?.[1] ?? 'Untitled';
  }
}

class PDFParser implements DocumentParser {
  supports(ext: string): boolean {
    return ext === '.pdf';
  }

  async parse(filePath: string): Promise<ParsedDocument> {
    const pdfParse = await import('pdf-parse');
    const buffer = await fs.readFile(filePath);
    const data = await pdfParse.default(buffer);
    return {
      content: data.text,
      metadata: {
        source: path.basename(filePath),
        title: data.info?.Title ?? path.basename(filePath, '.pdf'),
        format: 'pdf',
        wordCount: data.text.split(/\s+/).length,
      },
    };
  }
}

class ParserRegistry {
  private parsers: DocumentParser[] = [
    new MarkdownParser(),
    new PDFParser(),
  ];

  getParser(filePath: string): DocumentParser {
    const ext = path.extname(filePath).toLowerCase();
    const parser = this.parsers.find(p => p.supports(ext));
    if (!parser) throw new Error(`No parser for format: ${ext}`);
    return parser;
  }
}
```

Type-Safe Chunking

```typescript
interface Chunk {
  id: string;
  text: string;
  metadata: ChunkMetadata;
}

interface ChunkMetadata {
  documentId: string;
  source: string;
  section?: string;
  chunkIndex: number;
  totalChunks: number;
}

class RecursiveChunker {
  private separators = ['\n\n', '\n', '. ', ' '];

  constructor(
    private maxTokens: number = 512,
    private overlap: number = 50,
  ) {}

  chunk(text: string, metadata: Omit<ChunkMetadata, 'chunkIndex' | 'totalChunks'>): Chunk[] {
    const rawChunks = this.applyOverlap(this.split(text, this.separators));
    const total = rawChunks.length;

    return rawChunks.map((text, index) => ({
      id: crypto.randomUUID(),
      text: text.trim(),
      metadata: { ...metadata, chunkIndex: index, totalChunks: total },
    })).filter(c => c.text.length > 0);
  }

  // Prepend the tail of the previous chunk so adjacent chunks share context.
  private applyOverlap(chunks: string[]): string[] {
    if (this.overlap === 0) return chunks;
    return chunks.map((text, i) => {
      if (i === 0) return text;
      const tail = chunks[i - 1].split(/\s+/).slice(-this.overlap).join(' ');
      return `${tail} ${text}`;
    });
  }

  private split(text: string, separators: string[]): string[] {
    if (separators.length === 0) return [text];

    const [sep, ...remaining] = separators;
    const parts = text.split(sep);
    const chunks: string[] = [];
    let current = '';

    for (const part of parts) {
      const candidate = current ? current + sep + part : part;

      if (this.tokenCount(candidate) > this.maxTokens) {
        if (current) chunks.push(current);
        if (this.tokenCount(part) > this.maxTokens) {
          chunks.push(...this.split(part, remaining));
          current = '';
        } else {
          current = part;
        }
      } else {
        current = candidate;
      }
    }

    if (current) chunks.push(current);
    return chunks;
  }

  private tokenCount(text: string): number {
    // Rough heuristic (~1.33 tokens per word); swap in a real tokenizer
    // such as tiktoken for accurate budgeting.
    return Math.ceil(text.split(/\s+/).length * 1.33);
  }
}

class SectionAwareChunker {
  private recursive: RecursiveChunker;

  constructor(maxTokens: number = 512) {
    this.recursive = new RecursiveChunker(maxTokens);
  }

  chunk(text: string, metadata: Omit<ChunkMetadata, 'chunkIndex' | 'totalChunks' | 'section'>): Chunk[] {
    const sections = this.extractSections(text);
    const allChunks: Chunk[] = [];

    for (const section of sections) {
      const prefix = `Section: ${section.header}\n\n`;
      const sectionChunks = this.recursive.chunk(section.content, {
        ...metadata,
        section: section.header,
      });

      for (const chunk of sectionChunks) {
        chunk.text = prefix + chunk.text;
        allChunks.push(chunk);
      }
    }

    return allChunks;
  }

  private extractSections(text: string): Array<{ header: string; content: string }> {
    const sections: Array<{ header: string; content: string }> = [];
    let currentHeader = 'Introduction';
    let currentContent = '';

    for (const line of text.split('\n')) {
      const match = line.match(/^#{1,3}\s+(.+)$/);
      if (match) {
        if (currentContent.trim()) {
          sections.push({ header: currentHeader, content: currentContent.trim() });
        }
        currentHeader = match[1];
        currentContent = '';
      } else {
        currentContent += line + '\n';
      }
    }

    if (currentContent.trim()) {
      sections.push({ header: currentHeader, content: currentContent.trim() });
    }

    return sections;
  }
}
```

Embedding Service

```typescript
import OpenAI from 'openai';

class EmbeddingService {
  private client: OpenAI;

  constructor(private model: string = 'text-embedding-3-small') {
    this.client = new OpenAI(); // reads OPENAI_API_KEY from the environment
  }

  async embed(texts: string[], batchSize: number = 64): Promise<number[][]> {
    const embeddings: number[][] = [];

    for (let i = 0; i < texts.length; i += batchSize) {
      const batch = texts.slice(i, i + batchSize);
      const response = await this.client.embeddings.create({
        model: this.model,
        input: batch,
      });
      embeddings.push(...response.data.map(d => d.embedding));
    }

    return embeddings;
  }

  async embedQuery(query: string): Promise<number[]> {
    const [embedding] = await this.embed([query]);
    return embedding;
  }
}
```
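Embedding APIs rate-limit aggressively under batch ingestion load, so each batch call benefits from retries. Here is a minimal retry wrapper with exponential backoff — a sketch, not part of any SDK; the retry count and delay schedule are illustrative defaults, and the injectable `sleep` parameter exists mainly to make the backoff testable:

```typescript
// Retry an async operation with exponential backoff.
// Illustrative defaults: 3 retries, 500ms base delay, doubling each attempt.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelayMs: number = 500,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break;
      await sleep(baseDelayMs * 2 ** attempt); // 500ms, 1s, 2s, ...
    }
  }
  throw lastError;
}
```

Inside `EmbeddingService.embed`, each batch call could then be wrapped as `await withRetry(() => this.client.embeddings.create({ model: this.model, input: batch }))`.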

Vector Store Integration

Qdrant Client

```typescript
import { QdrantClient } from '@qdrant/js-client-rest';

interface SearchResult {
  id: string;
  text: string;
  score: number;
  metadata: Record<string, unknown>;
}

class QdrantVectorStore {
  private client: QdrantClient;

  constructor(
    private collection: string,
    private dimensions: number,
    url: string,
    apiKey: string,
  ) {
    this.client = new QdrantClient({ url, apiKey });
  }

  async ensureCollection(): Promise<void> {
    const collections = await this.client.getCollections();
    const exists = collections.collections.some(c => c.name === this.collection);

    if (!exists) {
      await this.client.createCollection(this.collection, {
        vectors: { size: this.dimensions, distance: 'Cosine' },
      });
    }
  }

  async upsert(chunks: Chunk[], embeddings: number[][]): Promise<void> {
    const points = chunks.map((chunk, i) => ({
      id: chunk.id,
      vector: embeddings[i],
      payload: { text: chunk.text, ...chunk.metadata },
    }));

    await this.client.upsert(this.collection, { points });
  }

  async search(
    queryEmbedding: number[],
    topK: number = 5,
    filter?: Record<string, unknown>,
  ): Promise<SearchResult[]> {
    const results = await this.client.search(this.collection, {
      vector: queryEmbedding,
      limit: topK,
      filter: filter
        ? {
            must: Object.entries(filter).map(([key, value]) => ({
              key,
              match: { value },
            })),
          }
        : undefined,
    });

    return results.map(r => ({
      id: String(r.id),
      text: String(r.payload?.text ?? ''),
      score: r.score,
      metadata: Object.fromEntries(
        Object.entries(r.payload ?? {}).filter(([k]) => k !== 'text'),
      ),
    }));
  }
}
```


RAG Query Pipeline

```typescript
import Anthropic from '@anthropic-ai/sdk';

interface RAGResponse {
  answer: string;
  sources: Array<{ text: string; score: number; source: string }>;
  model: string;
  tokens: number;
}

class RAGPipeline {
  private anthropic: Anthropic;

  constructor(
    private config: RAGConfig,
    private embedder: EmbeddingService,
    private vectorStore: QdrantVectorStore,
  ) {
    this.anthropic = new Anthropic();
  }

  async query(question: string, filters?: Record<string, unknown>): Promise<RAGResponse> {
    const queryEmbedding = await this.embedder.embedQuery(question);

    const results = await this.vectorStore.search(
      queryEmbedding,
      this.config.retrievalTopK,
      filters,
    );

    if (results.length === 0) {
      return {
        answer: 'I could not find relevant information to answer this question.',
        sources: [],
        model: this.config.generationModel,
        tokens: 0,
      };
    }

    const context = this.buildContext(results);

    const response = await this.anthropic.messages.create({
      model: this.config.generationModel,
      max_tokens: this.config.maxGenerationTokens,
      system: `You are a helpful assistant. Answer the user's question using only the provided context. If the context does not contain enough information, say so. Cite your sources by referencing the document title or section.`,
      messages: [
        { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
      ],
    });

    const answerText = response.content[0].type === 'text' ? response.content[0].text : '';

    return {
      answer: answerText,
      sources: results.map(r => ({
        text: r.text.slice(0, 200),
        score: r.score,
        source: String(r.metadata.source ?? 'Unknown'),
      })),
      model: this.config.generationModel,
      tokens: response.usage.input_tokens + response.usage.output_tokens,
    };
  }

  private buildContext(results: SearchResult[]): string {
    return results
      .map((r, i) => {
        const source = r.metadata.source ?? 'Unknown';
        const section = r.metadata.section ?? '';
        let header = `[Source ${i + 1}: ${source}`;
        if (section) header += ` - ${section}`;
        header += ']';
        return `${header}\n${r.text}`;
      })
      .join('\n\n---\n\n');
  }
}
```

Streaming Responses

```typescript
// Note: bracket access (pipeline['embedder']) sidesteps TypeScript's private
// checks. It works, but in production prefer exposing a public stream()
// method on RAGPipeline itself.
async function* streamQuery(
  pipeline: RAGPipeline,
  question: string,
): AsyncGenerator<string> {
  const queryEmbedding = await pipeline['embedder'].embedQuery(question);
  const results = await pipeline['vectorStore'].search(
    queryEmbedding,
    pipeline['config'].retrievalTopK,
  );

  const context = pipeline['buildContext'](results);

  const stream = await pipeline['anthropic'].messages.stream({
    model: pipeline['config'].generationModel,
    max_tokens: pipeline['config'].maxGenerationTokens,
    system: 'Answer using only the provided context. Cite your sources.',
    messages: [
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
    ],
  });

  for await (const event of stream) {
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      yield event.delta.text;
    }
  }
}
```
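On the client side, each server-sent event carries a `data:` line whose payload is the `{ text }` JSON emitted by the streaming endpoint. Browsers using `EventSource` get this parsing for free; a fetch-based consumer has to do it manually. A minimal parser sketch (the frame shape is an assumption matching the `JSON.stringify({ text: chunk })` convention used in this article):

```typescript
// Extract text deltas from a buffer of raw SSE frames.
// Assumes each event carries JSON of the shape { text: string }.
function parseSseText(buffer: string): string[] {
  const deltas: string[] = [];
  for (const line of buffer.split('\n')) {
    if (!line.startsWith('data:')) continue;
    const payload = line.slice('data:'.length).trim();
    if (!payload) continue;
    try {
      const parsed = JSON.parse(payload) as { text?: string };
      if (typeof parsed.text === 'string') deltas.push(parsed.text);
    } catch {
      // Ignore malformed or partial frames.
    }
  }
  return deltas;
}
```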

NestJS API Integration

```typescript
import { Controller, Post, Body, Query, Sse, MessageEvent } from '@nestjs/common';
import { Observable, from } from 'rxjs';
import { map } from 'rxjs/operators';

class QueryDto {
  question!: string;
  filters?: Record<string, unknown>;
}

@Controller('api/rag')
class RAGController {
  constructor(private ragPipeline: RAGPipeline) {}

  @Post('query')
  async query(@Body() dto: QueryDto): Promise<RAGResponse> {
    return this.ragPipeline.query(dto.question, dto.filters);
  }

  // EventSource can only issue GET requests, so the question
  // arrives as a query parameter rather than a request body.
  @Sse('stream')
  stream(@Query('question') question: string): Observable<MessageEvent> {
    const generator = streamQuery(this.ragPipeline, question);
    return from(generator).pipe(
      map(chunk => ({ data: JSON.stringify({ text: chunk }) })),
    );
  }
}
```

Ingestion Service

```typescript
class IngestionService {
  constructor(
    private parserRegistry: ParserRegistry,
    private chunker: SectionAwareChunker,
    private embedder: EmbeddingService,
    private vectorStore: QdrantVectorStore,
  ) {}

  async ingestFile(filePath: string): Promise<{ chunks: number }> {
    const parser = this.parserRegistry.getParser(filePath);
    const doc = await parser.parse(filePath);

    const chunks = this.chunker.chunk(doc.content, {
      documentId: crypto.randomUUID(),
      source: doc.metadata.source,
    });

    const embeddings = await this.embedder.embed(chunks.map(c => c.text));
    await this.vectorStore.upsert(chunks, embeddings);

    return { chunks: chunks.length };
  }

  async ingestDirectory(dirPath: string): Promise<{ files: number; chunks: number; errors: string[] }> {
    const files = await this.getFiles(dirPath);
    let totalChunks = 0;
    const errors: string[] = [];

    for (const file of files) {
      try {
        const result = await this.ingestFile(file);
        totalChunks += result.chunks;
      } catch (error) {
        errors.push(`${file}: ${(error as Error).message}`);
      }
    }

    return { files: files.length - errors.length, chunks: totalChunks, errors };
  }

  private async getFiles(dirPath: string): Promise<string[]> {
    const entries = await fs.readdir(dirPath, { withFileTypes: true, recursive: true });
    return entries
      .filter(e => e.isFile())
      .map(e => path.join(e.parentPath ?? e.path, e.name));
  }
}
```
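The `ingestDirectory` loop above processes files one at a time, which is slow for large corpora. A small concurrency limiter lets several files parse and embed in parallel without flooding the embedding API. This is a generic sketch, not part of the service above; a pool size around 4 is a reasonable starting point, tuned against your rate limits:

```typescript
// Run async tasks over `items` with at most `limit` in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

Inside `ingestDirectory`, the sequential `for` loop could then become `await mapWithConcurrency(files, 4, file => this.ingestFile(file))`, with per-file error handling moved into the task function.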

Testing

```typescript
import { describe, it, expect, beforeAll } from 'vitest';

describe('RAG Pipeline', () => {
  let pipeline: RAGPipeline;

  beforeAll(async () => {
    const vectorStore = new QdrantVectorStore('test-collection', 1536, 'http://localhost:6333', '');
    await vectorStore.ensureCollection();

    pipeline = new RAGPipeline(defaultConfig, new EmbeddingService(), vectorStore);

    // Ingest test documents
    const ingestion = new IngestionService(
      new ParserRegistry(),
      new SectionAwareChunker(),
      new EmbeddingService(),
      vectorStore,
    );
    await ingestion.ingestFile('./test-fixtures/sample.md');
  });

  it('retrieves relevant chunks for a query', async () => {
    const result = await pipeline.query('What is the deployment process?');
    expect(result.sources.length).toBeGreaterThan(0);
    expect(result.answer).toBeTruthy();
    expect(result.answer).not.toContain('could not find');
  });

  it('returns low-relevance scores for unrelated queries', async () => {
    // Without a score threshold, vector search still returns topK results
    // for any query, so assert low similarity rather than an empty list.
    const result = await pipeline.query('What is the recipe for chocolate cake?');
    for (const source of result.sources) {
      expect(source.score).toBeLessThan(0.5);
    }
  });
});
```

Conclusion

TypeScript's type system provides meaningful safety guarantees across the RAG pipeline. Typed chunk metadata ensures document provenance is maintained through embedding and retrieval. Typed search results prevent accidentally accessing non-existent fields. The LLM client types catch prompt construction errors at compile time.

The architecture presented here is modular — each component (parser, chunker, embedder, vector store, generator) implements a clear interface and can be swapped independently. Start with the defaults and iterate based on retrieval quality metrics from your evaluation dataset.
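For that evaluation loop, recall@k against a hand-labeled set of query/relevant-chunk pairs is a reasonable first metric. A minimal sketch — the `EvalCase` shape and the retriever signature are assumptions for illustration, not part of the pipeline above; in practice the retriever would wrap `embedQuery` plus `vectorStore.search` and return the chunk IDs:

```typescript
interface EvalCase {
  query: string;
  relevantChunkIds: string[]; // hand-labeled ground truth
}

// Fraction of relevant chunks appearing in the top-k retrieved IDs,
// averaged over all eval cases.
async function recallAtK(
  cases: EvalCase[],
  retrieve: (query: string, k: number) => Promise<string[]>,
  k: number = 5,
): Promise<number> {
  let total = 0;
  for (const c of cases) {
    const retrieved = new Set(await retrieve(c.query, k));
    const hits = c.relevantChunkIds.filter(id => retrieved.has(id)).length;
    total += hits / c.relevantChunkIds.length;
  }
  return total / cases.length;
}
```

Tracking this number as you vary chunk size, overlap, and topK gives the feedback loop the defaults above are meant to bootstrap.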

