Can you fine-tune LLMs entirely in TypeScript without Python?

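For training, not in practice: the mainstream fine-tuning stack (PyTorch, Hugging Face Transformers, PEFT) is Python-first, and TypeScript has no mature equivalent for LoRA or QLoRA training. For inference, ONNX Runtime ships a Node.js binding (`onnxruntime-node`) that can run models exported to ONNX, although tokenization and the token-by-token generation loop are left to your code. For most teams, the pragmatic approach is Python for training and TypeScript for everything else: data pipelines, serving infrastructure, and monitoring.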

How do you handle communication between TypeScript services and Python training processes?

Three patterns work well: (1) spawn Python as a child process and parse its stdout for metrics (simplest), (2) use a message queue (Redis, RabbitMQ) for decoupled communication, or (3) expose a REST API from the Python training service. For a small team, option 1 is usually sufficient. For larger teams running concurrent training jobs, option 2 provides better isolation and scalability.
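With option 1, stdout arrives in arbitrary chunks rather than whole lines, so a single JSON metrics record can be split across two `data` events and a naive `JSON.parse` per chunk fails intermittently. A minimal sketch of a line splitter that buffers the incomplete tail; Node's built-in `readline` module does the same job, and this version only makes the mechanics explicit. `createLineSplitter` is a hypothetical helper, not part of any library.

```typescript
// Reassembles newline-delimited records from arbitrarily chunked input.
function createLineSplitter(onLine: (line: string) => void) {
  let pending = ""; // text received after the last newline

  return {
    push(chunk: string): void {
      pending += chunk;
      const parts = pending.split("\n");
      pending = parts.pop() ?? ""; // the last piece has no newline yet; keep it
      for (const part of parts) {
        const line = part.replace(/\r$/, ""); // tolerate CRLF output
        if (line.trim() !== "") onLine(line);
      }
    },
    // Call when the stream ends to deliver a final line that lacks a newline.
    flush(): void {
      if (pending.trim() !== "") onLine(pending.replace(/\r$/, ""));
      pending = "";
    },
  };
}

// Usage sketch with a spawned child process (`handleLine` is hypothetical):
// const splitter = createLineSplitter(handleLine);
// child.stdout.setEncoding("utf8"); // avoids splitting multi-byte UTF-8 characters
// child.stdout.on("data", (c: string) => splitter.push(c)).on("end", () => splitter.flush());
```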

What database should you use for the model registry?

PostgreSQL is the standard choice. It handles JSON columns for storing eval metrics and training configs, supports concurrent access from multiple services, and integrates well with TypeScript ORMs (Drizzle, Prisma). For the model artifacts themselves (adapter weights, merged models), use S3 or GCS and store the path in PostgreSQL.
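The registry code later in this guide queries a `model_versions` table but never defines it. One possible schema, sketched as a TypeScript constant you could run once with `pool.query(MODEL_VERSIONS_DDL)` from `pg` or move into a migration tool. The column names match the registry's queries; the column types, constraints, and the `MODEL_VERSIONS_DDL` name are assumptions.

```typescript
// Hypothetical schema for the model registry. Metrics and configs live in JSONB;
// artifacts stay in object storage and only their URIs are stored here.
const MODEL_VERSIONS_DDL = `
CREATE TABLE IF NOT EXISTS model_versions (
  id                BIGSERIAL   PRIMARY KEY,
  model_name        TEXT        NOT NULL,
  version           INTEGER     NOT NULL,
  stage             TEXT        NOT NULL DEFAULT 'development'
                    CHECK (stage IN ('development', 'staging', 'production')),
  adapter_path      TEXT        NOT NULL,  -- S3/GCS URI of the adapter weights
  merged_model_path TEXT,                  -- S3/GCS URI of the merged model, if any
  eval_metrics      JSONB       NOT NULL,
  training_config   JSONB       NOT NULL,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  promoted_by       TEXT,
  promoted_at       TIMESTAMPTZ,
  -- Makes a racing read-then-insert version allocation fail instead of duplicating
  UNIQUE (model_name, version)
);
`;
```

The `UNIQUE (model_name, version)` constraint matters because the registry computes the next version with `MAX(version) + 1` and then inserts; two concurrent registrations would otherwise both succeed with the same number.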

How do you test the TypeScript fine-tuning infrastructure?

Unit test data validation and formatting logic with standard testing frameworks (Vitest, Jest). Integration test the training orchestrator with a mock Python script that outputs fake metrics. End-to-end test the evaluation service against a running vLLM instance with a small test model. The model registry tests should use a test PostgreSQL database with realistic data volumes.
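Formatting logic is easiest to unit test once it is a pure function with no file I/O. A minimal sketch, assuming the chat schema used by `formatForTraining` later in this guide; `toChatRecord` and `ExampleFields` are hypothetical names. The checks use `node:assert`, and the same assertions work unchanged inside a Vitest or Jest `test(...)` body.

```typescript
import { strict as assert } from "node:assert";

interface ExampleFields {
  instruction: string;
  input: string;
  output: string;
}

// Hypothetical pure version of the "chat" branch of formatForTraining.
function toChatRecord(ex: ExampleFields, systemPrompt = "You are a helpful assistant."): string {
  const user = ex.input ? `${ex.instruction}\n\n${ex.input}` : ex.instruction;
  return JSON.stringify({
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: user },
      { role: "assistant", content: ex.output },
    ],
  });
}

// Instruction and input are joined by a blank line in the user turn.
const withInput = JSON.parse(toChatRecord({ instruction: "Summarize.", input: "Long text", output: "Short." }));
assert.equal(withInput.messages.length, 3);
assert.equal(withInput.messages[1].content, "Summarize.\n\nLong text");
assert.equal(withInput.messages[2].content, "Short.");

// An empty input leaves the instruction on its own.
const noInput = JSON.parse(toChatRecord({ instruction: "Say hi.", input: "", output: "Hi." }));
assert.equal(noInput.messages[1].content, "Say hi.");
```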

Complete Guide to LLM Fine-Tuning Production with TypeScript

While Python dominates the LLM training ecosystem, TypeScript provides a strong foundation for the production infrastructure surrounding fine-tuning: data pipelines, evaluation services, API gateways, and monitoring dashboards. This guide covers building the complete fine-tuning production system in TypeScript, interfacing with Python training processes where necessary.

Data Pipeline in TypeScript

Training Data Management Service

```typescript
import { createHash } from "node:crypto";
import { readFile, writeFile } from "node:fs/promises";

interface TrainingExample {
  id: string;
  instruction: string;
  input: string;
  output: string;
  sourceSystem: string;
  sourceDocumentId: string;
  contentHash: string;
  createdAt: string;
  reviewStatus: "pending" | "approved" | "rejected";
  annotatorId?: string;
}

interface ValidationResult {
  valid: boolean;
  totalExamples: number; // number of examples that passed validation
  issues: Array<{ lineNumber: number; issue: string }>;
}

class TrainingDataPipeline {
  async loadAndValidate(filePath: string): Promise<{
    examples: TrainingExample[];
    validation: ValidationResult;
  }> {
    const content = await readFile(filePath, "utf-8");
    const lines = content.split(/\r?\n/); // handle LF and CRLF line endings
    const examples: TrainingExample[] = [];
    const issues: ValidationResult["issues"] = [];
    const seenHashes = new Set<string>();

    for (let i = 0; i < lines.length; i++) {
      const lineNum = i + 1;
      if (lines[i].trim() === "") continue; // skip blank lines

      let item: any;
      try {
        item = JSON.parse(lines[i]);
      } catch {
        issues.push({ lineNumber: lineNum, issue: "Invalid JSON" });
        continue;
      }

      // instruction and output must be non-empty strings
      if (
        typeof item?.instruction !== "string" || !item.instruction.trim() ||
        typeof item?.output !== "string" || !item.output.trim()
      ) {
        issues.push({ lineNumber: lineNum, issue: "Missing required fields" });
        continue;
      }

      // Deduplicate on instruction + output; the hash also yields a stable ID
      const contentHash = createHash("sha256")
        .update(`${item.instruction}|${item.output}`)
        .digest("hex");

      if (seenHashes.has(contentHash)) {
        issues.push({ lineNumber: lineNum, issue: "Duplicate content" });
        continue;
      }
      seenHashes.add(contentHash);

      // Reject outputs shorter than 10 words
      if (item.output.trim().split(/\s+/).length < 10) {
        issues.push({ lineNumber: lineNum, issue: "Output too short" });
        continue;
      }

      examples.push({
        id: `te_${contentHash.slice(0, 16)}`,
        instruction: item.instruction,
        input: item.input || "",
        output: item.output,
        sourceSystem: item.source_system || "manual",
        sourceDocumentId: item.source_document_id || "",
        contentHash,
        createdAt: new Date().toISOString(),
        reviewStatus: "pending", // a review step must approve before export
        annotatorId: item.annotator_id,
      });
    }

    return {
      examples,
      validation: {
        valid: issues.length === 0,
        totalExamples: examples.length,
        issues,
      },
    };
  }

  // Writes JSONL containing only examples whose reviewStatus is "approved"
  async formatForTraining(
    examples: TrainingExample[],
    outputPath: string,
    format: "instruction" | "chat" = "instruction",
  ): Promise<void> {
    const formatted = examples
      .filter((ex) => ex.reviewStatus === "approved")
      .map((ex) => {
        if (format === "chat") {
          return JSON.stringify({
            messages: [
              { role: "system", content: "You are a helpful assistant." },
              { role: "user", content: ex.input ? `${ex.instruction}\n\n${ex.input}` : ex.instruction },
              { role: "assistant", content: ex.output },
            ],
          });
        }
        return JSON.stringify({
          instruction: ex.instruction,
          input: ex.input,
          output: ex.output,
        });
      });

    // One record per line, each terminated by a newline
    await writeFile(outputPath, formatted.map((line) => `${line}\n`).join(""));
  }
}
```
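`loadAndValidate` marks every example `"pending"`, while `formatForTraining` exports only `"approved"` ones, so a review step has to flip the status in between. A minimal sketch of applying review decisions, assuming they arrive as records keyed by example ID (for instance, exported from an annotation tool); `ReviewDecision` and `applyReviews` are hypothetical names.

```typescript
type ReviewStatus = "pending" | "approved" | "rejected";

interface Reviewable {
  id: string;
  reviewStatus: ReviewStatus;
  annotatorId?: string;
}

// Hypothetical shape of one reviewer decision
interface ReviewDecision {
  exampleId: string;
  status: "approved" | "rejected";
  annotatorId: string;
}

// Returns updated copies (inputs are not mutated). Examples without a decision
// stay "pending"; if an ID has several decisions, the last one wins.
// Decisions for unknown IDs are reported instead of silently dropped.
function applyReviews<T extends Reviewable>(
  examples: T[],
  decisions: ReviewDecision[],
): { examples: T[]; unknownIds: string[] } {
  const byId = new Map(decisions.map((d) => [d.exampleId, d]));
  const known = new Set(examples.map((ex) => ex.id));

  const updated: T[] = examples.map((ex) => {
    const d = byId.get(ex.id);
    return d ? { ...ex, reviewStatus: d.status, annotatorId: d.annotatorId } : ex;
  });
  const unknownIds = decisions.map((d) => d.exampleId).filter((id) => !known.has(id));
  return { examples: updated, unknownIds };
}

// Usage sketch:
// const { examples: reviewed } = applyReviews(examples, decisions);
// await pipeline.formatForTraining(reviewed, "train.jsonl", "chat");
```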

Training Job Orchestration

TypeScript works well for orchestrating Python training processes and managing their lifecycle: launching the job, streaming its metrics, raising alerts, and stopping it cleanly.

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import { EventEmitter } from "node:events";
import { createInterface } from "node:readline";

interface TrainingConfig {
  baseModel: string;
  datasetPath: string;
  outputDir: string;
  loraR: number;
  loraAlpha: number;
  learningRate: number;
  numEpochs: number;
  batchSize: number;
  gradientAccumulation: number;
  useQlora: boolean;
}

interface TrainingMetrics {
  step: number;
  loss: number;
  learningRate: number;
  epoch: number;
  gpuMemoryUsed?: number;
}

class TrainingOrchestrator extends EventEmitter {
  private activeJobs = new Map<string, { child: ChildProcess; config: TrainingConfig }>();

  async startTraining(jobId: string, config: TrainingConfig): Promise<void> {
    const args = [
      "scripts/train.py",
      "--base-model", config.baseModel,
      "--dataset", config.datasetPath,
      "--output-dir", config.outputDir,
      "--lora-r", String(config.loraR),
      "--lora-alpha", String(config.loraAlpha),
      "--learning-rate", String(config.learningRate),
      "--num-epochs", String(config.numEpochs),
      "--batch-size", String(config.batchSize),
      "--gradient-accumulation", String(config.gradientAccumulation),
    ];
    if (config.useQlora) {
      args.push("--use-qlora");
    }

    // Name the handle `child`, not `process`: shadowing Node's global `process`
    // would make `process.env` below throw a ReferenceError.
    const child = spawn("python", args, {
      env: { ...process.env, PYTHONUNBUFFERED: "1" }, // flush metric lines immediately
    });
    this.activeJobs.set(jobId, { child, config });

    // readline reassembles lines that arrive split across stdout chunks
    createInterface({ input: child.stdout }).on("line", (line) => {
      let metrics: TrainingMetrics;
      try {
        metrics = JSON.parse(line);
      } catch {
        if (line.trim()) this.emit("log", { jobId, message: line }); // plain log output
        return;
      }
      // Only JSON records with a numeric loss are treated as metrics
      if (typeof metrics?.loss !== "number") {
        this.emit("log", { jobId, message: line });
        return;
      }
      this.emit("metrics", { jobId, metrics });

      // Fixed threshold: only meaningful if it matches the scale of your loss
      if (metrics.loss > 10) {
        this.emit("alert", {
          jobId,
          type: "loss_spike",
          message: `Loss spike detected: ${metrics.loss}`,
        });
      }
    });

    // Python warnings and progress bars go to stderr. Use a custom event name:
    // emitting "error" on an EventEmitter with no "error" listener throws.
    child.stderr.on("data", (data: Buffer) => {
      this.emit("stderr", { jobId, message: data.toString() });
    });

    // Spawn failures (e.g. `python` not on PATH) surface as "error" on the child
    child.on("error", (err) => {
      this.activeJobs.delete(jobId);
      this.emit("failed", { jobId, message: err.message });
    });

    // "close" fires after the child's stdio streams have closed
    child.on("close", (code, signal) => {
      this.activeJobs.delete(jobId);
      this.emit("complete", { jobId, exitCode: code, signal });
    });
  }

  async stopTraining(jobId: string): Promise<void> {
    const job = this.activeJobs.get(jobId);
    if (!job) return;

    job.child.kill("SIGTERM"); // ask the training script to shut down cleanly
    // Escalate to SIGKILL if the process is still running after 30 s
    const timer = setTimeout(() => {
      if (this.activeJobs.has(jobId)) {
        job.child.kill("SIGKILL");
      }
    }, 30_000);
    timer.unref(); // don't keep Node alive just for this timer
  }
}
```
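The fixed `loss > 10` check depends on the loss scale of the task and never fires on `NaN`, because `NaN > 10` is false. One alternative, sketched here as a swapped-in technique rather than the approach above: flag a step whose loss exceeds a multiple of the recent mean, and always flag non-finite values. `LossSpikeDetector` is hypothetical, and the default window of 50 steps and factor of 3 are illustrative values, not recommendations.

```typescript
// Relative spike check: loss > factor * mean(previous `window` losses).
class LossSpikeDetector {
  private readonly window: number;
  private readonly factor: number;
  private recent: number[] = [];

  constructor(window = 50, factor = 3) {
    this.window = window;
    this.factor = factor;
  }

  // Returns true when `loss` should raise an alert.
  check(loss: number): boolean {
    if (!Number.isFinite(loss)) return true; // NaN or Infinity: training has diverged

    // No verdict until a full window of history exists
    const full = this.recent.length >= this.window;
    const mean = this.recent.reduce((a, b) => a + b, 0) / Math.max(this.recent.length, 1);
    const isSpike = full && loss > this.factor * mean;

    // Keep a sliding window of the most recent finite losses
    this.recent.push(loss);
    if (this.recent.length > this.window) this.recent.shift();
    return isSpike;
  }
}
```

In the orchestrator, one detector per job (for example a `Map<string, LossSpikeDetector>` keyed by `jobId`) would replace the `metrics.loss > 10` condition.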

Evaluation Service

```typescript
import Fastify from "fastify";
import { readFile } from "node:fs/promises";

interface EvalRequest {
  modelEndpoint: string;
  evalDatasetPath: string;
  maxExamples?: number;
}

// One line of the eval dataset (assumed JSONL schema)
interface EvalItem {
  prompt: string;
  expected: string;
  category?: string;
  formatSpec?: { type: string };
}

interface EvalResult {
  accuracy: number;
  formatCompliance: number;
  avgSimilarity: number;
  totalExamples: number;
  perCategory: Record<string, number>;
  failedExamples: Array<{
    prompt: string;
    expected: string;
    generated: string;
    category: string;
  }>;
}

// Avoids NaN when the dataset or a category is empty
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

const app = Fastify({ logger: true });

// evalDatasetPath is read from the server's filesystem: expose this route only internally
app.post<{ Body: EvalRequest }>("/evaluate", async (request): Promise<EvalResult> => {
  const { modelEndpoint, evalDatasetPath, maxExamples = 100 } = request.body;
  const evalData = await loadEvalDataset(evalDatasetPath, maxExamples);

  let correct = 0;
  let formatOk = 0;
  const similarities: number[] = [];
  const perCategory: Record<string, { correct: number; total: number }> = {};
  const failedExamples: EvalResult["failedExamples"] = [];

  for (const item of evalData) {
    const generated = await callModel(modelEndpoint, item.prompt);
    const category = item.category || "general";

    if (!perCategory[category]) {
      perCategory[category] = { correct: 0, total: 0 };
    }
    perCategory[category].total++;

    // Exact match after whitespace and case normalization
    if (normalizeText(generated) === normalizeText(item.expected)) {
      correct++;
      perCategory[category].correct++;
    } else {
      failedExamples.push({
        prompt: item.prompt.slice(0, 200),
        expected: item.expected.slice(0, 200),
        generated: generated.slice(0, 200),
        category,
      });
    }
    if (checkFormat(generated, item.formatSpec)) formatOk++;
    similarities.push(computeSimilarity(generated, item.expected));
  }

  const total = evalData.length;
  return {
    accuracy: ratio(correct, total),
    formatCompliance: ratio(formatOk, total),
    avgSimilarity: ratio(similarities.reduce((a, b) => a + b, 0), similarities.length),
    totalExamples: total,
    perCategory: Object.fromEntries(
      Object.entries(perCategory).map(([k, v]) => [k, ratio(v.correct, v.total)]),
    ),
    failedExamples: failedExamples.slice(0, 20), // cap the response size
  };
});

// Assumes JSONL with one EvalItem per line
async function loadEvalDataset(path: string, maxExamples: number): Promise<EvalItem[]> {
  const content = await readFile(path, "utf-8");
  return content
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .slice(0, maxExamples)
    .map((line) => JSON.parse(line) as EvalItem);
}

// Calls an OpenAI-compatible chat endpoint, such as the one vLLM serves
async function callModel(endpoint: string, prompt: string): Promise<string> {
  const response = await fetch(`${endpoint}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "default", // must match the served model name (vLLM: --served-model-name)
      messages: [{ role: "user", content: prompt }],
      temperature: 0.1,
      max_tokens: 512,
    }),
  });
  if (!response.ok) {
    throw new Error(`Model call failed with HTTP ${response.status}: ${await response.text()}`);
  }
  const data = (await response.json()) as {
    choices?: Array<{ message?: { content?: string | null } }>;
  };
  return data.choices?.[0]?.message?.content ?? "";
}

function normalizeText(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Word-overlap score: distinct shared words / larger distinct-word count (0 to 1)
function computeSimilarity(a: string, b: string): number {
  const wordsA = new Set(normalizeText(a).split(" ").filter(Boolean));
  const wordsB = new Set(normalizeText(b).split(" ").filter(Boolean));
  if (wordsA.size === 0 && wordsB.size === 0) return 1.0;

  let matches = 0;
  for (const word of wordsA) {
    if (wordsB.has(word)) matches++;
  }
  return matches / Math.max(wordsA.size, wordsB.size);
}

// Only JSON validity is checked; other spec types pass
function checkFormat(output: string, formatSpec?: { type: string }): boolean {
  if (!formatSpec) return true;
  if (formatSpec.type === "json") {
    try {
      JSON.parse(output);
      return true;
    } catch {
      return false;
    }
  }
  return true;
}

// The port is an example value
app.listen({ port: 3001 }).catch((err) => {
  app.log.error(err);
  process.exit(1);
});
```
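The `/evaluate` handler awaits each `callModel` request before sending the next one. vLLM batches concurrent requests, so issuing several at once usually shortens an eval run. A minimal sketch of a bounded-concurrency map that preserves input order; `mapWithConcurrency` is a hypothetical helper, and the limit should respect what the model server can handle.

```typescript
// Runs `fn` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // safe without locks: JS runs one callback at a time

  // Each worker claims the next unprocessed index until none remain
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Inside the handler, `const generations = await mapWithConcurrency(evalData, 8, (item) => callModel(modelEndpoint, item.prompt));` could run first, with the existing loop then scoring `generations[i]` for each item; 8 is an arbitrary example limit. If any call rejects, the whole map rejects (requests already in flight still complete), which matches the sequential version's fail-fast behavior.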


Model Registry API

```typescript
import { Pool } from "pg";
// EvalResult and TrainingConfig are the interfaces from the evaluation and
// orchestration sections; these module paths are illustrative.
import type { EvalResult } from "./evaluation-service";
import type { TrainingConfig } from "./training-orchestrator";

interface ModelVersion {
  id: string;
  modelName: string;
  version: number;
  stage: "development" | "staging" | "production";
  adapterPath: string;
  mergedModelPath?: string;
  evalMetrics: EvalResult;
  trainingConfig: TrainingConfig;
  createdAt: string;
  promotedBy?: string;
  promotedAt?: string;
}

// node-postgres returns snake_case columns (JSONB parsed to objects, TIMESTAMPTZ
// to Date); map them onto the camelCase interface.
function toModelVersion(row: Record<string, any>): ModelVersion {
  return {
    id: String(row.id),
    modelName: row.model_name,
    version: Number(row.version),
    stage: row.stage,
    adapterPath: row.adapter_path,
    mergedModelPath: row.merged_model_path ?? undefined,
    evalMetrics: row.eval_metrics,
    trainingConfig: row.training_config,
    createdAt: new Date(row.created_at).toISOString(),
    promotedBy: row.promoted_by ?? undefined,
    promotedAt: row.promoted_at ? new Date(row.promoted_at).toISOString() : undefined,
  };
}

class ModelRegistry {
  private db: Pool;

  constructor(connectionString: string) {
    this.db = new Pool({ connectionString });
  }

  async registerModel(
    modelName: string,
    adapterPath: string,
    evalMetrics: EvalResult,
    trainingConfig: TrainingConfig,
  ): Promise<ModelVersion> {
    // Apply the quality gate before touching the database
    if (evalMetrics.accuracy < 0.85) {
      throw new Error(`Model accuracy ${evalMetrics.accuracy} below minimum threshold 0.85`);
    }

    const latestVersion = await this.db.query(
      "SELECT MAX(version) AS max_version FROM model_versions WHERE model_name = $1",
      [modelName],
    );
    // pg returns BIGINT values as strings, so coerce explicitly. Two concurrent
    // registrations can compute the same version; a UNIQUE (model_name, version)
    // constraint makes the second INSERT fail instead of creating a duplicate.
    const version = Number(latestVersion.rows[0]?.max_version ?? 0) + 1;

    // Stringify JSON values: pg would send a JS array as a Postgres array
    const result = await this.db.query(
      `INSERT INTO model_versions (model_name, version, stage, adapter_path, eval_metrics, training_config)
       VALUES ($1, $2, 'development', $3, $4, $5) RETURNING *`,
      [modelName, version, adapterPath, JSON.stringify(evalMetrics), JSON.stringify(trainingConfig)],
    );
    return toModelVersion(result.rows[0]);
  }

  async promoteModel(
    modelName: string,
    version: number,
    targetStage: "staging" | "production",
    approvedBy: string,
  ): Promise<ModelVersion> {
    const model = await this.getModelVersion(modelName, version);
    if (!model) {
      throw new Error(`Model ${modelName} v${version} not found`);
    }
    if (targetStage === "production" && model.stage !== "staging") {
      throw new Error("Models must pass through staging before production");
    }
    if (targetStage === "production" && model.evalMetrics.accuracy < 0.9) {
      throw new Error(`Production requires 90%+ accuracy. Current: ${model.evalMetrics.accuracy}`);
    }

    const result = await this.db.query(
      `UPDATE model_versions SET stage = $1, promoted_by = $2, promoted_at = NOW()
       WHERE model_name = $3 AND version = $4 RETURNING *`,
      [targetStage, approvedBy, modelName, version],
    );
    return toModelVersion(result.rows[0]);
  }

  async getModelVersion(modelName: string, version: number): Promise<ModelVersion | null> {
    const result = await this.db.query(
      "SELECT * FROM model_versions WHERE model_name = $1 AND version = $2",
      [modelName, version],
    );
    return result.rows[0] ? toModelVersion(result.rows[0]) : null;
  }

  async getProductionModel(modelName: string): Promise<ModelVersion | null> {
    const result = await this.db.query(
      "SELECT * FROM model_versions WHERE model_name = $1 AND stage = 'production' ORDER BY version DESC LIMIT 1",
      [modelName],
    );
    return result.rows[0] ? toModelVersion(result.rows[0]) : null;
  }
}
```
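Testing `promoteModel` as written requires a database. Pulling its promotion rules into a pure function lets you unit test the policy on its own and keep the database tests for the SQL. A minimal sketch mirroring the two rules above (staging before production, at least 0.90 accuracy for production); `promotionBlocker` is a hypothetical name.

```typescript
type Stage = "development" | "staging" | "production";

// Returns null when the promotion is allowed, otherwise the reason it is blocked.
function promotionBlocker(
  currentStage: Stage,
  targetStage: "staging" | "production",
  accuracy: number,
  minProductionAccuracy = 0.9,
): string | null {
  if (targetStage === "production" && currentStage !== "staging") {
    return "Models must pass through staging before production";
  }
  if (targetStage === "production" && accuracy < minProductionAccuracy) {
    return `Production requires accuracy >= ${minProductionAccuracy}. Current: ${accuracy}`;
  }
  return null; // promotion to staging has no extra gate in this policy
}
```

`promoteModel` could then call `promotionBlocker(model.stage, targetStage, model.evalMetrics.accuracy)` and throw when the result is non-null.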

Monitoring Dashboard Backend

```typescript
import { WebSocket, WebSocketServer } from "ws";

interface TrainingMetricsEvent {
  jobId: string;
  timestamp: string;
  step: number;
  loss: number;
  evalLoss?: number;
  learningRate: number;
  gpuMemoryPercent: number;
  throughputTokensPerSec: number;
}

class MonitoringService {
  private wss: WebSocketServer;
  private metricsBuffer: TrainingMetricsEvent[] = [];

  constructor(port: number) {
    this.wss = new WebSocketServer({ port });

    // Send recent history to each newly connected dashboard
    this.wss.on("connection", (ws) => {
      ws.send(JSON.stringify({
        type: "history",
        data: this.metricsBuffer.slice(-1000),
      }));
    });
  }

  pushMetrics(event: TrainingMetricsEvent): void {
    this.metricsBuffer.push(event);

    // Bound memory: once the buffer exceeds 10,000 events, keep the newest 5,000
    if (this.metricsBuffer.length > 10000) {
      this.metricsBuffer = this.metricsBuffer.slice(-5000);
    }

    // Broadcast to every client whose connection is open
    const message = JSON.stringify({ type: "metrics", data: event });
    for (const client of this.wss.clients) {
      if (client.readyState === WebSocket.OPEN) {
        client.send(message);
      }
    }
  }
}
```
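The `history` message sends only the newest 1,000 buffered events, so a dashboard that connects late in a long run sees just the tail of the loss curve. A sketch of one alternative, assuming a coarser view of the whole buffer is more useful: evenly spaced downsampling that always keeps the latest point. `downsample` is a hypothetical helper; using `downsample(this.metricsBuffer, 1000)` in place of `this.metricsBuffer.slice(-1000)` would bound the message size the same way. If several jobs share the buffer, filter by `jobId` first.

```typescript
// Returns at most `maxPoints` evenly spaced items, always including the last one.
function downsample<T>(points: readonly T[], maxPoints: number): T[] {
  if (maxPoints <= 0) return [];
  if (points.length <= maxPoints) return points.slice();

  const stride = Math.ceil(points.length / maxPoints);
  const out: T[] = [];
  let lastIdx = -1;
  for (let i = 0; i < points.length; i += stride) {
    out.push(points[i]);
    lastIdx = i;
  }

  // Ensure the most recent point is present without exceeding maxPoints
  const final = points.length - 1;
  if (lastIdx !== final) {
    if (out.length === maxPoints) out[out.length - 1] = points[final];
    else out.push(points[final]);
  }
  return out;
}
```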

Conclusion

TypeScript's role in LLM fine-tuning production is not to replace Python for training but to build the production infrastructure around it. Data pipelines, evaluation services, model registries, and monitoring dashboards are natural TypeScript territory. The result is a system where Python handles the math and TypeScript handles the operations, playing to each language's strengths.

This architecture works particularly well for teams that are already TypeScript-centric. Instead of requiring every team member to learn Python for operational tasks, it confines Python to the training scripts, while TypeScript covers everything the broader engineering team interacts with.

FAQ


Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
