When should a startup move from platform-managed to Kubernetes?

Most startups never need Kubernetes. Move when at least three of these apply: you run 10+ services, you have complex networking requirements (such as a service mesh), you need custom autoscaling policies, or your team exceeds 15 engineers. Until then, Railway, Fly.io, or ECS Fargate handle zero-downtime deployments without Kubernetes complexity.

How do I handle zero-downtime deploys for WebSocket connections?

WebSocket connections break during pod restarts regardless of your strategy. Handle reconnection on the client side — most WebSocket libraries support automatic reconnection with exponential backoff. On the server, implement session resumption so reconnected clients restore their state. Keep the reconnection window under 5 seconds and users won't notice.
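To make the backoff idea concrete, here is a minimal sketch of a reconnect schedule that stays inside the 5-second window. The function names and the 250 ms base and 2-second cap are assumptions chosen for illustration, not values from any particular WebSocket library.

```typescript
// Sketch: exponential backoff schedule for WebSocket reconnects.
// baseMs, maxDelayMs, and windowMs are illustrative defaults, not library settings.

function backoffDelayMs(attempt: number, baseMs = 250, maxDelayMs = 2000): number {
  if (baseMs <= 0) throw new Error('baseMs must be positive');
  // attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, and so on, capped at maxDelayMs
  return Math.min(maxDelayMs, baseMs * Math.pow(2, attempt));
}

// Returns the delays for every attempt whose cumulative wait fits in windowMs.
function reconnectSchedule(windowMs = 5000, baseMs = 250, maxDelayMs = 2000): number[] {
  const delays: number[] = [];
  let elapsed = 0;
  for (let attempt = 0; ; attempt++) {
    const delay = backoffDelayMs(attempt, baseMs, maxDelayMs);
    if (elapsed + delay > windowMs) break;
    delays.push(delay);
    elapsed += delay;
  }
  return delays;
}
```

With the defaults, a client gets four attempts before the window closes. In a real client you would likely add random jitter to each delay so that every user does not reconnect at the same instant after a deploy.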

Should startups use feature flags from day one?

Environment variable-based feature toggles (free, simple) are sufficient for early-stage startups. Adopt a dedicated feature flag service (LaunchDarkly, PostHog) when you need percentage-based rollouts, user targeting, or real-time toggling without restarts. This typically happens around 10-15 engineers.
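For a sense of what percentage-based rollouts add over a boolean toggle, here is a rough sketch of deterministic bucketing. This is not how LaunchDarkly or PostHog implement it; the function names are hypothetical and the hash is a simple FNV-1a-style hash over UTF-16 code units.

```typescript
// Sketch: deterministic percentage rollout. Hypothetical helper names.

// FNV-1a-style 32-bit hash: small, deterministic, no dependencies.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// The same user always lands in the same bucket (0..99) for a given flag,
// so raising rolloutPercent only ever adds users, it never removes them.
function isEnabledFor(flag: string, userId: string, rolloutPercent: number): boolean {
  const bucket = fnv1a(`${flag}:${userId}`) % 100;
  return bucket < rolloutPercent;
}
```

The point of hashing the flag name together with the user ID is that each flag gets its own independent slice of users. An environment variable cannot express this, which is why a dedicated service starts to pay off once you need gradual rollouts.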

What's the minimum setup for zero-downtime deploys?

Two instances behind a load balancer with health checks. That's it. When you deploy, the load balancer sends traffic only to healthy instances. If the new version fails health checks, traffic stays on the old version. This costs about $20/month on most platforms and takes 30 minutes to set up.
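As a rough sketch of what the load balancer is doing for you, the routing decision boils down to something like the following. The Instance type and pickInstance function are made up for illustration and are not any platform's API.

```typescript
// Sketch: route requests only to instances that pass health checks.

interface Instance {
  id: string;
  healthy: boolean;
}

// Round-robin over healthy instances only; undefined when none are healthy.
function pickInstance(instances: Instance[], requestCount: number): Instance | undefined {
  const healthy = instances.filter((instance) => instance.healthy);
  if (healthy.length === 0) return undefined;
  return healthy[requestCount % healthy.length];
}
```

During a deploy, the new instance starts out unhealthy, so every request keeps going to the old one. If the new version never passes its health check, it never receives traffic.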

Zero-Downtime Deployments Best Practices for Startup Teams

Startups can't afford downtime during deployments, but they also can't afford spending two weeks building an enterprise deployment pipeline. You need something that works reliably with minimal infrastructure, scales with you, and doesn't require a dedicated DevOps engineer to maintain. This guide covers the pragmatic deployment patterns that get startups to zero-downtime without over-engineering.

Start Simple: Platform-Managed Deployments

Before building any deployment infrastructure, use what your platform provides:

Vercel / Netlify (Frontend + Serverless)

Zero-downtime is built in. Every deploy creates an immutable deployment. Traffic atomically shifts from old to new. Rollback is instant — just point production to a previous deployment.

json

// vercel.json: zero-downtime by default
{
  "builds": [{ "src": "next.config.ts", "use": "@vercel/next" }],
  "routes": [{ "src": "/(.*)", "dest": "/" }]
}

Cost: Free tier handles most startups. No DevOps needed.

Railway / Render (Backend Services)

Both platforms support rolling deploys with health checks:

yaml

# render.yaml
services:
  - type: web
    name: api
    runtime: node
    buildCommand: npm run build
    startCommand: npm start
    healthCheckPath: /health
    numInstances: 2
    autoDeploy: true

With 2+ instances and a health check path, Render performs rolling updates automatically. No Kubernetes, no Argo Rollouts, no CI/CD pipeline to maintain.

Fly.io (Containers with Built-in Rolling Deploys)

toml

# fly.toml
[http_service]
  internal_port = 8080
  force_https = true
  auto_start_machines = true
  auto_stop_machines = true
  min_machines_running = 2

  [http_service.concurrency]
    type = "requests"
    hard_limit = 250
    soft_limit = 200

  [[http_service.checks]]
    interval = "10s"
    timeout = "5s"
    grace_period = "10s"
    method = "GET"
    path = "/health"

[deploy]
  strategy = "rolling"

Health Check Implementation

The foundation of zero-downtime deploys. Get this right first:

typescript

// routes/health.ts (Express; Hono and Fastify follow the same shape)
import type { Server } from 'node:http';
import { Router } from 'express';
// Assumption: a shared Prisma client is exported elsewhere in the app.
// The import path is a placeholder.
import { prisma } from '../lib/prisma';

const router = Router();

let isShuttingDown = false;

// Readiness: can this instance serve traffic?
router.get('/health', async (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting_down' });
  }

  try {
    // Check critical dependencies
    await prisma.$queryRaw`SELECT 1`;
    res.json({ status: 'healthy' });
  } catch {
    res.status(503).json({ status: 'unhealthy' });
  }
});

// Graceful shutdown: call once with the server returned by app.listen()
export function registerGracefulShutdown(server: Server) {
  process.on('SIGTERM', () => {
    isShuttingDown = true;

    // Give load balancer time to stop sending traffic
    setTimeout(() => {
      // Stop accepting connections, then exit once in-flight requests finish
      server.close(() => {
        process.exit(0);
      });
    }, 10000);
  });
}

export default router;

Key requirements:

Health check must fail when the process is shutting down
Database/Redis connectivity should be verified
Response time should be under 100ms
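One way to keep these three requirements in a single place is a pure function that turns the shutdown flag and the dependency check results into a response. This is a sketch under assumptions: the CheckResult shape and buildHealthResponse name are hypothetical, and treating a check slower than the 100 ms budget as a failure is one possible reading of the response-time requirement.

```typescript
// Sketch: decide the /health response from check results. Hypothetical names.

interface CheckResult {
  name: string;      // e.g. 'database' or 'redis'
  ok: boolean;       // did the dependency respond successfully?
  latencyMs: number; // how long the check took
}

interface HealthResponse {
  statusCode: number;
  body: { status: string; failed: string[] };
}

function buildHealthResponse(
  isShuttingDown: boolean,
  checks: CheckResult[],
  budgetMs = 100,
): HealthResponse {
  // Requirement 1: fail while shutting down so the load balancer drains us.
  if (isShuttingDown) {
    return { statusCode: 503, body: { status: 'shutting_down', failed: [] } };
  }
  // Requirements 2 and 3: every dependency must be reachable and fast enough.
  const failed = checks
    .filter((check) => !check.ok || check.latencyMs > budgetMs)
    .map((check) => check.name);
  if (failed.length > 0) {
    return { statusCode: 503, body: { status: 'unhealthy', failed } };
  }
  return { statusCode: 200, body: { status: 'healthy', failed: [] } };
}
```

Keeping the decision separate from the route handler makes it easy to unit test without a running database.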

Docker-Based Deployment

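When you outgrow platform-managed deployments, Docker with an orchestrator that supports rolling updates gives you zero-downtime deploys on your own servers. Note that the update_config and rollback_config settings below are Swarm-mode features (docker stack deploy); plain docker compose up does not perform start-first rolling updates.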

Docker Compose with Health Checks

yaml

# docker-compose.yml
services:
  api:
    image: api:latest
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 30s
        order: start-first # Start new before stopping old
      rollback_config:
        parallelism: 0
        order: stop-first
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
    ports:
      - "8080:8080"

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      api:
        condition: service_healthy

Nginx Upstream with Health Checks

nginx

upstream api_servers {
    server api:8080 max_fails=3 fail_timeout=30s;
    keepalive 32;
}

server {
    listen 80;

    location / {
        proxy_pass http://api_servers;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Retry on connection errors (not on 5xx)
        proxy_next_upstream error timeout;
        proxy_next_upstream_tries 2;
    }

    location /health {
        proxy_pass http://api_servers;
        access_log off;
    }
}

GitHub Actions CI/CD

A simple but effective deployment pipeline:

yaml

# .github/workflows/deploy.yml
# REGISTRY, IMAGE, and SERVER are assumed to be defined as workflow env vars
# or secrets; registry login and SSH key setup are omitted here.
name: Deploy
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
      - run: bun install
      - run: bun run lint
      - run: bun test

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build and push Docker image
        run: |
          docker build -t $REGISTRY/$IMAGE:${{ github.sha }} .
          docker push $REGISTRY/$IMAGE:${{ github.sha }}

      - name: Deploy with rolling update
        # Assumes the compose file on the server reads the api image tag from
        # IMAGE_TAG, so each deploy pins an exact build instead of "latest".
        run: |
          ssh deploy@$SERVER "
            docker pull $REGISTRY/$IMAGE:${{ github.sha }}
            IMAGE_TAG=${{ github.sha }} docker compose up -d --no-deps api
          "

      - name: Verify deployment
        run: |
          for i in {1..30}; do
            if curl -sf https://api.example.com/health; then
              echo 'Deployment healthy'
              exit 0
            fi
            sleep 2
          done
          echo 'Deployment health check failed'
          exit 1

      - name: Rollback on failure
        if: failure()
        # Redeploy the image built from the commit before this push
        # (assumes that earlier build was pushed to the registry).
        run: |
          ssh deploy@$SERVER "
            IMAGE_TAG=${{ github.event.before }} docker compose up -d --no-deps api
          "

Database Migrations for Startups

Use the additive-only pattern — only add columns and tables, never remove or rename in the same deploy:

typescript

// Safe migration pattern
// Deploy 1: Add new column (nullable)
// prisma/migrations/001_add_email_column.sql
//   ALTER TABLE users ADD COLUMN email_verified BOOLEAN;

// Deploy 2: Start writing to new column
// Your application code
await prisma.user.update({
  where: { id: userId },
  data: {
    emailVerified: true,
    // Keep writing to old fields too
  },
});

// Deploy 3: Backfill existing records
// scripts/backfill-email-verified.ts
const BATCH_SIZE = 1000;

while (true) {
  // No cursor needed: rows already backfilled no longer match
  // email_verified IS NULL, so each pass picks up the next batch.
  // UPDATE ... LIMIT is MySQL syntax; on PostgreSQL, select the batch of ids
  // in a subquery instead.
  const updated = await prisma.$executeRaw`
    UPDATE users
    SET email_verified = true
    WHERE email_verified IS NULL
      AND email_confirmed_at IS NOT NULL
    LIMIT ${BATCH_SIZE}
  `;
  if (updated === 0) break;
  await new Promise(r => setTimeout(r, 100)); // Avoid overloading DB
}

// Deploy 4: Make column non-nullable and remove old column
// Only after Deploy 3 is verified and any rows still NULL have a value
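If you want a guardrail for the additive-only rule, a small check over the migration SQL can flag statements that remove or rename things. The sketch below is deliberately naive: the function name is hypothetical, it splits on semicolons without parsing SQL, and it only looks for DROP TABLE, DROP COLUMN, and RENAME.

```typescript
// Sketch: flag migration statements that break the additive-only pattern.
// Naive by design: splits on ';' and matches keywords, it does not parse SQL.

const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\bDROP\s+(TABLE|COLUMN)\b/i,
  /\bRENAME\s+(TO|COLUMN)\b/i,
];

function findDestructiveStatements(sql: string): string[] {
  return sql
    .split(';')
    .map((statement) => statement.trim())
    .filter((statement) => statement.length > 0)
    .filter((statement) => DESTRUCTIVE_PATTERNS.some((pattern) => pattern.test(statement)));
}
```

A check like this could run in CI against new migration files and fail the build when the list is not empty. Treat it as a reminder to split the change across deploys, not as a complete safety net.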

Environment Variable Management

Use environment-based feature toggles for zero-downtime changes:

typescript

// lib/config.ts
export const config = {
  features: {
    newCheckout: process.env.FEATURE_NEW_CHECKOUT === 'true',
    v2Api: process.env.FEATURE_V2_API === 'true',
  },
} as const;

// Usage (inside a request handler)
if (config.features.newCheckout) {
  return handleNewCheckout(req);
}
return handleLegacyCheckout(req);

Update the environment variable in your platform (Railway, Fly.io, Render) to toggle features without deploying code. Most platforms restart instances when environment variables change, so combine with rolling deploys.

Anti-Patterns to Avoid

Over-Engineering the Pipeline

A startup with 3 engineers doesn't need Argo Rollouts, Istio service mesh, and a custom deployment controller. Start with your platform's built-in deployments. Add complexity only when you outgrow the simple approach.

Deploying from Laptops

ssh prod && git pull && npm start works until it doesn't. The first time someone deploys from a branch with uncommitted changes, you'll understand why CI/CD exists. Set up GitHub Actions on day one.

Skipping Health Checks

Without health checks, your platform can't distinguish a healthy deployment from a crashed one. A simple /health endpoint that returns 200 takes 5 minutes to implement and prevents 90% of bad deployments from reaching users.

Running Database Migrations in the Deployment Pipeline

Don't tie migrations to deployments. Run migrations separately, verify they succeeded, then deploy the code that uses the new schema. This lets you roll back the code without rolling back the migration.

No Rollback Plan

Every deployment should have a documented rollback path. For platform-managed deployments, this is usually "redeploy the previous commit." Test this before you need it in a crisis.
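A documented rollback path can be as simple as knowing which commit you would redeploy. Here is a minimal sketch of that lookup, assuming you keep a deployment history somewhere; the Deployment shape and rollbackTarget function are hypothetical.

```typescript
// Sketch: pick the commit to redeploy when the current release is bad.

interface Deployment {
  commit: string;
  deployedAt: number; // Unix timestamp in milliseconds
  healthy: boolean;   // did this deployment pass its health checks?
}

// Most recent healthy deployment that is not the current one.
function rollbackTarget(history: Deployment[], currentCommit: string): Deployment | undefined {
  return history
    .filter((deployment) => deployment.healthy && deployment.commit !== currentCommit)
    .sort((a, b) => b.deployedAt - a.deployedAt)[0];
}
```

If this returns undefined, you have no known-good release to fall back to, which is worth finding out before an incident.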

Startup Readiness Checklist

FAQ

Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
