DevOps

Complete Guide to Kubernetes Production Setup with Python

A comprehensive guide to implementing Kubernetes Production Setup using Python, covering architecture, code examples, and production-ready patterns.

Muneer Puthiya Purayil · 16 min read

Python workloads on Kubernetes present unique challenges: the GIL limits true parallelism, memory usage can be unpredictable with large ML libraries, and dependency management adds significant image bloat. This guide covers production-ready patterns for deploying Python applications — from FastAPI web services to data pipelines — on Kubernetes.

Optimized Container Images

Python images are notoriously large. A naive python:3.12 base image is 1GB+. Proper multi-stage builds and base image selection reduce this dramatically.

```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install --no-cache-dir uv
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev
COPY . .

FROM python:3.12-slim
RUN groupadd -g 1001 appuser && useradd -u 1001 -g appuser appuser
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/src ./src
ENV PATH="/app/.venv/bin:$PATH"
USER appuser
EXPOSE 8000
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
```

Using uv instead of pip reduces dependency installation from minutes to seconds. The --frozen flag ensures the lockfile is respected exactly. Separating the virtual environment into the runtime stage avoids including build tools, headers, and compilation artifacts.

For ML workloads with heavy native dependencies (NumPy, pandas, PyTorch), consider distroless Python or custom base images:

```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install --no-cache-dir uv
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

FROM gcr.io/distroless/python3-debian12
WORKDIR /app
COPY --from=builder /app/.venv/lib/python3.12/site-packages /usr/lib/python3.12/site-packages
COPY src/ ./src/
EXPOSE 8000
CMD ["src/main.py"]
```

One caveat: the distroless image's Python minor version must match the builder's, since copied site-packages are only found under the interpreter's own version directory. Verify the distroless tag's bundled Python version before adopting this pattern.

FastAPI Application Structure

```python
import os
import time
from contextlib import asynccontextmanager
from typing import AsyncIterator

import asyncpg
import structlog
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

logger = structlog.get_logger()

REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"],
)
REQUEST_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration",
    ["method", "endpoint"],
    buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0],
)

class AppState:
    db_pool: asyncpg.Pool | None = None
    shutting_down: bool = False

state = AppState()

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    state.db_pool = await asyncpg.create_pool(
        dsn=os.environ["DATABASE_URL"],
        min_size=5,
        max_size=20,
        max_inactive_connection_lifetime=300,
    )
    logger.info("database pool created")
    yield
    if state.db_pool:
        await state.db_pool.close()
        logger.info("database pool closed")

app = FastAPI(lifespan=lifespan)

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    duration = time.perf_counter() - start
    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=request.url.path,
        status=response.status_code,
    ).inc()
    REQUEST_DURATION.labels(
        method=request.method,
        endpoint=request.url.path,
    ).observe(duration)
    return response

@app.get("/healthz")
async def health():
    return {"status": "ok"}

@app.get("/readyz")
async def ready():
    # Returning a (body, status) tuple does not set the status code in
    # FastAPI; use an explicit response object for the 503 cases.
    if state.shutting_down:
        return JSONResponse(status_code=503, content={"status": "shutting down"})
    try:
        async with state.db_pool.acquire() as conn:
            await conn.fetchval("SELECT 1")
        return {"status": "ready"}
    except Exception:
        return JSONResponse(status_code=503, content={"status": "not ready"})

@app.get("/metrics")
async def metrics():
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
```

Worker Configuration

Python's GIL means CPU-bound work in a single process only uses one core. For web services, use multiple Uvicorn workers:

```yaml
env:
- name: WEB_CONCURRENCY
  value: "4"  # Number of Uvicorn workers
- name: UVICORN_WORKERS
  value: "4"
```
The usual formula: workers = 2 * cpu_cores + 1 for I/O-bound services, workers = cpu_cores for CPU-bound services. Each Uvicorn worker is a separate process consuming 50-150MB of memory, so 4 workers on a 1Gi memory limit use roughly 200-600MB at baseline, leaving about 400-800MB of shared headroom for request handling, on the order of 100-200MB per worker. Note that env-based worker counts only apply when --workers is not passed on the command line; the CLI flag, as in the Dockerfile above, takes precedence.
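The formula above can be sketched as a small helper. This is a hypothetical function (not from the article's codebase), and note the caveat in the docstring: inside a container, `os.cpu_count()` reports the node's cores, not the pod's CPU limit, so in a real deployment you would derive the core count from the resource limit instead (e.g. via the Downward API).

```python
import os

def suggested_workers(io_bound: bool = True) -> int:
    """Apply the worker-count formula (hypothetical helper).

    Caveat: os.cpu_count() sees the node's cores, not the pod's CPU
    limit; inside a pod, derive the count from the CPU limit instead.
    """
    cores = os.cpu_count() or 1
    return 2 * cores + 1 if io_bound else cores
```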

Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: api
        image: registry.example.com/api-service:v1.3.0
        ports:
        - containerPort: 8000
          name: http
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: database-url
        - name: WEB_CONCURRENCY
          value: "4"
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            memory: 1Gi
        readinessProbe:
          httpGet:
            path: /readyz
            port: http
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /healthz
            port: http
          initialDelaySeconds: 10
          periodSeconds: 15
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1001
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: tmp
          mountPath: /tmp
      volumes:
      - name: tmp
        emptyDir: {}
```

Python applications typically need a writable /tmp for temporary files, and many libraries assume one exists. The emptyDir volume satisfies this requirement while keeping the root filesystem read-only. (Compiled .pyc files are a separate concern: by default they are written next to the source under __pycache__, which a read-only filesystem silently prevents.)
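If bytecode caching matters for cold-start time, the standard PYTHONPYCACHEPREFIX environment variable (Python 3.8+) can redirect .pyc files into the writable emptyDir. A sketch of the env addition:

```yaml
env:
- name: PYTHONPYCACHEPREFIX
  value: /tmp/pycache   # .pyc files land in the writable emptyDir
```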

Celery Workers on Kubernetes

Background task processing with Celery requires separate deployments for the worker and beat scheduler:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker
spec:
  replicas: 4
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      containers:
      - name: worker
        image: registry.example.com/api-service:v1.3.0
        command: ["celery", "-A", "src.celery_app", "worker",
                  "--loglevel=info",
                  "--concurrency=4",
                  "--max-tasks-per-child=1000",
                  "--without-heartbeat"]
        env:
        - name: CELERY_BROKER_URL
          valueFrom:
            secretKeyRef:
              name: celery-secrets
              key: broker-url
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            memory: 1Gi
        livenessProbe:
          exec:
            command: ["celery", "-A", "src.celery_app", "inspect", "ping",
                      "--timeout", "5"]
          initialDelaySeconds: 30
          periodSeconds: 60
          timeoutSeconds: 10
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-beat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: celery-beat
  template:
    metadata:
      labels:
        app: celery-beat
    spec:
      containers:
      - name: beat
        image: registry.example.com/api-service:v1.3.0
        command: ["celery", "-A", "src.celery_app", "beat",
                  "--loglevel=info"]
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            memory: 256Mi
```

Key Celery-on-Kubernetes patterns:

  • --max-tasks-per-child=1000 prevents memory leaks by recycling workers after 1,000 tasks. Python's garbage collector doesn't always reclaim fragmented memory.
  • --without-heartbeat reduces Redis/RabbitMQ overhead when pod health is managed by Kubernetes probes.
  • Celery Beat must run as a single replica. Use a Deployment with replicas: 1 and consider adding a leader election sidecar for high availability.
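To keep tasks from being lost during the pod churn described above, these flags pair well with late acknowledgement. A hedged sketch of settings (these are real Celery option names, but verify behavior against your broker and Celery version) that would be applied with `app.conf.update(**CELERY_CONFIG)`:

```python
# Sketch: Celery settings that pair well with Kubernetes-managed workers.
# With task_acks_late, a task killed mid-flight during a rolling update is
# redelivered to another worker instead of being silently lost.
CELERY_CONFIG = {
    "task_acks_late": True,              # ack after completion, not on receipt
    "worker_prefetch_multiplier": 1,     # don't hoard tasks a dying pod can't finish
    "task_reject_on_worker_lost": True,  # requeue tasks from OOM-killed workers
}
```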


Data Pipeline Jobs

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-etl
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: etl
            image: registry.example.com/etl-pipeline:v1.0.0
            command: ["python", "-m", "src.etl.daily_pipeline"]
            resources:
              requests:
                cpu: "2"
                memory: 4Gi
              limits:
                memory: 8Gi
            env:
            - name: PYTHONUNBUFFERED
              value: "1"
```

PYTHONUNBUFFERED=1 ensures print statements and log output appear in kubectl logs immediately rather than being buffered. Without this, debugging failed jobs requires waiting for buffer flushes that may never happen if the process crashes.
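When you cannot control the environment, the same effect is available per call with the standard `flush` argument. A minimal stdlib sketch (the function and message are illustrative):

```python
import sys

def log_line(msg: str) -> None:
    # flush=True forces the line out immediately, the per-call
    # equivalent of running the process with PYTHONUNBUFFERED=1
    print(msg, file=sys.stdout, flush=True)

log_line("daily_etl: extract complete")
```

Running the interpreter with `python -u` is the command-line equivalent of setting the environment variable.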

Memory Management

Python's memory allocator doesn't return freed memory to the OS reliably. This means a pod that processes a large batch job can hold onto memory long after the data is garbage collected.

```python
import gc
import tracemalloc

import structlog

logger = structlog.get_logger()

tracemalloc.start()

def transform(item: dict) -> dict:
    # placeholder for the real per-item transformation
    return item

def process_large_batch(items: list[dict]) -> list[dict]:
    results = []
    # Process in chunks to limit peak memory
    chunk_size = 1000
    for i in range(0, len(items), chunk_size):
        chunk = items[i:i + chunk_size]
        chunk_results = [transform(item) for item in chunk]
        results.extend(chunk_results)
        del chunk, chunk_results
        gc.collect()  # Force garbage collection between chunks

    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics("lineno")
    for stat in top_stats[:5]:
        logger.info("memory_usage", stat=str(stat))

    return results
```

For Kubernetes, this means setting memory limits with headroom for Python's memory management behavior. If your application's working set is 500MB, set the memory limit to 800MB-1GB to account for fragmentation and GC overhead.
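Expressed as a manifest fragment (numbers illustrative, matching the 500MB working-set example):

```yaml
resources:
  requests:
    memory: 512Mi   # approximate steady-state working set
  limits:
    memory: 1Gi     # ~2x headroom for fragmentation and GC overhead
```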

Anti-Patterns to Avoid

Using python:3.12 as the base image. The full Python image includes compilers, headers, and development tools totaling 1GB+. Always use python:3.12-slim or distroless variants.

Running a single Uvicorn worker. A single-worker process can execute only one CPU-bound request at a time, so one slow request stalls everything queued behind it. Always run multiple workers.

Ignoring SIGTERM handling. Uvicorn handles SIGTERM by default, but Celery workers need explicit signal handling to finish in-progress tasks. Without graceful shutdown, tasks are lost during rolling updates.

Installing packages at runtime. Pods that run pip install on startup have unpredictable startup times and fail when PyPI is unreachable. All dependencies must be baked into the container image.

Not setting PYTHONDONTWRITEBYTECODE. On a read-only root filesystem Python does not crash when it cannot write .pyc files; it silently skips caching them, so every pod start pays full bytecode compilation on import. Set PYTHONDONTWRITEBYTECODE=1 to make that tradeoff explicit, or redirect the cache to a writable path with PYTHONPYCACHEPREFIX.

Conclusion

Python on Kubernetes requires accepting the language's runtime characteristics — the GIL, memory management quirks, and larger image sizes — and compensating at the infrastructure level. Multi-worker processes, generous memory limits with headroom for fragmentation, and proper container image optimization are the foundations.

The Python ecosystem's strength lies in its library breadth, particularly for data science and ML workloads. A well-optimized Python deployment on Kubernetes — using uv for dependency management, multi-stage builds for image size, and Celery for background processing — provides a productive development environment with production-grade operations.

