Startups running Kubernetes face a different set of constraints than established companies. Budget is tight, the team is small (often one or two people managing infrastructure), and speed of iteration matters more than theoretical perfection. These practices focus on getting a production-ready Kubernetes setup without over-engineering — maximizing reliability per dollar spent.
Start with Managed Kubernetes
Self-hosting the Kubernetes control plane is never the right call for a startup. EKS, GKE, or AKS eliminate the need to manage etcd, the API server, and controller managers. The $75/month cost for an EKS control plane is trivially cheap compared to the engineering hours of debugging a self-managed etcd cluster at 3 AM.
GKE Autopilot is worth considering for very early-stage startups. It removes node management entirely, charging per pod resource request. You lose some flexibility but eliminate an entire category of operational concerns.
Resource Requests: Always Set Them
The single most impactful practice for cluster stability is setting resource requests on every container. Without them, the scheduler cannot make informed placement decisions, and pods compete for resources unpredictably.
A common startup mistake is setting CPU limits. CPU is a compressible resource — when a pod hits its CPU limit, it gets throttled rather than killed. This throttling creates latency spikes that are difficult to diagnose. Set CPU requests for scheduling but omit CPU limits unless you have a specific reason.
Memory limits are different. Memory is incompressible — a pod exceeding its memory limit gets OOM-killed. Always set memory limits.
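A container spec fragment illustrating this split — CPU request without a limit, memory request and limit set equal (the numbers are placeholders, not recommendations):

```yaml
# CPU: request only, so the scheduler can place the pod but the kernel
# never throttles it. Memory: request and limit set equal, so a runaway
# pod is OOM-killed predictably instead of destabilizing its neighbors.
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    memory: 256Mi
```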
Namespace Strategy
Keep it simple. Three namespaces cover most startup needs:
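One common split is production, staging, and a shared platform namespace for cluster tooling — the names here are illustrative, not prescriptive:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
---
apiVersion: v1
kind: Namespace
metadata:
  name: staging
---
apiVersion: v1
kind: Namespace
metadata:
  name: platform  # shared tooling: ingress controller, cert-manager, monitoring
```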
Add ResourceQuotas to prevent a single namespace from consuming the entire cluster:
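A sketch of a quota for the staging namespace — the numbers are placeholders to size against your own node capacity:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "4"      # total CPU requests across all pods
    requests.memory: 8Gi   # total memory requests
    limits.memory: 10Gi    # total memory limits
    pods: "50"             # hard cap on pod count
```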
Ingress with Cert-Manager
Every startup needs TLS termination and routing. The nginx ingress controller plus cert-manager is the standard, well-tested combination:
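A minimal cert-manager ClusterIssuer for Let's Encrypt, assuming the nginx ingress class and a hypothetical contact address:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com  # hypothetical; Let's Encrypt sends expiry notices here
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx  # matches the nginx ingress controller
```

Referencing the issuer from an Ingress is then a single annotation, `cert-manager.io/cluster-issuer: letsencrypt-prod`, plus a `tls` section listing the hostnames.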
GitOps from Day One
ArgoCD or Flux should be your deployment mechanism from the first day. Manual kubectl apply doesn't scale, and more importantly, it doesn't provide an audit trail or easy rollbacks.
The overhead of setting up ArgoCD is about 2 hours. The time saved on the first rollback pays that back immediately.
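A sketch of an ArgoCD Application pointing at a hypothetical infrastructure repository — the repo URL, path, and names are assumptions:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/infra.git  # hypothetical repo
    targetRevision: main
    path: apps/api
  destination:
    server: https://kubernetes.default.svc  # deploy into the same cluster
    namespace: production
  syncPolicy:
    automated:
      prune: true     # delete resources removed from git
      selfHeal: true  # revert manual drift back to the git state
```

With automated sync, a rollback is just a `git revert` on the offending manifest commit.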
Cost Optimization
Spot Instances for Non-Critical Workloads
Spot instances save 60-70% on compute costs. Use them for background workers, batch jobs, and any workload that handles interruption gracefully. Keep your API servers on on-demand instances.
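How a workload opts into spot capacity depends on the provisioner. On EKS managed node groups, spot nodes carry a well-known label that a worker Deployment can select (pod template fragment, assuming EKS):

```yaml
# Pod template fragment for a background worker.
# EKS managed node groups apply this label to spot nodes automatically;
# Karpenter-provisioned nodes use karpenter.sh/capacity-type instead.
nodeSelector:
  eks.amazonaws.com/capacityType: SPOT
```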
Karpenter for Smarter Scaling
Karpenter provisions nodes faster than the Kubernetes Cluster Autoscaler (typically 60 seconds vs 3-5 minutes) and makes better instance selection decisions. For a startup where every pod minute costs money, this matters.
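A minimal NodePool sketch against the karpenter.sh/v1 API, mixing spot and on-demand capacity — treat it as a starting point, not a tuned configuration:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # prefer cheap capacity, allow fallback
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default  # assumes an EC2NodeClass named "default" exists
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m  # repack underutilized nodes quickly
  limits:
    cpu: "100"  # cap total provisioned CPU across the pool
```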
Monitoring on a Budget
You don't need a full observability platform on day one. Start with the basics:
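Assuming the kube-prometheus-stack Helm chart named in the checklist, a values fragment that enforces a short retention window:

```yaml
# values.yaml fragment for the kube-prometheus-stack chart:
# keep seven days of metrics to bound disk usage.
prometheus:
  prometheusSpec:
    retention: 7d
    retentionSize: 20GB  # optional secondary cap on TSDB size
```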
Seven days of retention is sufficient for a startup. If you need longer-term metrics, add Thanos or Grafana Mimir later. The key dashboards to set up immediately are node resource utilization, pod restart counts, and request latency by service.
Anti-Patterns to Avoid
Over-engineering with service mesh. Istio adds 200Mi+ memory per sidecar and significant operational complexity. Unless you have specific mTLS, traffic management, or observability needs that can't be met by simpler tools, skip it until you have the team to manage it.
Running databases in Kubernetes. Managed databases (RDS, Cloud SQL, Atlas) are almost always the right choice for startups. StatefulSets work, but the operational overhead of managing persistent volumes, backup schedules, and failover in Kubernetes is substantial for a small team.
Creating too many environments. Production and staging are sufficient. Each additional environment costs money and maintenance time. Use feature flags instead of long-lived preview environments.
Ignoring security basics. Network policies, RBAC, and pod security standards take an afternoon to set up and prevent entire categories of incidents. The startup that skips security "to move faster" moves slower after their first breach.
Not setting up PodDisruptionBudgets (PDBs). Even with 2-3 replicas, PDBs prevent cluster upgrades and node drains from taking down your service. A few lines of YAML prevent hours of downtime.
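A minimal PDB for a two-replica service (the labels are placeholders):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 1  # node drains must leave at least one replica running
  selector:
    matchLabels:
      app: api
```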
Production Checklist
- Managed Kubernetes (EKS/GKE/AKS) — never self-hosted
- Resource requests on every container
- Memory limits on every container (skip CPU limits)
- Health checks (readiness and liveness) on every container
- Cert-manager with Let's Encrypt for TLS
- GitOps deployment via ArgoCD or Flux
- Spot instances for non-critical workloads
- Karpenter or Cluster Autoscaler configured
- kube-prometheus-stack for monitoring
- PDBs on services with 2+ replicas
- ResourceQuotas per namespace
- RBAC with least-privilege access
- Default-deny network policies
- Pod Security Standards enforced
- Automated backups for any stateful components
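The default-deny network policy from the checklist is a short manifest; apply one per namespace, then layer explicit allow policies on top:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: production  # repeat per namespace
spec:
  podSelector: {}        # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```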
Conclusion
A startup Kubernetes setup should optimize for reliability and operational simplicity, not theoretical completeness. Managed Kubernetes, GitOps, basic monitoring, and proper resource configuration cover 90% of what a small team needs. Every additional layer of complexity — service mesh, custom operators, multi-cluster federation — should be deferred until the team and traffic justify it.
The practices here represent approximately one week of setup work for a single engineer. After that initial investment, the ongoing operational burden is minimal: dependency updates, certificate rotations (automated via cert-manager), and responding to alerts. This foundation scales comfortably to serve thousands of requests per second across dozens of services.