Running Kubernetes at high scale demands a fundamentally different operational mindset than managing a handful of clusters. When you're orchestrating thousands of pods across multiple regions, every misconfiguration compounds. This guide distills production-tested practices from teams running 500+ node clusters serving millions of requests per second.
Resource Management at Scale
Resource requests and limits are the foundation of stable high-scale clusters. Underspecified resources lead to noisy-neighbor problems; overspecified resources waste capacity.
Right-Sizing with VPA and Goldilocks
At high scale, run VPA in recommendation mode and feed its suggestions into your CI pipeline rather than allowing live mutations. Direct pod updates cause restarts that cascade through large deployments.
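A minimal sketch of a recommendation-only VPA object (the Deployment name is a placeholder; assumes the VPA CRDs are installed in the cluster):

```yaml
# VerticalPodAutoscaler that only emits recommendations —
# it never evicts or mutates running pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout          # hypothetical workload name
  updatePolicy:
    updateMode: "Off"       # recommendation mode: read via `kubectl describe vpa`
```

Your CI job can then diff the recommendations against the requests declared in version control and open a PR when they drift.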
Guaranteed QoS for Critical Paths
For latency-sensitive services, set requests equal to limits to achieve Guaranteed QoS class. This prevents CPU throttling and OOM kills during traffic spikes. At scale, the 15-20% capacity overhead pays for itself in reduced incident frequency.
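What Guaranteed QoS looks like in a pod spec — requests must equal limits for every resource on every container (image and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: checkout
spec:
  containers:
    - name: app
      image: registry.example.com/checkout:1.4.2   # placeholder image
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          cpu: "2"          # limits == requests on every container
          memory: 4Gi       # => kubelet assigns the Guaranteed QoS class
```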
Cluster Architecture for Multi-Region
Topology-Aware Routing
Topology-aware routing reduces cross-zone traffic by 60-70%, which at high scale translates to significant cost savings. A team running 2,000 pods across 3 AZs saved $14,000/month in inter-AZ data transfer after enabling this.
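Enabling it is a one-line Service annotation; a sketch assuming Kubernetes v1.27+ (older clusters use the `service.kubernetes.io/topology-aware-hints: auto` annotation instead):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: checkout            # placeholder service name
  annotations:
    service.kubernetes.io/topology-mode: Auto   # prefer same-zone endpoints
spec:
  selector:
    app: checkout
  ports:
    - port: 80
      targetPort: 8080
```

Note that the control plane only applies zone hints when endpoints are spread evenly enough across zones; watch for EndpointSlice events indicating hints were withheld.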
Node Pool Segmentation
Separate node pools by workload characteristics: compute-intensive, memory-intensive, GPU, and general-purpose. This prevents resource contention and allows independent scaling. At 500+ nodes, mixing workload types on the same nodes creates unpredictable performance profiles.
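One common way to enforce the segmentation is a label plus taint on each pool, with a matching nodeSelector and toleration on the workload. A sketch, assuming nodes in the pool carry the label `pool=memory-optimized` and the taint `dedicated=memory-optimized:NoSchedule` (both names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache               # hypothetical memory-intensive workload
spec:
  replicas: 3
  selector:
    matchLabels: {app: cache}
  template:
    metadata:
      labels: {app: cache}
    spec:
      nodeSelector:
        pool: memory-optimized      # only schedule onto the dedicated pool
      tolerations:
        - key: dedicated            # tolerate the pool's taint, which keeps
          value: memory-optimized   # untolerated workloads off these nodes
          effect: NoSchedule
      containers:
        - name: cache
          image: redis:7
```

The taint is what makes the isolation bidirectional: the nodeSelector keeps this workload in the pool, and the taint keeps everything else out.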
Pod Disruption Budgets and Rolling Updates
At high replica counts, percentage-based disruption budgets scale naturally. A fixed maxUnavailable: 1 on a 50-replica deployment makes rollouts painfully slow; 10% allows 5 pods to cycle simultaneously while maintaining 90% capacity.
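The percentage-based version of that budget (selector label is a placeholder):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb
spec:
  maxUnavailable: 10%     # on 50 replicas: up to 5 pods disrupted at once
  selector:
    matchLabels:
      app: checkout       # hypothetical workload label
```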
Network Policies as Default Deny
Default-deny network policies are non-negotiable at scale. A compromised pod in a 1,000-pod cluster with no network policies has lateral movement access to everything. Start with deny-all and whitelist explicitly.
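The baseline deny-all policy per namespace, after which each service gets explicit allow rules:

```yaml
# Denies all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production     # apply one per namespace
spec:
  podSelector: {}           # empty selector = every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Remember that egress deny also blocks DNS; your first allow rule is typically UDP/TCP 53 to kube-dns.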
Observability at Scale
Custom Metrics for HPA
CPU-based HPA fails at scale because CPU utilization doesn't correlate with user experience. Use custom metrics — requests per second, p99 latency, queue depth — that reflect actual service health. Configure asymmetric scale-up/scale-down behavior to prevent flapping: scale up aggressively (50% in 60s) but scale down conservatively (10% in 60s with 5-minute stabilization).
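A sketch of that asymmetric configuration with an `autoscaling/v2` HPA (assumes a metrics adapter such as prometheus-adapter exposes a per-pod `http_requests_per_second` metric; all names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout
  minReplicas: 10
  maxReplicas: 200
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # served by the metrics adapter
        target:
          type: AverageValue
          averageValue: "500"              # target RPS per pod
  behavior:
    scaleUp:
      policies:
        - type: Percent
          value: 50                 # add up to 50% more pods per window
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # 5-minute stabilization
      policies:
        - type: Percent
          value: 10                 # remove at most 10% of pods per window
          periodSeconds: 60
```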
Prometheus Federation for Multi-Cluster
At high scale, a single Prometheus instance cannot ingest metrics from all clusters. Federation with recording rules at the edge reduces central ingestion by 90%. Each cluster Prometheus retains full-resolution data for 24 hours; the federation layer stores aggregated metrics for long-term trending.
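A sketch of the central scrape job, assuming each cluster's recording rules are prefixed `cluster:` and the target hostnames are placeholders:

```yaml
# Central Prometheus pulling only pre-aggregated recording-rule series
# from each cluster's /federate endpoint.
scrape_configs:
  - job_name: federate
    honor_labels: true            # keep the source cluster's labels
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"cluster:.*"}'   # only federated recording rules
    static_configs:
      - targets:
          - prom.cluster-a.internal:9090   # placeholder endpoints
          - prom.cluster-b.internal:9090
```

Federating only recording-rule output, never raw series, is what delivers the ingestion reduction; federating raw metrics just moves the cardinality problem to the central instance.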
Anti-Patterns to Avoid
Running without Pod Disruption Budgets. At scale, node drains during upgrades can take down entire services if PDBs aren't configured. A cluster upgrade on 200 nodes without PDBs caused a 45-minute outage when all replicas of a critical service were drained simultaneously.
Using latest tags in production. Image tag immutability is essential when you have 50 deployments across 3 clusters. A single latest tag pointing to a broken image propagates failures across your entire infrastructure in minutes.
Ignoring etcd performance. At 500+ nodes, etcd becomes the bottleneck. Watch count, object count, and request latency all grow non-linearly. Monitor etcd metrics aggressively and consider dedicated etcd nodes with SSD storage.
Skipping admission controllers. Without admission webhooks enforcing resource requests, namespace quotas, and security contexts, a single misconfigured deployment can consume an entire node pool. OPA Gatekeeper or Kyverno should be mandatory at scale.
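As one sketch of such a guardrail, a Kyverno policy that rejects pods missing resource requests (field values are illustrative; exact schema details may vary by Kyverno version):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-requests
spec:
  validationFailureAction: Enforce   # block the admission, don't just audit
  rules:
    - name: check-container-resources
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "CPU and memory requests are required on every container."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"       # "?*" = any non-empty value
                    memory: "?*"
```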
Manual kubectl operations. GitOps (ArgoCD or Flux) is the only safe deployment mechanism at high scale. Manual kubectl applies across multiple clusters are impossible to audit, impossible to roll back atomically, and guaranteed to cause drift.
Production Checklist
- VPA running in recommendation mode with CI integration
- Guaranteed QoS for latency-critical services
- Topology-aware routing enabled for cross-zone optimization
- Separate node pools for distinct workload types
- PDBs on every production deployment
- Default-deny network policies per namespace
- Custom metrics HPA with asymmetric scaling behavior
- Prometheus federation for multi-cluster monitoring
- GitOps-only deployment pipeline (no manual kubectl)
- etcd monitoring with dedicated alerting
- Resource quotas enforced per namespace
- Pod Security Standards enforced cluster-wide
- Image tag immutability enforced via admission controller
- Cluster autoscaler tuned for scale-up speed (30s scan interval)
- Regular chaos engineering exercises (pod kill, node drain, AZ failure)
Conclusion
High-scale Kubernetes operations require treating your cluster configuration with the same rigor as application code. Every resource manifest, network policy, and autoscaling configuration should be version-controlled, reviewed, and tested before reaching production. The practices outlined here — from VPA-driven right-sizing to federation-based observability — form a cohesive operational framework that scales from 100 to 10,000 pods.
The most critical investment is in guardrails: admission controllers that prevent misconfigurations, PDBs that protect availability during maintenance, and GitOps pipelines that ensure reproducibility. Teams that build these foundations early avoid the operational emergencies that plague organizations scaling reactively.