
Complete Guide to Kubernetes Production Setup with Java

A comprehensive guide to running Java workloads on Kubernetes in production, covering container-aware JVM configuration, code examples, and production-ready patterns.

Muneer Puthiya Purayil · 19 min read

Java's mature ecosystem, battle-tested libraries, and the JVM's runtime optimizations make it a strong choice for Kubernetes workloads — if you account for the JVM's unique operational characteristics. Container-aware JVM tuning, proper memory configuration, and startup optimization are essential for running Java effectively in orchestrated environments.

Container-Aware JVM Configuration

The JVM historically had poor container support, defaulting to host-level CPU and memory detection. Modern JVMs (17+) handle containers correctly, but explicit configuration ensures predictable behavior.
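To see what the runtime actually detected inside a container, a minimal check can print the CPU count and maximum heap the JVM derived from cgroup limits. This is an illustrative sketch (the `ContainerCheck` class name is hypothetical):

```java
// ContainerCheck.java — print what the JVM detected for CPU and memory.
// Run inside the container to verify container-aware detection.
public class ContainerCheck {
    public static int detectedCpus() {
        // Reflects the container CPU limit when container support is active (default in 17+)
        return Runtime.getRuntime().availableProcessors();
    }

    public static long detectedMaxHeapBytes() {
        // Derived from the container memory limit and MaxRAMPercentage
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.printf("CPUs: %d, max heap: %d MiB%n",
                detectedCpus(), detectedMaxHeapBytes() / (1024 * 1024));
    }
}
```

Running this in a pod with a 1Gi limit and `-XX:MaxRAMPercentage=75.0` should report a max heap close to 768 MiB; a host-sized number instead means the container limits are not being picked up.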

dockerfile
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /app
COPY gradle/ gradle/
COPY gradlew build.gradle.kts settings.gradle.kts ./
RUN ./gradlew dependencies --no-daemon
COPY src/ src/
RUN ./gradlew bootJar --no-daemon -x test

FROM eclipse-temurin:21-jre-alpine
RUN addgroup -g 1001 -S appuser && adduser -S appuser -u 1001
WORKDIR /app
COPY --from=builder /app/build/libs/app.jar ./app.jar
USER appuser
EXPOSE 8080
ENTRYPOINT ["java", \
  "-XX:MaxRAMPercentage=75.0", \
  "-XX:InitialRAMPercentage=50.0", \
  "-XX:+UseG1GC", \
  "-XX:MaxGCPauseMillis=200", \
  "-XX:+UseStringDeduplication", \
  "-Djava.security.egd=file:/dev/urandom", \
  "-jar", "app.jar"]

Critical JVM flags for Kubernetes:

  • MaxRAMPercentage=75.0 — The JVM uses 75% of the container memory limit for heap, leaving 25% for metaspace, thread stacks, native memory, and the OS page cache. Setting this too high (>80%) leads to OOM kills because the JVM's non-heap memory needs are significant.
  • UseG1GC — G1 is the default in JDK 17+ and provides the best balance of throughput and latency for typical web services.
  • UseStringDeduplication — In microservices with heavy JSON processing, duplicate strings consume 10-25% of heap. This flag deduplicates them during GC with minimal overhead.
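The `MaxRAMPercentage` arithmetic is easy to sanity-check. A small helper (illustrative, with hypothetical names) computes the heap the JVM will size for a given container limit:

```java
// HeapSizing.java — back-of-envelope heap sizing under MaxRAMPercentage.
public class HeapSizing {
    /** Heap the JVM will allocate for a given container limit and MaxRAMPercentage. */
    public static long maxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        long oneGi = 1024L * 1024 * 1024;
        // A 1Gi limit at 75% yields a 768 MiB heap, leaving 256 MiB
        // for metaspace, thread stacks, and native memory.
        System.out.println(maxHeapBytes(oneGi, 75.0) / (1024 * 1024) + " MiB");
    }
}
```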

Spring Boot on Kubernetes

Application Configuration

yaml
# application.yaml
server:
  port: 8080
  shutdown: graceful
  tomcat:
    threads:
      max: 200
    accept-count: 100
    connection-timeout: 5s

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
  datasource:
    url: ${DATABASE_URL}
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      idle-timeout: 300000
      max-lifetime: 1800000
      connection-timeout: 5000

management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,info
  endpoint:
    health:
      probes:
        enabled: true
      group:
        readiness:
          include: db,diskSpace
        liveness:
          include: ping
  metrics:
    tags:
      application: ${spring.application.name}

Spring Boot 3.x includes built-in Kubernetes probe support. Setting management.endpoint.health.probes.enabled=true exposes /actuator/health/liveness and /actuator/health/readiness automatically. The readiness probe checks database connectivity; the liveness probe only checks process health.

Kubernetes Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/actuator/prometheus"
    spec:
      terminationGracePeriodSeconds: 35
      containers:
        - name: order-service
          image: registry.example.com/order-service:v2.1.0
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: order-service-secrets
                  key: database-url
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              memory: 1Gi
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: http
            initialDelaySeconds: 20
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: http
            initialDelaySeconds: 30
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 30
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]
Java-specific deployment considerations:

  • Higher initialDelaySeconds. Spring Boot applications take 10-30 seconds to start, unlike Go (milliseconds) or Node.js (1-3 seconds). Use a startupProbe with generous failure thresholds instead of delaying the readiness probe excessively.
  • preStop sleep. The 5-second sleep before SIGTERM gives the Kubernetes endpoints controller time to remove the pod from service routing. Without this, in-flight requests hit terminating pods during the brief propagation delay.
  • Memory requests at 512Mi minimum. The JVM's baseline memory consumption (heap + metaspace + thread stacks + GC overhead) rarely goes below 300Mi for a Spring Boot application. Requesting 256Mi will cause OOM kills.
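Spring Boot's graceful shutdown (enabled by `server.shutdown: graceful`) ultimately rests on a JVM shutdown hook that runs when the kubelet sends SIGTERM after the preStop hook completes. A minimal plain-Java sketch of that mechanism, with a hypothetical drain callback:

```java
// GracefulShutdown.java — sketch of the shutdown-hook mechanism behind graceful shutdown.
public class GracefulShutdown {
    /**
     * Registers a drain task that runs on SIGTERM. In Kubernetes,
     * terminationGracePeriodSeconds bounds how long hooks may run before SIGKILL.
     */
    public static Thread registerHook(Runnable drain) {
        Thread hook = new Thread(drain, "graceful-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerHook(() -> System.out.println("draining in-flight requests..."));
    }
}
```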

Startup Optimization

JVM startup time is the primary operational challenge in Kubernetes. A slow-starting application delays rolling updates and scaling events.

Class Data Sharing (CDS)

dockerfile
FROM eclipse-temurin:21-jre-alpine

COPY app.jar /app/app.jar
WORKDIR /app

# Generate CDS archive during build: start the app, let the Spring
# context refresh, then exit and dump loaded-class metadata
RUN java -XX:ArchiveClassesAtExit=app-cds.jsa \
    -Dspring.context.exit=onRefresh \
    -jar app.jar || true

ENTRYPOINT ["java", \
  "-XX:SharedArchiveFile=app-cds.jsa", \
  "-XX:MaxRAMPercentage=75.0", \
  "-jar", "app.jar"]

CDS pre-processes class metadata and stores it in a shared archive. This reduces startup time by 20-40% for Spring Boot applications by avoiding redundant class loading and verification work at startup.

GraalVM Native Image

dockerfile
FROM ghcr.io/graalvm/native-image-community:21 AS builder
WORKDIR /app
COPY gradle/ gradle/
COPY gradlew build.gradle.kts settings.gradle.kts ./
RUN ./gradlew dependencies --no-daemon
COPY src/ src/
RUN ./gradlew nativeCompile --no-daemon

FROM gcr.io/distroless/base-debian12:nonroot
COPY --from=builder /app/build/native/nativeCompile/app /app
EXPOSE 8080
ENTRYPOINT ["/app"]

GraalVM native images start in 100-300ms (vs 10-30 seconds for JVM) and use 50-70% less memory. The tradeoff is longer build times (5-15 minutes), potential reflection and serialization compatibility issues, and reduced peak throughput. Native images are ideal for serverless-style workloads with frequent scale-to-zero events; traditional JVM is better for long-running services where throughput matters more than startup.

Micrometer Metrics and Prometheus

java
@Configuration
public class MetricsConfig {

    @Bean
    public TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }

    @Bean
    public MeterRegistryCustomizer<MeterRegistry> commonTags(
            @Value("${spring.application.name}") String appName) {
        return registry -> registry.config()
            .commonTags(
                "application", appName,
                "region", System.getenv().getOrDefault("AWS_REGION", "unknown")
            );
    }
}

@RestController
@RequestMapping("/api/v1/orders")
public class OrderController {

    private final OrderService orderService;
    private final Counter orderCounter;
    private final Timer orderTimer;

    public OrderController(OrderService orderService, MeterRegistry registry) {
        this.orderService = orderService;
        this.orderCounter = Counter.builder("orders.created")
            .description("Total orders created")
            .register(registry);
        this.orderTimer = Timer.builder("orders.processing.duration")
            .description("Order processing duration")
            .publishPercentiles(0.5, 0.95, 0.99)
            .register(registry);
    }

    @PostMapping
    public ResponseEntity<Order> createOrder(@RequestBody @Valid CreateOrderRequest request) {
        return orderTimer.record(() -> {
            Order order = orderService.create(request);
            orderCounter.increment();
            return ResponseEntity.status(HttpStatus.CREATED).body(order);
        });
    }
}

Micrometer integrates with Spring Boot's actuator to expose JVM-specific metrics (heap usage, GC pause times, thread counts) alongside application metrics. Key JVM metrics to monitor in Kubernetes:

  • jvm.memory.used — Watch for memory creep toward the container limit
  • jvm.gc.pause — G1 pauses above 500ms indicate heap sizing issues
  • jvm.threads.live — Thread count growth without corresponding load suggests thread leaks
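The figures Micrometer exports are backed by the platform MXBeans, so the same signals can be sanity-checked in-process. An illustrative helper (hypothetical class name):

```java
// JvmStats.java — read the JVM signals that back Micrometer's jvm.* metrics.
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JvmStats {
    /** Backs jvm.memory.used for the heap area. */
    public static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Backs jvm.threads.live. */
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Total GC cycles across collectors; pause times back jvm.gc.pause. */
    public static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means unsupported
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %d MiB, live threads: %d, GC cycles: %d%n",
                heapUsedBytes() / (1024 * 1024), liveThreads(), totalGcCount());
    }
}
```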


Connection Pool Tuning

java
@Configuration
public class DataSourceConfig {

    @Bean
    @ConfigurationProperties("spring.datasource.hikari")
    public HikariConfig hikariConfig(MeterRegistry meterRegistry) {
        HikariConfig config = new HikariConfig();
        // Publish pool metrics (active, idle, pending) through Micrometer
        config.setMetricsTrackerFactory(new MicrometerMetricsTrackerFactory(meterRegistry));
        return config;
    }
}

HikariCP sizing formula for Kubernetes: pool_size = (pod_replicas * max_pool_per_pod) ≤ database_max_connections * 0.8. For 3 replicas with 20 connections each, you need a database supporting at least 75 connections (60 active + 20% headroom). During HPA scaling events, connection counts spike — configure the database for your maximum pod count, not your current count.
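The sizing rule can be turned into a small calculator. An illustrative helper (hypothetical names) that implements the formula as stated:

```java
// PoolSizing.java — minimum database max_connections for a given pod fleet,
// per the rule: replicas * pool_per_pod <= db_max_connections * 0.8.
public class PoolSizing {
    public static int requiredDbConnections(int replicas, int poolPerPod) {
        // Solve the inequality for db_max_connections and round up
        return (int) Math.ceil(replicas * poolPerPod / 0.8);
    }

    public static void main(String[] args) {
        // 3 replicas x 20 connections each -> the database must allow at least 75.
        // Size for the HPA maximum, not the current replica count.
        System.out.println(requiredDbConnections(3, 20));
    }
}
```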

JVM Warm-up and Readiness

The JVM's JIT compiler optimizes frequently-executed code paths after observing execution patterns. A freshly started pod has higher latency until the JIT warms up.

java
@Component
public class WarmupRunner implements ApplicationRunner {

    private final RestTemplate restTemplate;

    public WarmupRunner(RestTemplateBuilder builder) {
        this.restTemplate = builder.build();
    }

    @Override
    public void run(ApplicationArguments args) {
        // Hit critical endpoints repeatedly to trigger JIT compilation
        for (int i = 0; i < 1000; i++) {
            try {
                restTemplate.getForEntity("http://localhost:8080/api/v1/health", String.class);
            } catch (Exception ignored) {
                // Warm-up failures are harmless; the pod is not serving traffic yet
            }
        }
    }
}

A lightweight warm-up routine that exercises hot paths reduces p99 latency for the first few minutes after deployment by 40-60%. Combine this with a readiness probe that includes a latency check to ensure traffic only reaches warmed-up pods.
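One way to implement that latency check is a gate that tracks recent request timings and reports ready only once a full window averages below a threshold. This is a sketch (the `LatencyGate` class is hypothetical) that could back a custom HealthIndicator in the readiness group:

```java
// LatencyGate.java — readiness gate that opens once recent latency settles.
import java.util.ArrayDeque;
import java.util.Deque;

public class LatencyGate {
    private final Deque<Long> recentMillis = new ArrayDeque<>();
    private final int window;
    private final long thresholdMillis;

    public LatencyGate(int window, long thresholdMillis) {
        this.window = window;
        this.thresholdMillis = thresholdMillis;
    }

    /** Record one request's elapsed time, keeping only the latest window. */
    public synchronized void record(long elapsedMillis) {
        recentMillis.addLast(elapsedMillis);
        if (recentMillis.size() > window) {
            recentMillis.removeFirst();
        }
    }

    /** Ready once the window is full and its average is under the threshold. */
    public synchronized boolean ready() {
        if (recentMillis.size() < window) {
            return false;
        }
        long sum = 0;
        for (long m : recentMillis) {
            sum += m;
        }
        return (sum / window) < thresholdMillis;
    }
}
```

Slow cold-start samples age out of the window as the JIT warms up, so the gate opens on its own without a fixed delay.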

Anti-Patterns to Avoid

Using -Xmx instead of MaxRAMPercentage. Hard-coded heap sizes don't adapt to different container memory limits across environments. A service configured with -Xmx512m running in a 2Gi container wastes 75% of available memory.

Ignoring non-heap memory. Metaspace, thread stacks (1MB per thread by default), direct byte buffers, and JNI allocations consume memory outside the heap. A common failure mode: -Xmx900m in a 1Gi container with 200 threads uses 900MB heap + 200MB stacks + 100MB metaspace = OOM kill.
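That failure arithmetic can be captured in a quick budget check. An illustrative helper (hypothetical names):

```java
// MemoryBudget.java — rough per-pod JVM footprint vs the container limit.
public class MemoryBudget {
    /** Heap + thread stacks + metaspace, in MiB. Ignores smaller native pools. */
    public static int footprintMiB(int heapMiB, int threads,
                                   int stackMiBPerThread, int metaspaceMiB) {
        return heapMiB + threads * stackMiBPerThread + metaspaceMiB;
    }

    public static boolean fitsLimit(int footprintMiB, int containerLimitMiB) {
        return footprintMiB < containerLimitMiB;
    }

    public static void main(String[] args) {
        // The failure mode above: 900 MiB heap, 200 threads, 100 MiB metaspace
        int footprint = footprintMiB(900, 200, 1, 100);
        System.out.println(footprint + " MiB vs 1024 MiB limit: fits="
                + fitsLimit(footprint, 1024));
    }
}
```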

Disabling JVM ergonomics. Flags like -XX:-UseContainerSupport or fixed -XX:ParallelGCThreads override the JVM's automatic container detection. Unless you have measured a specific problem, let the JVM auto-tune based on container limits.

Fat JARs with embedded resources. Spring Boot fat JARs that bundle static assets, test dependencies, or documentation inflate image size. Use Maven or Gradle profiles to exclude non-production dependencies and use a separate CDN for static assets.

Conclusion

Running Java effectively on Kubernetes requires understanding the JVM's resource model. The JVM is not a lightweight runtime — it needs adequate memory for heap, metaspace, and thread stacks, and it needs time to warm up the JIT compiler. Kubernetes operators who account for these characteristics with proper memory configuration (MaxRAMPercentage at 75%), startup probes for slow initialization, and warm-up routines for JIT compilation build Java services that perform predictably under orchestration.

The Spring Boot ecosystem's Kubernetes integration — actuator health probes, Micrometer metrics, graceful shutdown — has matured significantly. Combined with GraalVM native images for startup-critical workloads and CDS for traditional JVM deployments, Java remains a competitive choice for Kubernetes-native applications where the ecosystem's maturity and library breadth outweigh the operational complexity of JVM tuning.

Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
