DevOps

Complete Guide to Monitoring & Observability with Go

A comprehensive guide to implementing Monitoring & Observability using Go, covering architecture, code examples, and production-ready patterns.

Muneer Puthiya Purayil · 17 min read

Go is the lingua franca of cloud-native monitoring. Prometheus, the OpenTelemetry Collector, Grafana, Thanos, and Loki are all written in Go. This guide covers building production-grade monitoring in Go services and contributing to the monitoring ecosystem.

Instrumenting Go Services with Prometheus

HTTP Middleware

```go
package middleware

import (
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	httpDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "Duration of HTTP requests",
			Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
		},
		[]string{"method", "path", "status"},
	)

	httpTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total HTTP requests",
		},
		[]string{"method", "path", "status"},
	)

	httpInFlight = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "http_requests_in_flight",
		Help: "Current in-flight HTTP requests",
	})
)

// Metrics records duration, request count, and an in-flight gauge for
// every request passing through the handler chain.
func Metrics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		httpInFlight.Inc()
		defer httpInFlight.Dec()

		start := time.Now()
		wrapped := &statusWriter{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(wrapped, r)
		duration := time.Since(start).Seconds()

		status := strconv.Itoa(wrapped.status)
		// Caution: raw URL paths can explode label cardinality. Prefer the
		// route pattern (e.g. /users/{id}) from your router where available.
		path := r.URL.Path

		httpDuration.WithLabelValues(r.Method, path, status).Observe(duration)
		httpTotal.WithLabelValues(r.Method, path, status).Inc()
	})
}

// statusWriter captures the status code written by downstream handlers,
// since http.ResponseWriter does not expose it after the fact.
type statusWriter struct {
	http.ResponseWriter
	status int
}

func (w *statusWriter) WriteHeader(status int) {
	w.status = status
	w.ResponseWriter.WriteHeader(status)
}
```

Database Monitoring

```go
package db

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// pool must be initialized (e.g. via pgxpool.New) during startup,
// before queries run or metrics are scraped.
var pool *pgxpool.Pool

var (
	dbQueryDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "db_query_duration_seconds",
			Help:    "Database query duration",
			Buckets: []float64{.001, .005, .01, .025, .05, .1, .25, .5, 1},
		},
		[]string{"query_name", "status"},
	)

	// GaugeFunc samples the pool at scrape time, so connection stats are
	// always current without a background polling goroutine.
	dbConnectionsActive = promauto.NewGaugeFunc(
		prometheus.GaugeOpts{
			Name: "db_connections_active",
			Help: "Active database connections",
		},
		func() float64 {
			if pool != nil {
				return float64(pool.Stat().AcquiredConns())
			}
			return 0
		},
	)
)

// QueryWithMetrics labels each query by a stable name rather than raw SQL
// text, keeping label cardinality bounded.
func QueryWithMetrics(ctx context.Context, name string, sql string, args ...any) error {
	start := time.Now()
	_, err := pool.Exec(ctx, sql, args...)
	duration := time.Since(start).Seconds()

	status := "success"
	if err != nil {
		status = "error"
	}
	dbQueryDuration.WithLabelValues(name, status).Observe(duration)
	return err
}
```

OpenTelemetry Integration

```go
package telemetry

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
)

func InitTelemetry(ctx context.Context, serviceName string) (func(), error) {
	res := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceName(serviceName),
		semconv.ServiceVersion("1.0.0"),
	)

	// Traces: OTLP/gRPC exporter; the endpoint comes from the standard
	// OTEL_EXPORTER_OTLP_* environment variables.
	traceExporter, err := otlptracegrpc.New(ctx)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(traceExporter),
		sdktrace.WithResource(res),
		// Sample 10% of new traces, but always honor the parent's decision.
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.1))),
	)
	otel.SetTracerProvider(tp)

	// Metrics: expose OTel instruments through the Prometheus registry.
	promExporter, err := prometheus.New()
	if err != nil {
		return nil, err
	}
	mp := metric.NewMeterProvider(
		metric.WithReader(promExporter),
		metric.WithResource(res),
	)
	otel.SetMeterProvider(mp)

	shutdown := func() {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		_ = tp.Shutdown(ctx) // flush any remaining batched spans
		_ = mp.Shutdown(ctx)
	}

	return shutdown, nil
}
```

Structured Logging with slog

```go
package logging

import (
	"context"
	"log/slog"
	"os"

	"go.opentelemetry.io/otel/trace"
)

func NewLogger() *slog.Logger {
	handler := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
		Level: slog.LevelInfo,
	})
	return slog.New(&traceHandler{handler})
}

// traceHandler decorates every record with the active trace and span IDs,
// linking logs to traces in the backend.
type traceHandler struct {
	slog.Handler
}

func (h *traceHandler) Handle(ctx context.Context, r slog.Record) error {
	span := trace.SpanFromContext(ctx)
	if span.SpanContext().HasTraceID() {
		r.AddAttrs(
			slog.String("trace_id", span.SpanContext().TraceID().String()),
			slog.String("span_id", span.SpanContext().SpanID().String()),
		)
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup must re-wrap the inner handler; relying on the
// embedded methods would return the bare JSON handler from logger.With(...)
// and silently drop trace enrichment.
func (h *traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return &traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h *traceHandler) WithGroup(name string) slog.Handler {
	return &traceHandler{h.Handler.WithGroup(name)}
}
```


Custom Prometheus Exporter

```go
package exporter

import (
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// QueueClient abstracts whatever queue system is being monitored.
type QueueClient interface {
	ListQueues() ([]QueueInfo, error)
}

type QueueInfo struct {
	Name             string
	Depth            int
	OldestMessageAge time.Duration
}

type QueueExporter struct {
	mu          sync.Mutex
	queueDepth  *prometheus.Desc
	messageAge  *prometheus.Desc
	queueClient QueueClient
}

func NewQueueExporter(client QueueClient) *QueueExporter {
	return &QueueExporter{
		queueDepth: prometheus.NewDesc(
			"queue_depth",
			"Current number of messages in the queue",
			[]string{"queue_name"},
			nil,
		),
		messageAge: prometheus.NewDesc(
			"queue_oldest_message_age_seconds",
			"Age of the oldest message in the queue",
			[]string{"queue_name"},
			nil,
		),
		queueClient: client,
	}
}

func (e *QueueExporter) Describe(ch chan<- *prometheus.Desc) {
	ch <- e.queueDepth
	ch <- e.messageAge
}

// Collect runs on every scrape, so queue state is fetched lazily and
// always reflects the moment of the scrape.
func (e *QueueExporter) Collect(ch chan<- prometheus.Metric) {
	e.mu.Lock()
	defer e.mu.Unlock()

	queues, err := e.queueClient.ListQueues()
	if err != nil {
		// Surface the failure to Prometheus instead of silently
		// returning an empty scrape.
		ch <- prometheus.NewInvalidMetric(e.queueDepth, err)
		return
	}

	for _, q := range queues {
		ch <- prometheus.MustNewConstMetric(e.queueDepth, prometheus.GaugeValue, float64(q.Depth), q.Name)
		ch <- prometheus.MustNewConstMetric(e.messageAge, prometheus.GaugeValue, q.OldestMessageAge.Seconds(), q.Name)
	}
}
```

Alerting Rules for Go Services

```yaml
groups:
  - name: go-service-alerts
    rules:
      - alert: HighGoroutineCount
        expr: go_goroutines > 10000
        for: 5m
        annotations:
          summary: "Goroutine leak detected in {{ $labels.job }}"

      - alert: HighGCPauseTime
        expr: rate(go_gc_duration_seconds_sum[5m]) / rate(go_gc_duration_seconds_count[5m]) > 0.001
        for: 10m
        annotations:
          summary: "Average GC pause > 1ms in {{ $labels.job }}"

      - alert: HighMemoryUsage
        expr: go_memstats_alloc_bytes / go_memstats_sys_bytes > 0.8
        for: 10m
        annotations:
          summary: "Memory usage > 80% in {{ $labels.job }}"
```
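These runtime alerts pair naturally with RED-style queries over the middleware metrics from earlier. Two example expressions (windows and thresholds are illustrative; tune them to your SLOs):

```promql
# p99 request latency over the last 5 minutes
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Fraction of requests returning 5xx
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))
```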

Conclusion

Go's monitoring ecosystem is the most mature and comprehensive in the cloud-native space. The Prometheus client library, OpenTelemetry SDK, and slog logging package provide a complete observability stack with minimal dependencies. Building custom exporters with the prometheus.Collector interface enables monitoring any system — databases, queues, external APIs — in a format that integrates directly with the standard stack.

The key Go-specific monitoring patterns are: goroutine count tracking for leak detection, GC pause monitoring for latency-sensitive services, and channel-based metric collection for concurrent systems. These runtime-specific metrics complement application-level RED metrics to provide full service visibility.
