Startups need monitoring that provides maximum signal with minimal operational overhead. You have one or two engineers responsible for everything — spending a week setting up a Prometheus cluster is not viable. These practices get you from zero to production-ready observability in a day.
The Minimum Viable Monitoring Stack
For startups, the simplest effective stack is:
- Metrics: Prometheus (or Grafana Cloud free tier)
- Logs: stdout + a cloud provider's built-in log service
- Alerts: PagerDuty or Opsgenie free tier
- Dashboards: Grafana
The kube-prometheus-stack Helm chart deploys Prometheus, Grafana, Alertmanager, and kube-state-metrics in a single release. Total resource consumption: ~1.5GB RAM and 2 CPU cores. Setup time: about 15 minutes.
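The install is two commands against an existing cluster (the release and namespace names here are arbitrary; pin a chart version in CI rather than installing latest):

```shell
# Add the community chart repo and install the full stack in its own namespace
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```

Out of the box this scrapes the cluster itself; application metrics are added later via ServiceMonitor resources.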
Four Essential Alerts
Startups don't need 50 alerts. Start with four that catch 80% of production issues:
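As a sketch, the four rules might look like this as Prometheus alerting rules. The thresholds, durations, and the application metric name (`http_requests_total`) are assumptions to tune; the cert-expiry rule assumes the blackbox exporter is probing your endpoints:

```yaml
groups:
  - name: essential-alerts
    rules:
      - alert: HighErrorRate            # application errors
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels: { severity: page }
      - alert: PodRestarting            # infrastructure instability
        expr: increase(kube_pod_container_status_restarts_total[1h]) > 3
        for: 5m
        labels: { severity: page }
      - alert: ContainerNearMemoryLimit # resource exhaustion
        expr: |
          container_memory_working_set_bytes
            / on (namespace, pod, container)
              kube_pod_container_resource_limits{resource="memory"} > 0.9
        for: 15m
        labels: { severity: warn }
      - alert: CertExpiringSoon         # TLS certificate expiration
        expr: probe_ssl_earliest_cert_expiry - time() < 14 * 24 * 3600
        for: 1h
        labels: { severity: warn }
```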
These four alerts cover: application errors, infrastructure instability, resource exhaustion, and TLS certificate expiration. Add more only when you've experienced an incident that these didn't catch.
Application Instrumentation
Prometheus Client Integration
Structured Logging
JSON-structured logs with pino are searchable from day one. When you outgrow stdout logging and add a log aggregation service, the structured format means zero refactoring.
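A minimal stand-in for pino shows the shape (the field names level/time/msg mirror pino's defaults; pino itself adds numeric levels, serializers, and far better throughput):

```javascript
// Render one log record as a single JSON line.
function formatLine(level, base, msg, fields = {}) {
  return JSON.stringify({ level, time: Date.now(), ...base, ...fields, msg });
}

// Logger factory; child() carries bound fields, as pino's child loggers do.
function makeLogger(base = {}) {
  const write = (level, msg, fields) =>
    process.stdout.write(formatLine(level, base, msg, fields) + "\n");
  return {
    info: (msg, fields) => write("info", msg, fields),
    error: (msg, fields) => write("error", msg, fields),
    child: (extra) => makeLogger({ ...base, ...extra }),
  };
}

const log = makeLogger({ service: "checkout" });
log.info("order created", { orderId: "ord_123" });
```

Because every line is one JSON object on stdout, any aggregator you adopt later ingests it unchanged.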
SaaS vs Self-Hosted Decision
| Factor | Self-Hosted | SaaS (Datadog/Grafana Cloud) |
|---|---|---|
| Setup time | 2-4 hours | 30 minutes |
| Monthly cost (<10 services) | $50-100 (infra) | $0-200 (free tiers) |
| Maintenance | 2-4 hours/month | Zero |
| Data retention | You control | Provider limits |
| Scaling | Manual | Automatic |
Recommendation: Start with SaaS (Grafana Cloud free tier gives 10,000 series, 50GB logs, 50GB traces). Switch to self-hosted when costs exceed $500/month or when you need longer retention.
Anti-Patterns to Avoid
Over-instrumenting from day one. You don't need distributed tracing for a monolith. You don't need custom metrics for a service with 100 requests/minute. Start with the four essential alerts and add instrumentation when you need to debug specific issues.
Alerting on metrics you don't act on. Every alert must have a runbook or an obvious action. If the response to an alert is "check the dashboard and usually ignore it," delete the alert.
Logging everything at DEBUG level. DEBUG logs in production generate 10-100x more volume than INFO. The cost increase is immediate; the debugging value is occasional. Log at INFO by default, enable DEBUG for specific services during active debugging.
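One way to make that toggle cheap is a severity gate read from the environment (the LOG_LEVEL variable name is an assumption; pino exposes the same idea through its level option):

```javascript
// Numeric severity ranks; a record is emitted only at or above the threshold.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

function shouldLog(recordLevel, thresholdLevel) {
  return LEVELS[recordLevel] >= LEVELS[thresholdLevel];
}

// Default to info; flip one service to debug with LOG_LEVEL=debug at deploy time.
const active = process.env.LOG_LEVEL || "info";
if (shouldLog("debug", active)) console.log("expensive debug detail");
```

Flipping one deployment's environment variable then scopes the 10-100x volume increase to the service you are actively debugging.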
Production Checklist
- kube-prometheus-stack or equivalent deployed
- Four essential alerts configured (errors, restarts, memory, certs)
- Application metrics exposed (/metrics endpoint)
- Structured JSON logging to stdout
- Grafana dashboard for key business and infrastructure metrics
- PagerDuty/Opsgenie for alert routing
- Log retention policy (7 days minimum)
Conclusion
Startup monitoring should take one day to set up and require less than an hour per month to maintain. The kube-prometheus-stack Helm chart, four essential alerts, and structured logging cover 90% of what a startup needs. Every additional layer of observability — distributed tracing, custom dashboards, anomaly detection — should be added only when a specific incident demonstrates the need.
The most common startup monitoring failure is not under-monitoring — it's over-monitoring without acting on the data. Four alerts that wake someone up and get resolved are infinitely more valuable than 40 alerts that everyone ignores.