Prometheus: definition and best practices

What Prometheus does

Prometheus is an open-source monitoring system created at SoundCloud in 2012. It joined the CNCF in 2016 as the foundation's second hosted project, after Kubernetes, and reached Graduated status in 2018. It collects and stores metrics (time series of numerical values) and lets you query them to build dashboards and trigger alerts.

Prometheus's success rests on a simple architecture well suited to dynamic infrastructures: the Prometheus server pulls metrics from its targets (pull model) by scraping an HTTP endpoint, /metrics by default, at a regular interval. Applications only have to expose that endpoint: there is no intermediary queue to manage and no push configuration to maintain on the application side.
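To make the pull model concrete, here is a minimal sketch using only the Python standard library. A toy application exposes a hypothetical REQUESTS counter on /metrics in the Prometheus text format, and scrape() performs the plain HTTP GET the Prometheus server would issue on every scrape interval. A real application would normally use an official client library rather than hand-writing the format.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter: (method, status) -> number of requests served.
REQUESTS = {("POST", "200"): 1027, ("GET", "500"): 3}


def render_metrics() -> str:
    """Render REQUESTS in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(REQUESTS.items()):
        lines.append(f'http_requests_total{{method="{method}",status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        # Content type of the classic Prometheus text format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # keep the demo output quiet
        pass


def scrape(url: str, timeout: float = 5.0) -> str:
    """What the Prometheus server does on each scrape: a plain HTTP GET."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(scrape(f"http://127.0.0.1:{server.server_address[1]}/metrics"))
    server.shutdown()
    server.server_close()
```

The application never needs to know where Prometheus runs: it only answers GET requests, and the server decides when and how often to collect.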

The data model

Prometheus stores data as time series, each identified by a metric name and a set of labels (key/value pairs). Example:

http_requests_total{method="POST", status="200", service="checkout"}

This structure enables fine granularity: you can filter, aggregate and compare metrics along any label without having planned that analysis in advance. Each distinct combination of label values is a separate time series.
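The identity rule can be illustrated with a toy in-memory store, a sketch far simpler than Prometheus's actual storage engine. A series key is the metric name plus the sorted label pairs, so label order does not matter, but any different label value opens a new series. series_key, append and store are hypothetical names.

```python
from collections import defaultdict

SeriesKey = tuple[str, tuple[tuple[str, str], ...]]


def series_key(name: str, labels: dict[str, str]) -> SeriesKey:
    """A series is its metric name plus its label set; label order is irrelevant."""
    return (name, tuple(sorted(labels.items())))


# Hypothetical in-memory store: series identity -> list of (timestamp, value) samples.
store: dict[SeriesKey, list[tuple[float, float]]] = defaultdict(list)


def append(name: str, labels: dict[str, str], ts: float, value: float) -> None:
    store[series_key(name, labels)].append((ts, value))


append("http_requests_total", {"method": "POST", "status": "200", "service": "checkout"}, 0, 1000)
# Same labels in another order: appended to the same series.
append("http_requests_total", {"service": "checkout", "status": "200", "method": "POST"}, 15, 1027)
# One label value differs: a new, independent series.
append("http_requests_total", {"method": "POST", "status": "500", "service": "checkout"}, 15, 2)

print(len(store))  # 2
```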

The PromQL query language exploits this structure. rate(http_requests_total[5m]) computes the per-second rate of increase of the counter, averaged over the last 5 minutes; sum by (status) (rate(http_requests_total[5m])) sums those rates per HTTP status code. More complex queries (p95 latency with histogram_quantile, per-service error rate, capacity planning) fit in a few lines.
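The semantics of these functions can be approximated in plain Python. This is a simplified sketch: simple_rate compensates counter resets as rate() does but skips its extrapolation to the window boundaries, sum_by mimics sum by (label), and histogram_quantile follows the linear interpolation PromQL applies to classic histogram buckets, assuming 0 < q <= 1 and a non-empty histogram. All function names are illustrative.

```python
from collections import defaultdict

Sample = tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: list[Sample]) -> float:
    """Per-second increase of a counter across the samples of one range window.

    Resets are compensated as rate() does, but the increase is divided by the
    span between first and last sample (no extrapolation to the window edges).
    """
    if len(samples) < 2:
        raise ValueError("rate needs at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter started again from zero.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def sum_by(series: dict[frozenset, float], label: str) -> dict[str, float]:
    """Like `sum by (label) (...)`: keep one label, add up values sharing its value."""
    totals: dict[str, float] = defaultdict(float)
    for labels, value in series.items():
        totals[dict(labels).get(label, "")] += value  # a missing label counts as ""
    return dict(totals)


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Quantile from cumulative (upper_bound, count) buckets sorted by bound, last one +Inf."""
    rank = q * buckets[-1][1]
    for i, (upper, count) in enumerate(buckets[:-1]):
        if count >= rank:
            # Interpolate linearly inside the bucket; the first bucket starts at 0.
            lower, below = buckets[i - 1] if i > 0 else (0.0, 0.0)
            return lower + (upper - lower) * (rank - below) / (count - below)
    # The rank falls in the +Inf bucket: return the highest finite bound.
    return buckets[-2][0]


window = [(0, 100.0), (15, 130.0), (30, 30.0), (45, 60.0), (60, 90.0)]
print(simple_rate(window))  # 2.0 (the 130 -> 30 drop is treated as a reset)

rates = {
    frozenset({"status": "200", "service": "checkout"}.items()): 2.0,
    frozenset({"status": "200", "service": "cart"}.items()): 1.5,
    frozenset({"status": "500", "service": "checkout"}.items()): 0.5,
}
print(sum_by(rates, "status"))  # {'200': 3.5, '500': 0.5}

latency = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (float("inf"), 100)]
print(histogram_quantile(0.95, latency))  # 0.5
```

In a real query, histogram_quantile is usually applied to per-bucket rates, e.g. histogram_quantile(0.95, sum by (le) (rate(..._bucket[5m]))); the interpolation is the same whether the inputs are raw counts or rates.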

The Prometheus ecosystem

Prometheus is rarely deployed alone. At Hidora clients, the standard stack usually combines:

Prometheus Server: collection, time-series storage, alert-rule evaluation.
Alertmanager: alert routing to Slack, PagerDuty, email, with deduplication, grouping and silences.
Grafana: metric visualisation through interactive dashboards.
Node Exporter: runs on each host and exposes its system metrics (CPU, RAM, disk, network).
kube-state-metrics: exposes the state of Kubernetes objects (pods, deployments, nodes) as metrics.
ServiceMonitor / PodMonitor: Prometheus Operator custom resources that declare which Services or Pods to scrape.
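Alertmanager's deduplication and grouping can be modelled in a few lines. This toy version treats alerts with identical label sets as duplicates (as when two redundant Prometheus replicas fire the same alert), then buckets the rest by hypothetical group_by labels so that each group yields one notification. It ignores the timing settings (group_wait, group_interval, repeat_interval), inhibition and silences that the real Alertmanager applies.

```python
from collections import defaultdict

Alert = dict[str, str]  # an alert is identified by its label set


def group_alerts(alerts: list[Alert], group_by: list[str]) -> dict[tuple[str, ...], list[Alert]]:
    """Keep one copy of each label set, then bucket alerts by the group_by label values."""
    seen: set[frozenset] = set()
    groups: dict[tuple[str, ...], list[Alert]] = defaultdict(list)
    for alert in alerts:
        fingerprint = frozenset(alert.items())
        if fingerprint in seen:
            continue  # duplicate, e.g. sent by both replicas of an HA pair
        seen.add(fingerprint)
        groups[tuple(alert.get(name, "") for name in group_by)].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "HighErrorRate", "service": "checkout", "cluster": "prod"},
    {"alertname": "HighErrorRate", "service": "checkout", "cluster": "prod"},  # duplicate
    {"alertname": "HighErrorRate", "service": "cart", "cluster": "prod"},
    {"alertname": "NodeDown", "instance": "node-1", "cluster": "prod"},
]
for key, members in group_alerts(alerts, ["alertname", "cluster"]).items():
    # One notification per group instead of one per alert.
    print(key, len(members))
```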

Why Prometheus became the Kubernetes standard

Native service discovery. Prometheus discovers scrape targets directly through the Kubernetes API, and the Prometheus Operator makes this declarative with ServiceMonitor resources: monitoring a new application is just a matter of creating a YAML resource, with no change to the Prometheus server configuration. This dynamism is essential in environments where pods are created and destroyed by the minute.
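The selection mechanism behind a ServiceMonitor is a Kubernetes label selector. The sketch below models only its matchLabels part against a hypothetical set of Services: deploying a new Service that carries the selected label makes it a scrape target without touching any Prometheus configuration. The Operator's real reconciliation (namespace selectors, endpoint ports, generated scrape configs) is not modelled.

```python
def matches(match_labels: dict[str, str], labels: dict[str, str]) -> bool:
    """matchLabels semantics: every selector pair must be present with the same value."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def discover(match_labels: dict[str, str], services: dict[str, dict[str, str]]) -> list[str]:
    """Names of the Services a selector would turn into scrape targets."""
    return sorted(name for name, labels in services.items() if matches(match_labels, labels))


# Hypothetical Services currently in the cluster, with their labels.
services = {
    "checkout": {"app": "checkout", "team": "payments"},
    "cart": {"app": "cart", "team": "payments"},
    "search": {"app": "search", "team": "discovery"},
}
selector = {"team": "payments"}  # stands in for a ServiceMonitor's spec.selector.matchLabels

print(discover(selector, services))  # ['cart', 'checkout']
services["refunds"] = {"app": "refunds", "team": "payments"}  # a new application is deployed
print(discover(selector, services))  # ['cart', 'checkout', 'refunds']
```

As in Kubernetes, an empty matchLabels selects everything, which is why selectors are usually kept narrow.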

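Open format. The OpenMetrics specification, derived from the Prometheus text exposition format, formalises it as a vendor-neutral standard. Kubernetes components such as kube-apiserver and etcd expose their metrics natively in this format; other software either ships a built-in endpoint (RabbitMQ's Prometheus plugin, for example) or is covered by a dedicated exporter (postgres_exporter for PostgreSQL, nginx-prometheus-exporter for NGINX).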
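Because the format is plain text, any tool can consume it. The following sketch parses sample lines into (name, labels, value) tuples with regular expressions; it is an illustration, not a conformant OpenMetrics parser: it skips # HELP and # TYPE metadata, discards timestamps and leaves label escape sequences undecoded.

```python
import re

SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"  # metric name
    r"(?:\{(?P<labels>.*)\})?"              # optional {label="value",...}
    r"\s+(?P<value>\S+)"                    # sample value (may be NaN or +Inf)
    r"(?:\s+(?P<ts>-?\d+))?$"               # optional timestamp, ignored here
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines, HELP/TYPE metadata and comments
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unparseable line: {line!r}")
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


payload = """# HELP http_requests_total Total HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="POST",status="200"} 1027
up 1
"""
for name, labels, value in parse_exposition(payload):
    print(name, labels, value)
```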

SLO and SRE practice. Tools such as Pyrra or Sloth automatically generate Prometheus recording and alerting rules from an SLO declaration. Prometheus and PromQL have thus become a common technical foundation for SRE practice.
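The arithmetic underneath these SLO tools is simple. This sketch computes the error budget of an availability SLO and the burn rate, that is, how many times faster than sustainable the budget is being consumed. The function names are illustrative; the 14.4 example corresponds to the fast-burn threshold popularised by Google's SRE Workbook (2% of a 30-day budget spent in one hour).

```python
def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Minutes of total unavailability an availability SLO tolerates over the window."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(error_ratio: float, slo: float) -> float:
    """Budget consumption speed; 1.0 exhausts the budget exactly at the end of the window.

    In PromQL the error ratio would typically come from a query such as
    sum(rate(http_requests_total{status=~"5.."}[1h])) / sum(rate(http_requests_total[1h])).
    """
    return error_ratio / (1 - slo)


def budget_spent(rate: float, hours: float, window_days: float = 30) -> float:
    """Fraction of the window's whole budget consumed by burning at `rate` for `hours`."""
    return rate * hours / (window_days * 24)


print(error_budget_minutes(0.999))  # ~43.2 minutes per 30 days
r = burn_rate(0.0144, 0.999)        # 1.44% of requests failing
print(r)                            # ~14.4
print(budget_spent(r, 1))           # ~0.02, i.e. 2% of the monthly budget in one hour
```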

Known limitations

Prometheus is a metrics system; it does not store logs or traces. For those signals, it is complemented by:

Loki or Elastic Stack for logs.
Tempo or Jaeger for distributed tracing.

On scalability, a single Prometheus server typically handles 1 to 2 million active series comfortably, depending on its hardware. Beyond that, or when long-term retention is needed, systems such as Thanos or Grafana Mimir add long-term persistence on S3-compatible object storage and high availability.
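Series counts grow multiplicatively with label values, which is why cardinality, rather than the number of metric names, is what pushes a server towards that limit. This sketch estimates the worst-case number of series for hypothetical label cardinalities; real counts are usually lower because not every combination occurs.

```python
from math import prod


def series_count(label_values: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_values.values())


def histogram_series(label_values: dict[str, int], buckets: int) -> int:
    """A classic histogram adds one _bucket series per bucket (incl. +Inf), plus _sum and _count."""
    return series_count(label_values) * (buckets + 2)


# Hypothetical label cardinalities for one service fleet.
requests = {"method": 4, "status": 8, "service": 25, "pod": 10}
print(series_count(requests))  # 8000
latency = {"method": 4, "service": 25, "pod": 10}
print(histogram_series(latency, buckets=12))  # 14000
# Adding an unbounded label such as a user ID with 10,000 distinct values:
print(series_count({**requests, "user_id": 10_000}))  # 80000000
```

The last line shows why identifiers with unbounded values (user IDs, request IDs) should never become labels: a single such label multiplies the series count far past what one server can hold.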

In practice at Hidora

On client engagements, we systematically deploy the kube-prometheus-stack Helm chart, with Alertmanager configured to escalate to on-call teams. SLOs are declared in YAML and Sloth generates the corresponding Prometheus rules, which lets product teams change their reliability targets through a pull request.

Related Hidora services

Managed Services: 24/7 operation of a Prometheus stack with 200+ monitored metrics per environment.
Consulting: observability strategy design, SLI selection, Grafana dashboard design.
Observability, Kubernetes, SRE: associated technical and methodological building blocks.
