
What is Observability?

Observability is the ability to understand a system's internal state from its outputs (metrics, logs and traces), so you can answer questions you didn't know you'd need to ask.

More than monitoring

Monitoring tells you when a known thing breaks: the disk fills up, the CPU goes red, the certificate expires. Observability is what you reach for when an unknown thing breaks: users report slowness, dashboards look fine, and your only question is "why?"

The discipline rests on three signal types, often called the three pillars:

  1. Metrics: numerical samples over time (request rate, p99 latency, queue depth). Cheap to store, fast to graph, but anonymous: a metric tells you the average user is slow, not which one.
  2. Logs: discrete event records with context (timestamp, severity, payload). Heavier to store, but they preserve the story of what each request did.
  3. Traces: distributed call graphs that follow a single request as it crosses services, queues and databases. Essential for understanding latency in a microservices stack.

A modern observability stack also captures events (deploys, config changes, feature flags) and profiles (CPU/memory samples), useful when correlating a regression with a release.
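To make the three signal types concrete, here is a minimal sketch of a request handler emitting all three. The handler, names and in-memory sinks are hypothetical (a real stack would ship these to Prometheus, Loki and Tempo); the point is that a shared trace id ties a metric sample, a log line and a span back to the same request.

```python
import json
import time
import uuid

# Hypothetical in-memory sinks; real backends would be Prometheus,
# Loki and Tempo (or commercial equivalents).
METRICS = []
LOGS = []
SPANS = []

def handle_request(user_id: str) -> None:
    trace_id = uuid.uuid4().hex      # trace: one id per request
    start = time.monotonic()

    # log: the story of this specific request, with full context
    LOGS.append(json.dumps({
        "ts": time.time(),
        "level": "info",
        "msg": "request received",
        "user_id": user_id,
        "trace_id": trace_id,
    }))

    duration = time.monotonic() - start

    # metric: an anonymous numerical sample, cheap to store and graph
    METRICS.append(("http_request_duration_seconds", duration))

    # trace: a span, one node of the distributed call graph
    SPANS.append({
        "trace_id": trace_id,
        "name": "handle_request",
        "duration_s": duration,
    })

handle_request("u-42")
```

Because the log line and the span carry the same trace id, a user complaint found in the logs can be pivoted straight to its trace, which is exactly the correlation the pillars are meant to enable.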

What "good" looks like

In a system we'd certify, three things hold:

  • You can trace any user complaint back to the responsible request in under five minutes. Not "by the end of the day after grepping logs."
  • The team that wrote the code has direct access to its production behaviour. No support tickets to a separate ops team to retrieve a log line.
  • Alerts wake people up only for symptoms users feel. No alert fatigue from disk usage at 60% or CPU at 70%.
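The last point can be sketched as two alert predicates. The function names and thresholds are illustrative, not prescriptive: the distinction is that a cause-based alert fires on internal state users may never notice, while a symptom-based alert fires only when requests are actually slow or failing.

```python
def cause_alert(disk_used_pct: float) -> bool:
    # Cause-based: fires on internal state (disk at 60%) that
    # users may never feel. A common source of alert fatigue.
    return disk_used_pct > 60

def symptom_alert(p99_latency_ms: float, error_rate: float) -> bool:
    # Symptom-based: fires only when users actually feel it,
    # i.e. requests are slow or failing. Thresholds are illustrative.
    return p99_latency_ms > 500 or error_rate > 0.01
```

A disk at 65% would page someone under the first rule but not the second, as long as p99 latency and the error rate stay healthy.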

The tooling that delivers this in 2026 is usually a combination of Prometheus + Grafana + Loki + Tempo, or commercial equivalents (Datadog, Honeycomb, New Relic), or the OpenTelemetry collector pipeline feeding any of the above.

Why it costs more than people expect

Storage. A typical microservice generates 5–50 GB of logs per day. Multiply by 30 days of retention, by environments, and by replicas, and six-figure annual bills become normal at scale. Sampling, log-level discipline and structured logging are how teams keep the cost in check without losing the ability to debug.
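One common sampling scheme is a head-sampling sketch like the one below (the function and rate are hypothetical): keep every error and warning, but only a fraction of info-level lines, chosen deterministically by trace id so that a kept request keeps all of its lines rather than a half-sampled story.

```python
import hashlib

SAMPLE_RATE = 0.10  # keep ~10% of info-level logs; assumed value

def should_keep(level: str, trace_id: str) -> bool:
    # Never drop the interesting lines.
    if level in ("error", "warn"):
        return True
    # Deterministic by trace id: either all of a request's info
    # lines survive, or none do, so kept traces stay debuggable.
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 100
    return bucket < SAMPLE_RATE * 100
```

At a 10% rate, a service producing 50 GB/day of info-level logs retains roughly 5 GB/day, while error logs remain complete.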

For Swiss regulated industries, retention also has a compliance angle: certain audit logs must be kept 7 years and stored in Switzerland, which makes Hikube, our Swiss sovereign cloud, a natural fit.

Related Hidora services

  • Managed Services: running the observability stack so your engineers can focus on writing code.
  • Consulting: designing or refreshing your observability strategy.