RPO (Recovery Point Objective): definition and best practices

What RPO does

The Recovery Point Objective (RPO) is the maximum amount of data loss an organisation commits to tolerating, measured as the time between the last valid recovery point and the moment of a disaster. Expressed as a duration (5 minutes, 1 hour, 24 hours), it directly drives the backup and replication strategy.

In practice, a 1-hour RPO means that, in case of total loss, the organisation accepts losing at most 60 minutes of recent data. This implies that a backup or replication cycle completes at least every hour. A 5-minute RPO implies near-real-time replication (synchronous or low-latency asynchronous). A 24-hour RPO allows a simple nightly backup.
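The arithmetic behind these figures can be sketched in a few lines. This is a minimal sketch, assuming a periodic backup that captures the state at its start time and only becomes restorable once it has finished; under that assumption the worst-case loss is the interval plus the backup duration. The function names and example durations are illustrative and do not come from any particular tool.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case data loss for periodic backups.

    Assumption: a backup reflects the state at its start time and is only
    usable once complete. A disaster just before the next backup completes
    therefore loses interval + backup_duration of data.
    """
    return interval + backup_duration


def meets_rpo(rpo: timedelta,
              interval: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the schedule keeps the worst-case loss within the RPO."""
    return worst_case_data_loss(interval, backup_duration) <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=1)
    # Hourly backups taking 10 minutes can lose up to 70 minutes of data.
    print(meets_rpo(rpo, timedelta(hours=1), timedelta(minutes=10)))     # False
    # Backups every 45 minutes with the same duration lose at most 55 minutes.
    print(meets_rpo(rpo, timedelta(minutes=45), timedelta(minutes=10)))  # True
```

The point of the sketch is that the backup interval alone understates the exposure whenever backups take a non-trivial time to complete.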

How to set a realistic RPO

Like RTO, RPO is a business decision. Three questions guide the trade-off:

What is the value of one hour of lost data? A lost banking transaction is unacceptable. A lost internal-dashboard entry can be re-entered. RPO is calibrated by criticality.
What does a short RPO cost? Cutting RPO by roughly a factor of 10 (24h to 2h) typically increases storage cost by 3 to 5 times; moving from 1h to 5 minutes typically multiplies infrastructure cost by 2 to 4 (synchronous replication, multi-zone redundancy).
What are the regulatory constraints? FINMA requires a documented RPO for Swiss financial institutions. The nFADP mandates adequate protection proportional to data sensitivity.
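To make the cost question concrete, the multipliers quoted above can be applied to a baseline budget. This is only a rough sketch: the multiplier ranges are the orders of magnitude given in the text, while the baseline amounts and the names `cost_range`, `STORAGE_24H_TO_2H` and `INFRA_1H_TO_5MIN` are hypothetical.

```python
from typing import Tuple

# Multiplier ranges quoted in the text (orders of magnitude, not guarantees).
STORAGE_24H_TO_2H = (3.0, 5.0)  # storage cost when RPO goes from 24h to 2h
INFRA_1H_TO_5MIN = (2.0, 4.0)   # infrastructure cost when RPO goes from 1h to 5 min


def cost_range(baseline: float, multipliers: Tuple[float, float]) -> Tuple[float, float]:
    """Return the (low, high) estimated cost after tightening the RPO."""
    low, high = multipliers
    return baseline * low, baseline * high


if __name__ == "__main__":
    # Hypothetical baseline: 1000 per month of storage at a 24h RPO.
    print(cost_range(1000.0, STORAGE_24H_TO_2H))  # (3000.0, 5000.0)
    # Hypothetical baseline: 4000 per month of infrastructure at a 1h RPO.
    print(cost_range(4000.0, INFRA_1H_TO_5MIN))   # (8000.0, 16000.0)
```

Comparing such a range with the estimated value of one hour of lost data gives a first indication of whether a tighter RPO is worth paying for.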

RPO and storage types

Typical categorisation:

RPO < 1 second: synchronous multi-zone replication (Postgres synchronous streaming replication, Galera Cluster). Every write is confirmed only after it has been replicated, so write latency is higher, but no acknowledged write is lost when a single node or zone fails.
RPO < 5 minutes: asynchronous replication with frequent log shipping (WAL for Postgres, binary log for MySQL, oplog for MongoDB). Most professional DBMSs reach this level without difficulty.
RPO 1 hour: automated hourly snapshots to object storage (S3-compatible) with versioning. Standard on non-critical workloads.
RPO 24 hours: traditional nightly backups. Compatible with a 30/90/365-day retention policy.
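The categorisation above can be read as a lookup: given a target RPO, pick the least demanding tier that still meets it. The sketch below encodes the four tiers from the list; treating the synchronous tier as an achievable RPO of zero is a simplifying assumption, and `TIERS` and `strategy_for_rpo` are illustrative names.

```python
from datetime import timedelta

# (achievable RPO, strategy), ordered from loosest to strictest.
# Assumption: the synchronous tier is modelled as an achievable RPO of zero.
TIERS = [
    (timedelta(hours=24), "nightly backups"),
    (timedelta(hours=1), "hourly snapshots to versioned object storage"),
    (timedelta(minutes=5), "asynchronous replication with frequent log shipping"),
    (timedelta(0), "synchronous multi-zone replication"),
]


def strategy_for_rpo(target: timedelta) -> str:
    """Return the loosest tier whose achievable RPO is within the target."""
    if target < timedelta(0):
        raise ValueError("RPO target cannot be negative")
    for achievable, strategy in TIERS:
        if achievable <= target:
            return strategy
    # Unreachable for non-negative targets, since the last tier is zero.
    return TIERS[-1][1]


if __name__ == "__main__":
    print(strategy_for_rpo(timedelta(hours=24)))    # nightly backups
    print(strategy_for_rpo(timedelta(minutes=30)))  # asynchronous replication with frequent log shipping
    print(strategy_for_rpo(timedelta(seconds=1)))   # synchronous multi-zone replication
```

Note that a 30-minute target falls through to asynchronous replication, because hourly snapshots cannot guarantee less than an hour of loss.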

RPO in the Kubernetes context

On Kubernetes, RPO is managed at three distinct levels:

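PersistentVolumes: CSI volume snapshots, configured through a VolumeSnapshotClass for the underlying storage driver (Ceph, Longhorn, EBS, etc.). Snapshot frequency is set by the storage system or backup tool that schedules them, typically from 5 min to 24h.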
Managed databases: Hikube, AWS RDS and Azure Database offer Point-in-Time Recovery (PITR) via continuous transaction-log (WAL) archiving, with an RPO ranging from a few seconds to a few minutes depending on the service.
Kubernetes state: etcd must be backed up separately with a short RPO (15 min typical) because losing it makes the cluster inoperable.
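For the PersistentVolume level, a snapshot request is an ordinary Kubernetes object. The sketch below builds a `VolumeSnapshot` manifest (API group `snapshot.storage.k8s.io/v1`) as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. It assumes the cluster has the CSI snapshot CRDs and controller installed; the PVC name `pg-data`, the class name `csi-snapclass` and the helper `volume_snapshot_manifest` are hypothetical, and scheduling (for example from a CronJob) is left out.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def volume_snapshot_manifest(pvc_name: str,
                             snapshot_class: str,
                             namespace: str = "default",
                             now: Optional[datetime] = None) -> dict:
    """Build a VolumeSnapshot manifest for one PersistentVolumeClaim."""
    now = now or datetime.now(timezone.utc)
    # Timestamped name so that successive snapshots do not collide.
    name = f"{pvc_name}-{now.strftime('%Y%m%d-%H%M%S')}"
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


if __name__ == "__main__":
    # Hypothetical PVC and snapshot class names.
    manifest = volume_snapshot_manifest("pg-data", "csi-snapclass")
    print(json.dumps(manifest, indent=2))
```

Running such a script on a schedule matching the target RPO is one possible approach; storage systems and backup tools often provide their own scheduling instead.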

The theoretical-RPO trap

A documented RPO that has never been validated by an actual restore is worth zero. On Hidora engagements, we regularly observe clients with 5-year backup policies who discover, during their first restore drill, that a missing configuration file breaks the procedure. The rule: test restoration at least every 3 months, document every failure, fold the fixes back into the runbook.
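A restore drill also yields a measurable number: the RPO actually achieved. The sketch below computes it as the gap between the simulated incident time and the newest record found in the restored data, and flags drills that are overdue. Treating "3 months" as 90 days is an assumption, and `achieved_rpo` and `drill_overdue` are illustrative names.

```python
from datetime import datetime, timedelta


def achieved_rpo(incident_time: datetime, newest_restored_record: datetime) -> timedelta:
    """Data-loss window observed in a drill: incident time minus the
    timestamp of the most recent record present after the restore."""
    return incident_time - newest_restored_record


def drill_overdue(last_drill: datetime,
                  today: datetime,
                  max_age: timedelta = timedelta(days=90)) -> bool:
    """True if the last restore drill is older than max_age
    (assumption: 3 months is approximated as 90 days)."""
    return today - last_drill > max_age


if __name__ == "__main__":
    incident = datetime(2024, 6, 1, 12, 0)
    newest = datetime(2024, 6, 1, 11, 20)
    loss = achieved_rpo(incident, newest)
    print(loss)                        # 0:40:00
    print(loss <= timedelta(hours=1))  # True: within a 1-hour RPO
    print(drill_overdue(datetime(2024, 1, 15), datetime(2024, 6, 1)))  # True
```

Recording the achieved figure after every drill makes it possible to compare the documented RPO with what the procedure really delivers.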

Related Hidora services

SLA Expert: contractual RPO commitments with monthly measurement and compliance reporting.
Consulting: backup-policy audit, RPO calibration by application criticality, restoration drills.
Managed Services: backup and replication operations with RPO-coverage dashboards.
RTO, DRP, Observability: related indicators and processes.
