What RPO does
The Recovery Point Objective (RPO) is the maximum amount of data loss an organisation commits to tolerating, measured as the time between the last valid backup point and the moment of a disaster. Expressed as a duration (5 minutes, 1 hour, 24 hours), it directly drives the backup and replication strategy.
In practice, a 1-hour RPO means: in case of total loss, the organisation accepts losing at most 60 minutes of recent data. This implies a backup or replication at least every hour. A 5-minute RPO implies near-real-time replication (synchronous or low-latency asynchronous). A 24-hour RPO allows a simple nightly backup.
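The arithmetic above can be sketched as a quick check of whether a periodic backup schedule fits a given RPO. This is a minimal illustration, not a sizing tool: the function names are hypothetical, and it assumes each backup captures the state as of its start time and only becomes usable once it completes.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case loss window for periodic backups.

    Assumption: a backup reflects the data as of its start time and is only
    usable once finished. A disaster just before the next backup completes
    therefore loses one full interval plus the backup duration.
    """
    return interval + backup_duration


def meets_rpo(rpo: timedelta,
              interval: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the schedule's worst-case loss stays within the RPO."""
    return worst_case_data_loss(interval, backup_duration) <= rpo


# Hourly backups that complete instantly just fit a 1-hour RPO...
print(meets_rpo(timedelta(hours=1), timedelta(hours=1)))
# ...but not once each backup takes 10 minutes to finish.
print(meets_rpo(timedelta(hours=1), timedelta(hours=1), timedelta(minutes=10)))
```

Under these assumptions, a schedule that runs exactly once per RPO period leaves no margin, which is one reason to back up more often than the RPO strictly requires.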
How to set a realistic RPO
Like RTO, RPO is a business decision. Three questions guide the trade-off:
- What is the value of one hour of lost data? A lost banking transaction is unacceptable; a lost internal-dashboard entry can be re-entered. RPO is calibrated by criticality.
- What does a short RPO cost? Cutting RPO roughly tenfold (24h to 2h) increases storage cost by 3 to 5 times; moving from 1h to 5 minutes typically multiplies infrastructure cost by 2 to 4 (synchronous replication, multi-zone redundancy).
- What are the regulatory constraints? FINMA requires a documented RPO for Swiss financial institutions. The nFADP mandates adequate protection proportional to data sensitivity.
RPO and storage types
Typical categorisation:
- RPO < 1 second: synchronous multi-zone replication (Postgres synchronous streaming replication, Galera Cluster). Each write is acknowledged only once it has been replicated, which raises write latency but means no acknowledged write is lost when a single node or zone fails.
- RPO < 5 minutes: asynchronous replication with frequent log shipping (WAL in Postgres, binary log in MySQL, oplog in MongoDB). Most professional DBMSs reach this level without difficulty.
- RPO 1 hour: automated hourly snapshots to object storage (S3-compatible) with versioning. Standard on non-critical workloads.
- RPO 24 hours: traditional nightly backups. Compatible with a 30/90/365-day retention policy.
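The categorisation above can be read as a lookup from a target RPO to the least demanding strategy that still satisfies it. The sketch below is one conservative reading: it treats each tier's figure as the RPO that tier can guarantee, and the function name and thresholds in seconds are illustrative assumptions rather than a standard.

```python
# (guaranteed RPO in seconds, strategy), ordered from least to most demanding.
# The thresholds mirror the tiers listed above; treating them as guarantees
# is an assumption made for this sketch.
TIERS = [
    (24 * 3600, "nightly backups"),
    (3600, "hourly snapshots to versioned object storage"),
    (300, "asynchronous replication"),
]
STRICTEST = "synchronous multi-zone replication"


def strategy_for_rpo(target_rpo_seconds: float) -> str:
    """Return the least demanding strategy whose guaranteed RPO fits the target."""
    if target_rpo_seconds < 0:
        raise ValueError("RPO cannot be negative")
    for guaranteed_rpo, strategy in TIERS:
        if target_rpo_seconds >= guaranteed_rpo:
            return strategy
    # Anything tighter than the asynchronous tier falls back to synchronous.
    return STRICTEST


for target in (24 * 3600, 3600, 1800, 60):
    print(target, "->", strategy_for_rpo(target))
```

Note that a 30-minute target lands on asynchronous replication in this reading, because hourly snapshots cannot guarantee it.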
RPO in the Kubernetes context
On Kubernetes, RPO is managed at three distinct levels:
- PersistentVolumes: CSI snapshots, configured per CSI driver through a VolumeSnapshotClass (Ceph, Longhorn, EBS, etc.). Frequency adjustable from 5 min to 24h.
- Managed databases: Hikube, AWS RDS and Azure Database offer Point-in-Time Recovery (PITR) via continuous WAL archiving, with an RPO that typically ranges from a few seconds to a few minutes depending on the service.
- Kubernetes state: etcd must be backed up separately with a short RPO (15 min typical) because losing it makes the cluster inoperable.
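For the PersistentVolume level, a scheduled job can create one VolumeSnapshot object per run. The sketch below only builds the manifest, using the `snapshot.storage.k8s.io/v1` API; the PVC name, namespace and snapshot class are hypothetical placeholders, and how the job is scheduled (CronJob, operator, external tool) is left open.

```python
import json
from datetime import datetime, timezone


def volume_snapshot_manifest(pvc_name, namespace, snapshot_class, now=None):
    """Build a VolumeSnapshot manifest for one PVC, named with a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": f"{pvc_name}-{stamp}", "namespace": namespace},
        "spec": {
            # Hypothetical class name; it must match a VolumeSnapshotClass
            # that exists in the cluster for the CSI driver in use.
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


if __name__ == "__main__":
    # "pgdata", "prod" and "csi-snapclass" are placeholder values.
    print(json.dumps(volume_snapshot_manifest("pgdata", "prod", "csi-snapclass"), indent=2))
```

The JSON output can be piped to `kubectl apply -f -`. The interval at which the job runs is what actually sets the RPO for that volume.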
The theoretical-RPO trap
A documented RPO that has never been validated by an actual restore is worth nothing. On Hidora engagements, we regularly observe clients with 5-year backup policies who discover, during their first restore drill, that a missing configuration file breaks the procedure. The rule: test restoration at least every 3 months, document every failure, and fold the fixes back into the runbook.
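A restore drill can also produce a measured RPO to compare against the documented one. The sketch below assumes the drill records two timestamps, the simulated disaster time and the newest record found in the restored data, and it approximates "every 3 months" as 90 days; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def achieved_rpo(simulated_disaster: datetime,
                 newest_restored_record: datetime) -> timedelta:
    """Data-loss window actually observed during a restore drill."""
    if newest_restored_record > simulated_disaster:
        raise ValueError("restored data cannot be newer than the disaster time")
    return simulated_disaster - newest_restored_record


def drill_overdue(last_drill: datetime, today: datetime,
                  max_age: timedelta = timedelta(days=90)) -> bool:
    """True if the last restore drill is older than the allowed age."""
    return today - last_drill > max_age


# Example drill: disaster simulated at 14:00, newest restored row from 13:20.
loss = achieved_rpo(datetime(2024, 6, 1, 14, 0), datetime(2024, 6, 1, 13, 20))
print(loss <= timedelta(hours=1))  # within a 1-hour RPO in this example
print(drill_overdue(datetime(2024, 1, 15), datetime(2024, 6, 1)))
```

Logging the measured value after each drill gives the runbook a track record instead of a single theoretical figure.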
Related Hidora services
- SLA Expert: contractual RPO commitments with monthly measurement and compliance reporting.
- Consulting: backup-policy audit, RPO calibration by application criticality, restoration drills.
- Managed Services: backup and replication operations with RPO-coverage dashboards.
- RTO, DRP, Observability: related indicators and processes.