Business Continuity

What is RTO (Recovery Time Objective)?

RTO is the maximum acceptable duration of a service outage after a disaster. It is the central indicator of any business-continuity plan.

What RTO does

The Recovery Time Objective (RTO) is the maximum time the organisation commits to allowing between a major incident (datacenter loss, ransomware attack, database corruption) and full service restoration. It is a business metric, not a technical one: it reflects the cost of downtime the organisation is willing to accept, not the performance of the tools.

In practice, a 4-hour RTO means: if a disaster strikes at 9:00, the service must be operational again for users by 13:00 at the latest. RTO includes everything: incident detection, decision to activate the recovery plan, data restoration, application restart, functional validation, communication.
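
The 9:00 to 13:00 example can be sketched in a few lines of Python. This is a minimal illustration, not a planning tool: the date, the function names and the per-phase durations are assumptions chosen for the example. Only the 4-hour RTO and the list of phases come from the text above.

```python
from datetime import datetime, timedelta


def recovery_deadline(incident_start: datetime, rto: timedelta) -> datetime:
    """Latest moment at which the service must be usable again."""
    return incident_start + rto


def fits_within_rto(phases: dict[str, timedelta], rto: timedelta) -> bool:
    """True if the sum of all recovery phases stays within the RTO."""
    return sum(phases.values(), timedelta()) <= rto


# Hypothetical incident date; the 9:00 start and 4-hour RTO mirror the text.
incident = datetime(2024, 1, 15, 9, 0)
rto = timedelta(hours=4)
deadline = recovery_deadline(incident, rto)
print(deadline.strftime("%H:%M"))  # 13:00

# Illustrative durations (assumptions): every phase counts against the RTO,
# not only the data restoration itself.
phases = {
    "incident detection": timedelta(minutes=15),
    "activation decision": timedelta(minutes=30),
    "data restoration": timedelta(minutes=105),
    "application restart": timedelta(minutes=30),
    "functional validation": timedelta(minutes=30),
    "communication": timedelta(minutes=15),
}
total = sum(phases.values(), timedelta())
print(total, fits_within_rto(phases, rto))
```

With these assumed durations the phases add up to 3 hours 45 minutes, which leaves only 15 minutes of margin even though the restoration itself takes less than half of the budget.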

How to set a realistic RTO

RTO is decided jointly by business leadership and the IT team. The usual method follows three steps:

  1. Business Impact Analysis (BIA). Quantify the hourly outage cost for each critical application. An e-commerce platform doing 50,000 CHF/hour cannot tolerate a 24-hour RTO; an internal HR tool can.

  2. Technical assessment. Measure the achievable RTO with the current architecture. This means real restoration drills (not just theoretical ones), ideally quarterly.

  3. Investment. Lowering RTO costs money: real-time replication, hot secondary sites, automation. Halving an RTO typically multiplies infrastructure costs by 1.5 to 3.
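
The arithmetic behind steps 1 and 3 can be sketched as follows. The function names and the 10,000 CHF monthly infrastructure figure are hypothetical. The 50,000 CHF/hour rate and the 1.5 to 3 multiplier are taken from the text, and the worst-case cost assumes a constant hourly loss over the whole outage.

```python
def worst_case_outage_cost(hourly_cost_chf: float, rto_hours: float) -> float:
    """Cost of an outage that lasts exactly as long as the RTO allows.

    Assumes the hourly loss is constant over the whole outage.
    """
    return hourly_cost_chf * rto_hours


def infra_cost_after_halving_rto(current_cost_chf: float) -> tuple[float, float]:
    """Rough (low, high) range for infrastructure cost after halving the RTO,
    using the 1.5x to 3x rule of thumb quoted in the text."""
    return current_cost_chf * 1.5, current_cost_chf * 3.0


# E-commerce platform from the text: 50,000 CHF per hour of outage.
print(worst_case_outage_cost(50_000, 24))  # 1200000
print(worst_case_outage_cost(50_000, 4))   # 200000

# Hypothetical current infrastructure budget of 10,000 CHF per month.
low, high = infra_cost_after_halving_rto(10_000)
print(low, high)  # 15000.0 30000.0
```

Comparing the avoided outage cost with the extra infrastructure cost gives a first-order basis for the investment decision; a real BIA would also weigh incident likelihood and non-financial impact.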

Practical RTO tiers

Typical categorisation observed on Hidora engagements:

  • RTO < 15 minutes: active-active multi-region architecture, automatic failover. High cost, justified for banks, critical e-commerce, telemedicine.
  • RTO 1 to 4 hours: hot secondary sites, semi-automated restoration. Standard for most Swiss SMEs with significant online activity.
  • RTO 4 to 24 hours: off-site backups, scheduled manual restoration. Suited to non-real-time applications (BI, archiving, batch).
  • RTO > 24 hours: off-site backups, ad-hoc restoration process. Acceptable for non-critical internal tools.
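
As a rough sketch, the tiers above can be expressed as a lookup. The function name is hypothetical and two boundary choices are assumptions: an RTO of exactly 4 hours is placed in the 1-to-4-hour tier, and the range from 15 minutes to 1 hour, which the list does not cover, returns `None` rather than a guessed tier.

```python
from datetime import timedelta
from typing import Optional


def rto_tier(rto: timedelta) -> Optional[str]:
    """Map a target RTO to the recovery approach listed for its tier."""
    if rto < timedelta(minutes=15):
        return "active-active multi-region, automatic failover"
    if timedelta(hours=1) <= rto <= timedelta(hours=4):
        # Assumption: exactly 4 hours belongs to this tier.
        return "hot secondary site, semi-automated restoration"
    if timedelta(hours=4) < rto <= timedelta(hours=24):
        return "off-site backups, scheduled manual restoration"
    if rto > timedelta(hours=24):
        return "off-site backups, ad-hoc restoration"
    # 15 minutes to 1 hour is not covered by the tiers in the list.
    return None


for target in (timedelta(minutes=5), timedelta(hours=4), timedelta(hours=48)):
    print(target, "->", rto_tier(target))
```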

RTO and cloud-native architectures

On Kubernetes, RTO drops drastically if the application is designed to be stateless. When configuration is managed through GitOps, images are held in a replicated registry and databases use synchronous replication, restoring a full environment on a standby cluster takes 10 to 30 minutes through automated deployment.

Conversely, stateful workloads with complex dependencies (Postgres clusters with custom replication, NFS shared files, Kafka queues with long retention) keep a high RTO. The architectural work consists of progressively isolating those components and applying suitable replication strategies.

RTO vs RPO

RTO measures outage duration; RPO (Recovery Point Objective) measures acceptable data loss. An SME may, for example, tolerate 4 hours of unavailability (RTO) but no more than 5 minutes of lost data (RPO). The two indicators are independent and addressed separately.
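
The difference between the two indicators can be illustrated with a short check. This is a sketch under assumptions: the function name, the timestamps and the idea of a single "last recovery point" are hypothetical. Only the 4-hour RTO and 5-minute RPO come from the example above.

```python
from datetime import datetime, timedelta


def meets_objectives(
    last_recovery_point: datetime,
    incident: datetime,
    service_restored: datetime,
    rto: timedelta,
    rpo: timedelta,
) -> tuple[bool, bool]:
    """Return (rto_met, rpo_met) for one incident.

    Downtime runs from the incident to service restoration; data loss is
    the time between the last recoverable state and the incident.
    """
    downtime = service_restored - incident
    data_loss = incident - last_recovery_point
    return downtime <= rto, data_loss <= rpo


rto, rpo = timedelta(hours=4), timedelta(minutes=5)
incident = datetime(2024, 1, 15, 9, 0)
restored = datetime(2024, 1, 15, 12, 30)

# Continuous replication: last recoverable state 3 minutes before the incident.
print(meets_objectives(datetime(2024, 1, 15, 8, 57), incident, restored, rto, rpo))
# (True, True)

# Hourly backups: same downtime, but an hour of data is lost.
print(meets_objectives(datetime(2024, 1, 15, 8, 0), incident, restored, rto, rpo))
# (True, False)
```

The second case restores the service just as quickly as the first, yet misses the RPO, which is why the two objectives need separate measures.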

Related Hidora services

  • SLA Expert: contractual RTO commitments on P1 incidents with automatic recovery-plan activation.
  • Consulting: BIA audit, disaster-recovery plan design, quarterly restoration drills.
  • Managed Services: operational execution of the recovery plan with monthly drill reports.
  • RPO, DRP, SLA: related indicators and processes.