How do you design a multi-region high-availability architecture? What are RPO and RTO?

Key Metric Definitions

RTO (Recovery Time Objective): The maximum acceptable time from disaster occurrence to service restoration. Example: RTO = 4 hours means up to 4 hours of downtime is acceptable.

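RPO (Recovery Point Objective): The maximum acceptable data loss, measured as a window of time before the disaster. Example: RPO = 1 hour means losing up to the last hour of writes is acceptable, so backups or replication may lag by no more than 1 hour.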
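
To make the two definitions concrete, here is a minimal sketch that measures the downtime and the data-loss window of one incident and compares them with the targets. The `Incident` class, its field names, and the timestamps are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Timestamps for one outage (hypothetical field names)."""
    last_recoverable_point: datetime  # newest backup or replicated write that survived
    disaster_at: datetime             # moment the failure occurred
    restored_at: datetime             # moment service was available again

    def downtime(self) -> timedelta:
        # Achieved recovery time; compare it against the RTO.
        return self.restored_at - self.disaster_at

    def data_loss_window(self) -> timedelta:
        # Achieved recovery point; compare it against the RPO.
        return self.disaster_at - self.last_recoverable_point

    def meets(self, rto: timedelta, rpo: timedelta) -> bool:
        return self.downtime() <= rto and self.data_loss_window() <= rpo


incident = Incident(
    last_recoverable_point=datetime(2026, 1, 1, 11, 30),
    disaster_at=datetime(2026, 1, 1, 12, 0),
    restored_at=datetime(2026, 1, 1, 15, 0),
)
print(incident.downtime())          # 3:00:00
print(incident.data_loss_window())  # 0:30:00
print(incident.meets(rto=timedelta(hours=4), rpo=timedelta(hours=1)))  # True
```

With the targets from the examples above (RTO = 4 hours, RPO = 1 hour), this incident stays within both objectives; a 2-hour RTO would have been missed.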

Multi-Region Architecture Patterns

Backup and Restore

  • RTO: Hours, RPO: Hours
  • Lowest cost: periodically back up data to another Region and restore it there after a disaster
  • Best for: Non-critical systems, tight budgets

Pilot Light

  • RTO: Tens of minutes, RPO: Minutes
  • Keep only the core infrastructure running in the backup Region (for example, database replication), with application servers switched off or not yet provisioned; provision and scale them up when disaster strikes
  • Best for: Moderately important systems

Warm Standby

  • RTO: Minutes, RPO: Seconds
  • The backup Region runs a scaled-down but fully functional copy of the environment that continuously receives replicated data; on failover it is scaled up to full capacity
  • Best for: Important business systems

Active-Active

  • RTO: Seconds, RPO: Near-zero
  • Both Regions serve traffic simultaneously; if one fails, the other takes over
  • Highest cost and the hardest data consistency problems, since both Regions accept writes and conflicting updates must be resolved
  • Best for: Global critical systems (finance, e-commerce)
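
One way to read the four patterns above is as a cost-ordered list: pick the cheapest one whose recovery figures still fit the targets. The sketch below encodes that idea. The numeric bounds are assumptions that turn the rough magnitudes above ("hours", "tens of minutes", "seconds") into concrete values; they are not guarantees from any provider, and `Pattern` and `cheapest_pattern` are hypothetical names.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Pattern(NamedTuple):
    name: str
    rto: timedelta  # assumed worst-case recovery time
    rpo: timedelta  # assumed worst-case data-loss window


# Ordered from cheapest to most expensive.
# All durations are illustrative assumptions, not measured values.
PATTERNS = [
    Pattern("Backup and Restore", timedelta(hours=24), timedelta(hours=24)),
    Pattern("Pilot Light", timedelta(minutes=60), timedelta(minutes=10)),
    Pattern("Warm Standby", timedelta(minutes=10), timedelta(seconds=60)),
    Pattern("Active-Active", timedelta(seconds=60), timedelta(seconds=1)),
]


def cheapest_pattern(rto_target: timedelta, rpo_target: timedelta) -> Optional[Pattern]:
    """Return the first (cheapest) pattern that satisfies both targets."""
    for pattern in PATTERNS:
        if pattern.rto <= rto_target and pattern.rpo <= rpo_target:
            return pattern
    return None  # no listed pattern is fast enough under these assumptions


choice = cheapest_pattern(timedelta(hours=4), timedelta(hours=1))
print(choice.name)  # Pilot Light
```

In practice the thresholds should come from measured failover drills, since real RTO and RPO depend on data volume, replication lag, and how automated the recovery runbook is.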

Key Technical Components

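  • Data replication: Cross-region replication is usually asynchronous, so the replica can lag the primary and that lag bounds the achievable RPO (for example, RDS cross-Region read replicas, DynamoDB Global Tables); synchronous replication avoids the lag but adds cross-region latency to every write
  • DNS failover: Route 53 health checks combined with a failover routing policy shift traffic to the healthy Region; the switch takes effect only after the health check fails and cached DNS records expire (TTL)
  • Global traffic routing: AWS Global Accelerator directs users to the nearest healthy Regional endpoint through static anycast IP addresses; CloudFront serves content from edge locations and supports origin failover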
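
The failover decision can be sketched as a small simulation: a Region is treated as unhealthy after several consecutive failed checks, and traffic moves to the secondary only then. This is a simplified model in plain Python, not the Route 53 API; `RegionHealth`, `route`, the threshold of 3, and the interval and TTL values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RegionHealth:
    name: str
    failure_threshold: int = 3      # consecutive failures before "unhealthy" (assumed)
    consecutive_failures: int = 0

    def record(self, check_passed: bool) -> None:
        # A passing check resets the counter; a failing check increments it.
        self.consecutive_failures = 0 if check_passed else self.consecutive_failures + 1

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.failure_threshold


def route(primary: RegionHealth, secondary: RegionHealth) -> str:
    """Fail over only when the primary is unhealthy and the secondary is healthy."""
    if primary.healthy or not secondary.healthy:
        return primary.name
    return secondary.name


def worst_case_failover_seconds(check_interval_s: int, failure_threshold: int, dns_ttl_s: int) -> int:
    # Rough estimate: time to detect the failure plus time for cached DNS answers to expire.
    return check_interval_s * failure_threshold + dns_ttl_s


primary = RegionHealth("us-east-1")
secondary = RegionHealth("us-west-2")

for passed in [True, False, False]:
    primary.record(passed)
print(route(primary, secondary))  # us-east-1 (only 2 consecutive failures)

primary.record(False)
print(route(primary, secondary))  # us-west-2 (threshold of 3 reached)

print(worst_case_failover_seconds(30, 3, 60))  # 150
```

The estimate shows why DNS-based failover tends to give an RTO in minutes rather than seconds: detection time and TTL both add to it, so tighter targets need shorter check intervals, lower TTLs, or routing that does not depend on DNS caches.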

Copyright © 2026 Wood All Rights Reserved · FE Interview Hub