Reliability / SRE · Advanced

How does SRE approach capacity planning? What role does load testing play?

Goal of Capacity Planning

Ensure the system always has sufficient resources to maintain SLOs under expected and unexpected load growth, while avoiding over-provisioning that wastes money.

Capacity Planning Process

1. Build demand forecasts

  • Analyze historical traffic growth trends (linear, exponential, seasonal)
  • Forecast traffic for the next 3-12 months
  • Account for business events (promotions, new market launches)
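As a rough illustration of the forecasting step, the sketch below fits an exponential trend to monthly peak traffic and projects it forward. The function names, the sample history, and the `event_multiplier` parameter (a stand-in for a known business event such as a promotion) are assumptions made for this example; a real forecast would also need to handle seasonality.

```python
import math


def fit_exponential_growth(monthly_peaks):
    """Least-squares fit of log(peak) against the month index.

    Returns (baseline, monthly_growth_rate) such that
    peak(t) ~= baseline * (1 + monthly_growth_rate) ** t.
    """
    n = len(monthly_peaks)
    if n < 2:
        raise ValueError("need at least two months of history")
    xs = range(n)
    ys = [math.log(p) for p in monthly_peaks]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), math.exp(slope) - 1


def forecast_peak_qps(monthly_peaks, months_ahead, event_multiplier=1.0):
    """Project peak QPS `months_ahead` months past the last observation."""
    baseline, growth = fit_exponential_growth(monthly_peaks)
    t = len(monthly_peaks) - 1 + months_ahead
    return baseline * (1 + growth) ** t * event_multiplier


# Hypothetical history: peak QPS growing about 10% per month.
history = [1000, 1100, 1210, 1331]
print(round(forecast_peak_qps(history, months_ahead=6)))
print(round(forecast_peak_qps(history, months_ahead=6, event_multiplier=2.0)))
```

If the history looks linear rather than exponential, the same least-squares fit can be applied to the raw values instead of their logarithms.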

2. Define the resource model

Build a resource utilization model for the service:

  • How much CPU/memory/storage is needed per 1,000 QPS?
  • Identify the bottleneck resource (usually CPU or database connections)
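One way to express such a model in code is sketched below: each resource carries a measured cost per 1,000 QPS and the amount one instance provides, and the resource that demands the most instances is the bottleneck. The `Resource` class and every number here are hypothetical and exist only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    per_1k_qps: float    # units consumed per 1,000 QPS (measured)
    per_instance: float  # units that one instance provides


def instances_needed(resources, target_qps):
    """Return (instance_count, bottleneck_name) for the target QPS."""
    needs = {
        r.name: math.ceil(r.per_1k_qps * target_qps / 1000 / r.per_instance)
        for r in resources
    }
    bottleneck = max(needs, key=needs.get)
    return needs[bottleneck], bottleneck


# Hypothetical measurements for a single service.
model = [
    Resource("cpu_cores", per_1k_qps=2.0, per_instance=4),
    Resource("memory_gib", per_1k_qps=3.0, per_instance=16),
    Resource("db_connections", per_1k_qps=50, per_instance=80),
]
print(instances_needed(model, target_qps=12000))  # (8, 'db_connections')
```

In this made-up example CPU alone would need 6 instances, but the database connection pool needs 8, so connections set the instance count.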

3. Load testing

Simulate high traffic in a controlled environment to verify the system can handle expected capacity and find the breaking point.

Tools: k6, Apache JMeter, Locust, Gatling
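Whichever tool generates the load, a stepped test usually ends with a table of offered load against observed latency and error rate. The sketch below shows one possible way to read the breaking point from such results; the `StepResult` type, the SLO thresholds, and the sample numbers are assumptions for illustration and are not the output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    rps: int           # offered load during this step
    p99_ms: float      # measured 99th-percentile latency
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def max_sustainable_rps(steps, p99_slo_ms, max_error_rate):
    """Highest offered load before the first step that violates the SLO.

    Returns 0 if even the lowest step violates it.
    """
    best = 0
    for step in sorted(steps, key=lambda s: s.rps):
        if step.p99_ms > p99_slo_ms or step.error_rate > max_error_rate:
            break
        best = step.rps
    return best


# Hypothetical results from a stepped load test.
results = [
    StepResult(500, 80, 0.0),
    StepResult(1000, 95, 0.001),
    StepResult(1500, 140, 0.002),
    StepResult(2000, 420, 0.03),
    StepResult(2500, 1800, 0.25),
]
print(max_sustainable_rps(results, p99_slo_ms=200, max_error_rate=0.01))  # 1500
```

Stopping at the first violating step is deliberate: once a system has passed its breaking point, a later step that happens to look healthy should not be trusted.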

4. Buffer and safety factor

Never plan to run at 100% capacity. A typical reserve is a 20-30% buffer, which covers traffic spikes, performance degradation, and the extra capacity needed during rolling deployments.
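The arithmetic behind the buffer can be sketched as follows. This example assumes the buffer is a fraction of each instance's tested capacity that is kept free, and that `max_unavailable` instances may be out of service during a rolling deployment; both the interpretation and the parameter names are assumptions, not a standard formula.

```python
import math


def provisioned_instances(peak_qps, qps_per_instance, buffer=0.25, max_unavailable=0):
    """Instances to provision so that peak load leaves `buffer` headroom.

    qps_per_instance: sustainable QPS of one instance (e.g. from load tests).
    buffer: fraction of each instance's capacity to keep free, in [0, 1).
    max_unavailable: instances that may be down at once during a rollout.
    """
    if not 0 <= buffer < 1:
        raise ValueError("buffer must be in [0, 1)")
    usable = qps_per_instance * (1 - buffer)
    return math.ceil(peak_qps / usable) + max_unavailable


# Hypothetical service: 12,000 QPS forecast peak, 1,500 QPS per instance.
print(provisioned_instances(12000, 1500, buffer=0.0))                      # 8
print(provisioned_instances(12000, 1500, buffer=0.25))                     # 11
print(provisioned_instances(12000, 1500, buffer=0.25, max_unavailable=1))  # 12
```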

Load Testing Types

Type            Purpose
Baseline Test   Establish performance baseline under normal load
Load Test       Verify behavior under expected peak load
Stress Test     Find the system's breaking point (beyond capacity)
Soak Test       Sustained stable load over time to find memory leaks etc.
Spike Test      Simulate sudden traffic surges (e.g., viral events)
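The test types differ mainly in the shape of the load over time. The sketch below generates a per-minute target RPS for each type; the specific multipliers (ramping to 2x peak for stress, a 3x-peak burst for spike, 24x the duration for soak) are arbitrary choices made for this illustration, not recommended values.

```python
def load_profile(kind, normal_rps, peak_rps, minutes=10):
    """Per-minute target RPS for one test type (illustrative shapes only)."""
    if minutes < 2:
        raise ValueError("minutes must be at least 2")
    if kind == "baseline":
        return [normal_rps] * minutes
    if kind == "load":
        return [peak_rps] * minutes
    if kind == "stress":
        # Ramp linearly from peak to 2x peak to push past capacity.
        return [round(peak_rps * (1 + i / (minutes - 1))) for i in range(minutes)]
    if kind == "soak":
        # Normal load, held 24 times longer than the other tests.
        return [normal_rps] * (minutes * 24)
    if kind == "spike":
        # Normal load with a sudden 3x-peak burst in the middle minute.
        profile = [normal_rps] * minutes
        profile[minutes // 2] = peak_rps * 3
        return profile
    raise ValueError(f"unknown test type: {kind}")


print(load_profile("stress", normal_rps=1000, peak_rps=2000, minutes=5))
# [2000, 2500, 3000, 3500, 4000]
print(load_profile("spike", normal_rps=1000, peak_rps=2000, minutes=5))
# [1000, 1000, 6000, 1000, 1000]
```

A profile like this can then be translated into the stage or ramp configuration of whichever load testing tool is in use.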

Capacity Planning vs Auto-Scaling

Auto-scaling cannot replace capacity planning:

  • Auto-scaling has a scale-out delay (typically minutes), so a sudden spike can arrive before the new capacity is ready
  • Cloud resource quotas are capped, and increases must be requested in advance
  • Cost budgets need to be planned ahead
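A toy simulation can make the first two points concrete. The model below is deliberately simplified and entirely hypothetical: capacity and demand are both measured in RPS, each scale-out adds a fixed `step` that becomes usable `scale_delay` minutes after it is requested, and `max_capacity` stands in for a cloud quota.

```python
def unserved_demand(demand, initial_capacity, scale_delay, step, max_capacity):
    """Simulate reactive scale-out minute by minute.

    demand: requested RPS for each minute.
    Whenever demand exceeds the capacity already provisioned or on order,
    one scale-out of `step` RPS is requested. It becomes usable
    `scale_delay` minutes later and never pushes the total above
    `max_capacity` (the quota).
    Returns the RPS that could not be served in each minute.
    """
    capacity = initial_capacity
    ordered = initial_capacity  # provisioned plus requested capacity
    pending = {}                # arrival minute -> RPS added
    dropped = []
    for minute, rps in enumerate(demand):
        capacity += pending.pop(minute, 0)
        dropped.append(max(0, rps - capacity))
        if rps > ordered and ordered < max_capacity:
            add = min(step, max_capacity - ordered)
            arrival = minute + scale_delay
            pending[arrival] = pending.get(arrival, 0) + add
            ordered += add
    return dropped


spike = [1000, 1000, 3000, 3000, 3000, 3000, 3000, 3000]

# Reactive scaling only: requests are dropped until capacity catches up.
print(unserved_demand(spike, 1500, scale_delay=3, step=1000, max_capacity=4000))
# [0, 0, 1500, 1500, 1500, 500, 0, 0]

# A quota of 2000 RPS: scaling stalls and the shortfall never clears.
print(unserved_demand(spike, 1500, scale_delay=3, step=1000, max_capacity=2000))
# [0, 0, 1500, 1500, 1500, 1000, 1000, 1000]

# Capacity planned ahead of the spike: nothing is dropped.
print(unserved_demand(spike, 3000, scale_delay=3, step=1000, max_capacity=4000))
# [0, 0, 0, 0, 0, 0, 0, 0]
```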

Copyright © 2026 Wood All Rights Reserved · FE Interview Hub