How does SRE approach capacity planning? What role does load testing play?
Goal of Capacity Planning
Ensure the system always has sufficient resources to maintain SLOs under expected and unexpected load growth, while avoiding over-provisioning that wastes money.
Capacity Planning Process
1. Build demand forecasts
- Analyze historical traffic growth trends (linear, exponential, seasonal)
- Forecast traffic for the next 3-12 months
- Account for business events (promotions, new market launches)
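The forecasting step above can be sketched as a simple exponential fit. This is a minimal illustration, not a prescribed method: the function names, the `event_multiplier` parameter, and the choice of a log-linear least-squares fit are all assumptions, and seasonal patterns would need a richer model.

```python
import math


def fit_exponential_growth(monthly_peak_qps):
    """Least-squares fit of log(qps) against the month index.

    Returns (base_qps, monthly_growth_rate). Assumes at least two
    positive data points and roughly exponential growth.
    """
    n = len(monthly_peak_qps)
    xs = list(range(n))
    ys = [math.log(q) for q in monthly_peak_qps]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope) - 1


def forecast_peak_qps(monthly_peak_qps, months_ahead, event_multiplier=1.0):
    """Project peak QPS `months_ahead` months past the last observation.

    `event_multiplier` is a hypothetical knob for known business events,
    e.g. 2.0 for a promotion expected to double peak traffic.
    """
    base, rate = fit_exponential_growth(monthly_peak_qps)
    last_index = len(monthly_peak_qps) - 1
    return base * (1 + rate) ** (last_index + months_ahead) * event_multiplier
```

Calling `forecast_peak_qps(history, months_ahead=6)` with six months of observed peaks gives a trend-only estimate; passing an `event_multiplier` layers a planned promotion on top of that trend.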
2. Define the resource model
Build a resource utilization model for the service:
- How much CPU/memory/storage is needed per 1,000 QPS?
- Identify the bottleneck resource (usually CPU or database connections)
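As a rough sketch of such a resource model, the snippet below converts a target QPS into an instance count and reports which resource forces that count. All numbers and names (`COST_PER_1K_QPS`, `INSTANCE_CAPACITY`, `instances_needed`) are hypothetical, and the model assumes resource usage scales linearly with traffic, which real services only approximate.

```python
import math

# Hypothetical measured cost of serving 1,000 QPS.
COST_PER_1K_QPS = {"cpu_cores": 4.0, "memory_gib": 6.0, "db_connections": 50}

# Hypothetical usable capacity of a single instance.
INSTANCE_CAPACITY = {"cpu_cores": 8.0, "memory_gib": 32.0, "db_connections": 80}


def instances_needed(target_qps, cost_per_1k=COST_PER_1K_QPS, capacity=INSTANCE_CAPACITY):
    """Return (instance_count, bottleneck_resource) for `target_qps`.

    The bottleneck is the resource that requires the most instances.
    """
    per_resource = {
        name: math.ceil(cost * target_qps / 1000 / capacity[name])
        for name, cost in cost_per_1k.items()
    }
    bottleneck = max(per_resource, key=per_resource.get)
    return per_resource[bottleneck], bottleneck
```

With these example numbers, `instances_needed(10_000)` is driven by database connections rather than CPU, which is the kind of result that redirects tuning effort before more hardware is bought.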
3. Load testing
Simulate high traffic in a controlled environment to verify that the system can handle the expected capacity and to find its breaking point.
Tools: k6, Apache JMeter, Locust, Gatling
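Whichever tool generates the load, finding the breaking point comes down to reading stepped results against the SLO. The sketch below is tool-agnostic and assumes the per-step results have already been exported; the `StepResult` fields and the threshold parameters are illustrative names, not any tool's output format.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    qps: int               # offered load during this step
    p99_latency_ms: float  # observed 99th-percentile latency
    error_rate: float      # fraction of failed requests, 0.0 to 1.0


def max_sustainable_qps(results, p99_slo_ms, max_error_rate):
    """Return the highest QPS reached before the first SLO violation.

    Steps are evaluated in ascending QPS order. Returns None if even
    the lowest step violates the SLO.
    """
    sustainable = None
    for step in sorted(results, key=lambda r: r.qps):
        if step.p99_latency_ms > p99_slo_ms or step.error_rate > max_error_rate:
            break  # everything above this load is treated as unsafe
        sustainable = step.qps
    return sustainable
```

Stopping at the first violation is deliberate: a higher step that happens to look healthy after a failing one is more likely noise than real capacity.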
4. Buffer and safety factor
Never plan to run at 100% capacity. Typically reserve a 20-30% buffer to cover traffic spikes, performance degradation, and the extra capacity needed during rolling deployments.
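One way to turn the buffer into numbers is shown below. It assumes the buffer is the fraction of provisioned capacity kept free, so a 30% buffer means forecast peak traffic should use at most 70% of what is provisioned. Some teams instead add the percentage on top of peak, which gives a smaller figure. The function names and the `qps_per_instance` parameter are hypothetical.

```python
import math


def provisioned_capacity_qps(forecast_peak_qps, buffer_fraction=0.3):
    """Capacity to provision so that peak load leaves `buffer_fraction` free."""
    if not 0 <= buffer_fraction < 1:
        raise ValueError("buffer_fraction must be in [0, 1)")
    return forecast_peak_qps / (1 - buffer_fraction)


def instances_with_buffer(forecast_peak_qps, qps_per_instance, buffer_fraction=0.3):
    """Instance count that serves the peak while keeping the buffer free."""
    capacity = provisioned_capacity_qps(forecast_peak_qps, buffer_fraction)
    return math.ceil(capacity / qps_per_instance)
```

For example, a forecast peak of 7,000 QPS with a 30% buffer calls for roughly 10,000 QPS of provisioned capacity.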
Load Testing Types
| Type | Purpose |
|---|---|
| Baseline Test | Establish performance baseline under normal load |
| Load Test | Verify behavior under expected peak load |
| Stress Test | Find the system's breaking point (beyond capacity) |
| Soak Test | Apply a sustained, stable load over a long period to surface memory leaks and similar slow-building problems |
| Spike Test | Simulate sudden traffic surges (e.g., viral events) |
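The five test types differ mainly in the shape of the load over time. The sketch below expresses each as a list of `(duration_seconds, target_qps)` stages that could be translated into a load tool's configuration. The durations and multipliers are illustrative assumptions, not recommended values, and `load_profile` is a hypothetical helper.

```python
def load_profile(test_type, normal_qps, peak_qps):
    """Return a list of (duration_seconds, target_qps) stages for a test type."""
    profiles = {
        # Steady normal load to establish reference numbers.
        "baseline": [(600, normal_qps)],
        # Warm up at normal load, then hold the expected peak.
        "load": [(300, normal_qps), (900, peak_qps)],
        # Step past the expected peak until something breaks.
        "stress": [(300, peak_qps * m) for m in (1.0, 1.5, 2.0, 2.5, 3.0)],
        # Hours of steady load to expose leaks and slow degradation.
        "soak": [(8 * 3600, normal_qps)],
        # Short, sharp surge followed by recovery at normal load.
        "spike": [(300, normal_qps), (60, peak_qps * 3), (300, normal_qps)],
    }
    if test_type not in profiles:
        raise ValueError(f"unknown test type: {test_type!r}")
    return profiles[test_type]
```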
Capacity Planning vs Auto-Scaling
Auto-scaling cannot replace capacity planning:
- Auto-scaling has a scale-out delay (minutes)
- Cloud resource quotas have limits (must be requested in advance)
- Cost budgets need to be planned ahead
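The first two points can be made concrete with a little arithmetic: traffic keeps growing while new instances are still starting, so the capacity to absorb it must already be running and must fit within quota. The sketch below assumes linear traffic growth during the scale-out delay; the function and parameter names are hypothetical.

```python
import math


def instances_to_prewarm(current_qps, growth_qps_per_second, scale_out_delay_s, qps_per_instance):
    """Instances that must already be running when scale-out is triggered.

    Traffic keeps rising for `scale_out_delay_s` seconds before newly
    requested instances can serve requests, so the running fleet has to
    cover the load expected at the end of that window.
    """
    qps_when_ready = current_qps + growth_qps_per_second * scale_out_delay_s
    return math.ceil(qps_when_ready / qps_per_instance)


def exceeds_quota(required_instances, quota_instances):
    """True if the required fleet cannot be launched under the current quota."""
    return required_instances > quota_instances
```

With 5,000 QPS growing by 20 QPS every second, a 3-minute scale-out delay, and 500 QPS per instance, 18 instances must already be up. If the account quota is 16, no auto-scaling policy can close that gap at incident time.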
