
What are SLI, SLO, and SLA? How do you define them in practice?


Definitions

SLI (Service Level Indicator): A specific quantitative metric that measures service quality.

Common SLI types:

  • Availability: Successful requests / Total requests
  • Latency: Proportion of requests served in under 200ms (alternatively, a percentile such as P99 response time)
  • Error rate: 5xx errors / Total requests
  • Throughput: Requests processed per second
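The four SLI types above can be computed from a batch of request records. The following is a minimal sketch, not a production implementation: the `Request` record, the `compute_slis` function, and the choice to count any non-5xx response as "successful" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # response time in milliseconds


def compute_slis(requests, window_seconds, latency_threshold_ms=200.0):
    """Return the four SLIs for the requests observed in one time window."""
    total = len(requests)
    if total == 0:
        raise ValueError("no requests in the window")

    # Assumption: a request is "successful" unless it returned a 5xx status.
    server_errors = sum(1 for r in requests if r.status >= 500)
    fast = sum(1 for r in requests if r.latency_ms < latency_threshold_ms)

    return {
        "availability": (total - server_errors) / total,
        "latency": fast / total,            # share of requests under threshold
        "error_rate": server_errors / total,
        "throughput_rps": total / window_seconds,
    }
```

In a real system these counts usually come from a metrics backend rather than an in-memory list, but the ratios are the same.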

SLO (Service Level Objective): The target value for an SLI over a specific time window. This is an internal target.

Examples:

  • Over the past 30 days, availability SLI >= 99.9%
  • Over the past 7 days, at least 95% of requests complete in under 200ms
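An SLO pairs an SLI with a target and a window, so checking it reduces to a ratio comparison. The sketch below illustrates that idea under stated assumptions: the `SLO` class and `meets_slo` function are hypothetical names, and treating a window with no traffic as compliant is a policy choice, not a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    name: str
    target: float     # e.g. 0.999 for 99.9%
    window_days: int  # length of the rolling window


def meets_slo(slo, good_events, total_events):
    """Return True if the SLI (good / total) reaches the SLO target."""
    if total_events == 0:
        # Assumption: no traffic in the window counts as compliant.
        return True
    return good_events / total_events >= slo.target


availability = SLO("availability", target=0.999, window_days=30)
latency = SLO("latency under 200ms", target=0.95, window_days=7)
```

The caller is responsible for passing counts that cover exactly `window_days`; the window is carried on the object only as documentation of the objective.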

SLA (Service Level Agreement): A formal commitment made to customers, including compensation clauses for violations. SLAs are typically looser than SLOs (leaving a buffer).

Relationship

SLI (measurement) → SLO (internal target) → SLA (external commitment)

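Why the SLO is set below 100%: Perfect reliability is unattainable, and systems can't be measured perfectly either, so the target leaves headroom for real failures and for measurement error.

Why the SLA is looser than the SLO: The SLO is an internal target, so missing it should leave room to fix issues before the SLA is violated.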

Error Budget

Error Budget = 1 - SLO

Example: If SLO = 99.9% over a 30-day window (43,200 minutes), then Error Budget = 0.1% × 43,200 minutes = 43.2 minutes of downtime.
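The arithmetic in this example generalizes to any target and window. A small sketch, assuming a time-based budget and a hypothetical helper named `error_budget_minutes`:

```python
def error_budget_minutes(slo_target, window_days=30):
    """Allowed downtime in minutes for an SLO target over a window."""
    window_minutes = window_days * 24 * 60   # 30 days -> 43,200 minutes
    return (1 - slo_target) * window_minutes


# Budgets for a few common targets over 30 days, rounded for display.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> {error_budget_minutes(target):.2f} min")
```

Each additional "nine" cuts the budget by a factor of ten, which is why the step from 99.9% to 99.99% is so much more expensive than it looks.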

Error Budget is the core SRE tool: if the budget is healthy, you can accelerate feature releases; if the budget is exhausted, freeze releases and prioritize reliability improvements.
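The release policy described above can be expressed as a function of how much budget remains. This is an illustrative sketch only: `budget_remaining` and `release_decision` are hypothetical names, the budget is counted in request events rather than minutes, and real policies often add intermediate states (for example, slowing releases when the budget is nearly spent).

```python
def budget_remaining(slo_target, good_events, total_events):
    """Fraction of the error budget still unspent; negative if overspent."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 1.0  # nothing served yet, so nothing spent

    allowed_bad = (1 - slo_target) * total_events  # budget, in events
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad


def release_decision(remaining):
    """Ship while budget remains; freeze once it is exhausted."""
    return "ship" if remaining > 0 else "freeze"
```

With a 99.9% SLO and 1,000,000 requests, the budget is about 1,000 failed requests: 400 failures leave roughly 60% of the budget, while 1,500 failures overspend it and trigger a freeze.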


Copyright © 2026 Wood All Rights Reserved · FE Interview Hub