What are the five pillars of data observability? How do you build a comprehensive monitoring system?

Data Observability

Data observability is a framework originally proposed by Monte Carlo (the data reliability vendor, not the simulation method) that applies observability concepts from software SRE to data systems. The goal is for engineering teams to detect and resolve data issues proactively, before end users notice them.

Five Pillars

1. Freshness: Is data updated on time?

  • Monitor: last update time vs SLA
  • Alert: data not updated in over X hours
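A freshness check can be sketched as a comparison between a table's age and its SLA. The function name `is_stale` and the 6-hour SLA in the usage lines are assumptions for illustration; in practice the last update time would come from warehouse metadata or a `max(updated_at)` query.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(last_updated: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Return True when the data is older than the SLA allows."""
    # Allow the caller to inject "now" so the check is testable.
    current = now if now is not None else datetime.now(timezone.utc)
    return current - last_updated > max_age


# Hypothetical example: a table last loaded at 00:00 UTC, checked at 07:00 UTC
# against a 6-hour freshness SLA.
loaded_at = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
checked_at = datetime(2024, 1, 1, 7, 0, tzinfo=timezone.utc)
stale = is_stale(loaded_at, timedelta(hours=6), now=checked_at)
```

Here `stale` is `True`, because the table is 7 hours old and the assumed SLA is 6 hours.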

2. Distribution: Are data value ranges and distributions normal?

  • Monitor: historical trends of min/max/mean/null rate
  • Alert: metrics outside normal bounds
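One simple way to define "normal bounds" is a z-score against recent history. The sketch below applies it to a daily null rate; the threshold of 3 standard deviations and the sample values are assumptions, and real platforms often use seasonality-aware models instead.

```python
from statistics import mean, stdev
from typing import Sequence


def is_outlier(history: Sequence[float], today: float,
               z_threshold: float = 3.0) -> bool:
    """Flag today's metric if it lies more than z_threshold standard
    deviations from the historical mean. Needs at least two history points."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return today != mu
    return abs(today - mu) / sigma > z_threshold


# Hypothetical daily null rates for one column.
null_rate_history = [0.010, 0.012, 0.011, 0.009, 0.010]
spike = is_outlier(null_rate_history, 0.20)    # far above the usual ~1%
normal = is_outlier(null_rate_history, 0.011)  # within the usual range
```

The same function works for min, max, or mean: each metric gets its own history series.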

3. Volume: Is the number of rows within the expected range?

  • Monitor: daily/hourly row count trends
  • Alert: row count below or above historical baseline
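A volume check can be sketched as a tolerance band around a historical baseline. Using the median as the baseline and a 30% tolerance are assumptions chosen for illustration, not a recommended default.

```python
from statistics import median
from typing import Optional, Sequence


def volume_alert(history_counts: Sequence[int], today_count: int,
                 tolerance: float = 0.3) -> Optional[str]:
    """Return "low" or "high" when today's row count leaves the band
    baseline * (1 +/- tolerance); return None when it is inside the band."""
    baseline = median(history_counts)
    lower = baseline * (1 - tolerance)
    upper = baseline * (1 + tolerance)
    if today_count < lower:
        return "low"
    if today_count > upper:
        return "high"
    return None


# Hypothetical daily row counts for the last five loads.
recent_counts = [1000, 1050, 980, 1020, 1010]
dropped = volume_alert(recent_counts, 400)
doubled = volume_alert(recent_counts, 2000)
steady = volume_alert(recent_counts, 1000)
```

The median is less sensitive than the mean to a single earlier bad load, which keeps one incident from skewing the baseline.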

4. Schema: Did the data structure change unexpectedly?

  • Monitor: column additions/removals/type changes
  • Alert: schema change events
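Schema monitoring amounts to diffing the current column set against a stored snapshot. In this sketch a schema is assumed to be a plain mapping of column name to type name; how the snapshot is captured (for example from `information_schema`) is left out.

```python
from typing import Dict, List


def diff_schema(expected: Dict[str, str],
                actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {column: type} mappings and report the differences."""
    added = sorted(actual.keys() - expected.keys())
    removed = sorted(expected.keys() - actual.keys())
    type_changed = sorted(
        col for col in expected.keys() & actual.keys()
        if expected[col] != actual[col]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}


# Hypothetical snapshots of the same table on two consecutive days.
yesterday = {"id": "int", "amount": "decimal", "created_at": "timestamp"}
today = {"id": "int", "amount": "varchar", "currency": "varchar"}
changes = diff_schema(yesterday, today)
```

Any non-empty list in `changes` could be emitted as a schema change event.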

5. Lineage: Which downstream systems does a data issue affect?

  • Monitor: health of cross-table dependencies
  • Alert: upstream anomalies automatically notify downstream owners
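Notifying downstream owners requires walking the dependency graph from the table where the anomaly was detected. The sketch below uses a breadth-first traversal over an adjacency mapping; the table names and the graph shape are hypothetical.

```python
from collections import deque
from typing import Dict, List


def downstream_of(graph: Dict[str, List[str]], source: str) -> List[str]:
    """Return every table reachable from source, nearest dependents first."""
    seen = {source}
    order: List[str] = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: each key feeds the tables in its list.
lineage = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["fct_orders", "fct_revenue"],
    "fct_orders": ["dashboard_sales"],
}
impacted = downstream_of(lineage, "raw_orders")
```

Mapping each entry of `impacted` to an owner gives the list of people to notify when `raw_orders` has an anomaly.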

Steps to Build a Monitoring System

  1. Define SLAs: Specify freshness requirements for each critical dataset
  2. Establish baselines: Build normal range baselines from historical data
  3. Configure alerts: Define trigger conditions and notification channels (Slack, PagerDuty)
  4. Create runbooks: Standard response procedures for each anomaly type
  5. Review regularly: Adjust thresholds based on false positive/negative rates
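The steps above can be sketched as a small configuration object plus two helpers: one that turns an SLA breach into a routed alert, and one that measures how noisy the alerts have been. The `Monitor` fields, the runbook path, and the message format are all assumptions for illustration; actual delivery to Slack or PagerDuty is omitted.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Monitor:
    dataset: str              # step 1: the critical dataset
    freshness_sla: timedelta  # step 1: its freshness requirement
    channel: str              # step 3: where alerts go, e.g. "slack"
    runbook: str              # step 4: hypothetical runbook reference


def build_alert(monitor: Monitor, observed_age: timedelta) -> Optional[dict]:
    """Return an alert payload when the SLA is breached, else None."""
    if observed_age <= monitor.freshness_sla:
        return None
    return {
        "channel": monitor.channel,
        "text": (f"{monitor.dataset} is {observed_age} old "
                 f"(SLA {monitor.freshness_sla}); see {monitor.runbook}"),
    }


def false_positive_rate(was_real_incident: List[bool]) -> float:
    """Step 5: share of past alerts that were not real incidents."""
    if not was_real_incident:
        return 0.0
    return was_real_incident.count(False) / len(was_real_incident)


orders_monitor = Monitor("orders", timedelta(hours=6), "slack",
                         "runbooks/freshness")
alert = build_alert(orders_monitor, timedelta(hours=9))
noise = false_positive_rate([True, False, False, True])
```

A high `noise` value suggests loosening the threshold; missed incidents, which this sketch does not track, suggest tightening it.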

Tool Ecosystem

  • End-to-end platforms: Monte Carlo, Bigeye, Anomalo
  • Open-source stack: Soda Core + Airflow + Grafana
  • dbt Cloud: Built-in model health monitoring

Copyright © 2026 Wood All Rights Reserved · FE Interview Hub