
Data Quality Monitoring

Explain data quality monitoring methods in data pipelines.

Why Data Quality Matters

"Garbage in, garbage out." Downstream analytics, reports, and ML models depend on high-quality input. Data quality issues are often hard to detect and can silently corrupt decisions.

Common Data Quality Dimensions

  • Completeness: Do required fields contain values, or are critical columns unexpectedly NULL?
  • Uniqueness: Are primary keys duplicated?
  • Timeliness: Does data arrive on schedule?
  • Consistency: Is data consistent across systems?
  • Validity: Are values within allowed ranges (e.g., date formats, enum values)?
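The dimensions above can be computed directly from a batch of rows. A minimal sketch in plain Python (column names and sample data are hypothetical; in practice you would run these checks against a warehouse table or DataFrame):

```python
# Per-dimension quality checks over a batch of records
# (plain dicts stand in for rows).

def null_rate(rows, column):
    """Completeness: fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)

def duplicate_keys(rows, key):
    """Uniqueness: key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        k = r.get(key)
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes

def invalid_values(rows, column, allowed):
    """Validity: non-NULL values outside the allowed enum."""
    return [r[column] for r in rows
            if r.get(column) is not None and r[column] not in allowed]

rows = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": None},       # completeness violation
    {"order_id": 2, "status": "weird"},    # uniqueness + validity violation
]
print(null_rate(rows, "status"))
print(duplicate_keys(rows, "order_id"))
print(invalid_values(rows, "status", {"placed", "shipped", "completed"}))
```

Timeliness and cross-system consistency typically require metadata (load timestamps, counts from both systems) rather than the rows alone, so they are usually checked at the orchestration layer.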

Tools

dbt Tests: Declare tests on dbt models (not_null, unique, accepted_values, relationships); run them with `dbt test`, or have them execute alongside models as part of `dbt build`.
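These built-in tests are declared in a model's schema file. A minimal sketch (model and column names are hypothetical; newer dbt versions also accept `data_tests:` as the key):

```yaml
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
```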

Great Expectations: Advanced Python framework for complex expectations (value distributions, inter-column relationships); generates data documentation.
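To make the idea concrete, here is a plain-Python sketch of the *kinds* of expectations Great Expectations lets you declare (value bounds, inter-column relationships). This is illustrative only, not the GE API, and the column names are hypothetical:

```python
from datetime import date

def expect_values_between(rows, column, low, high):
    """Distribution-style check: every value falls within [low, high]."""
    bad = [r[column] for r in rows if not (low <= r[column] <= high)]
    return {"success": not bad, "unexpected": bad}

def expect_column_pair_ordered(rows, earlier, later):
    """Inter-column check: `earlier` never comes after `later`."""
    bad = [r for r in rows if r[earlier] > r[later]]
    return {"success": not bad, "unexpected": bad}

orders = [
    {"amount": 40,  "ordered": date(2024, 1, 1), "shipped": date(2024, 1, 3)},
    {"amount": 900, "ordered": date(2024, 1, 5), "shipped": date(2024, 1, 4)},
]
print(expect_values_between(orders, "amount", 0, 500))
print(expect_column_pair_ordered(orders, "ordered", "shipped"))
```

Great Expectations additionally persists these expectations as suites, renders them into human-readable Data Docs, and reports unexpected values in a structured result like the dicts above.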

Monitoring Alerts: Set threshold alerts (e.g., null rate > 5%, row count drops unexpectedly) and proactively notify on anomalies.
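A threshold alert reduces to comparing the current batch's metrics against fixed limits and a historical baseline. A minimal sketch (the metric shape, thresholds, and alert format are assumptions, not any specific tool's API):

```python
def evaluate_alerts(metrics, baseline,
                    max_null_rate=0.05, max_row_drop=0.5):
    """Return alert messages for any metric breaching its threshold."""
    alerts = []
    # Null-rate threshold per column (e.g., > 5%).
    for column, rate in metrics["null_rates"].items():
        if rate > max_null_rate:
            alerts.append(
                f"null rate for {column} is {rate:.1%} (> {max_null_rate:.0%})")
    # Unexpected row-count drop versus the historical baseline.
    expected, actual = baseline["row_count"], metrics["row_count"]
    if expected and actual < expected * (1 - max_row_drop):
        alerts.append(f"row count dropped from {expected} to {actual}")
    return alerts

alerts = evaluate_alerts(
    {"null_rates": {"email": 0.12, "name": 0.01}, "row_count": 400},
    {"row_count": 1000},
)
for a in alerts:
    print("ALERT:", a)
```

In production, the returned messages would be routed to a notification channel (Slack, PagerDuty, email) by the orchestrator rather than printed.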


Copyright © 2026 Wood All Rights Reserved · FE Interview Hub