How do you implement anomaly detection in a data pipeline?

Types of Data Anomalies

Volume anomalies: Sudden increase or decrease in row counts (e.g., daily orders dropping from 10,000 to 100)

Distribution anomalies: Metric distributions deviating from historical patterns (e.g., average order value doubling overnight)

Freshness anomalies: Data updates delayed beyond expected SLA

Schema anomalies: Column type changes, new columns appearing, or columns disappearing
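Freshness and schema anomalies are the easiest of these to check mechanically. The following is a minimal sketch, not a definitive implementation: the function names is_stale and schema_diff, and the representation of a schema as a {column: type} dictionary, are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated, max_delay, now=None):
    """Return True if the latest update is older than the allowed delay (the SLA)."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > max_delay


def schema_diff(expected, actual):
    """Compare two {column: type} mappings and report what changed."""
    shared = set(expected) & set(actual)
    return {
        # Columns that disappeared
        "missing": sorted(set(expected) - set(actual)),
        # New columns that appeared
        "added": sorted(set(actual) - set(expected)),
        # Columns present in both whose type changed
        "type_changed": sorted(c for c in shared if expected[c] != actual[c]),
    }
```

For example, is_stale(last_load_time, timedelta(hours=24)) would flag a table whose last load is more than a day old, where last_load_time is a hypothetical timezone-aware timestamp read from pipeline metadata.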

Detection Methods

Rule-based: Set static thresholds, which are simple and transparent:

  • Alert when row count < 1,000
  • Alert when NULL rate > 5%
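The two rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions: rows are represented as a list of dictionaries, and the function name rule_based_alerts and its parameters are hypothetical; the thresholds default to the values in the list.

```python
def rule_based_alerts(rows, column, min_rows=1_000, max_null_rate=0.05):
    """Return a list of alert messages for a batch of rows (list of dicts)."""
    alerts = []

    # Rule 1: row count below the static threshold
    if len(rows) < min_rows:
        alerts.append(f"row count {len(rows)} is below {min_rows}")

    # Rule 2: share of NULL (None) values in the column above the threshold
    if rows:
        null_rate = sum(1 for r in rows if r.get(column) is None) / len(rows)
        if null_rate > max_null_rate:
            alerts.append(
                f"NULL rate for {column!r} is {null_rate:.1%}, above {max_null_rate:.0%}"
            )

    return alerts
```

An empty result means the batch passed both rules. In dbt tests or Soda Core the same rules would normally be declared in configuration rather than written by hand.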

Statistical: Based on the characteristics of historical data:

  • Z-score: detect deviation from the mean
  • IQR (Interquartile Range): detect outliers
  • Moving averages: detect trend anomalies
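A rough sketch of the three statistical methods, using only the Python standard library. The function names, the 1.5 IQR multiplier, the 7-point window, and the 50% tolerance are illustrative assumptions rather than recommended values; a common convention is to treat an absolute z-score above 3 as anomalous.

```python
import statistics


def z_score(history, value):
    """Number of standard deviations by which value differs from the historical mean."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Flat history: any different value is infinitely far from the mean
        if value == mean:
            return 0.0
        return float("inf") if value > mean else float("-inf")
    return (value - mean) / stdev


def iqr_bounds(history, k=1.5):
    """Return (low, high); values outside this range are treated as outliers."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def deviates_from_moving_average(series, window=7, tolerance=0.5):
    """True if the latest point differs from the mean of the previous window
    points by more than tolerance (as a fraction of that mean)."""
    if len(series) <= window:
        raise ValueError("need more than `window` points")
    baseline = statistics.fmean(series[-window - 1:-1])
    return abs(series[-1] - baseline) > tolerance * abs(baseline)
```

With a week of daily order counts near 10,000 as history, a day with 100 orders gets a large negative z-score, falls outside the IQR bounds, and deviates from the moving average.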

ML-based: Machine learning models that learn normal patterns automatically from historical data; this is the approach used by tools such as Monte Carlo Data and Anomalo.

Tool Comparison

Tool                 Characteristics
dbt tests            Lightweight, good for SQL rule checks
Great Expectations   Rich expectation library, CI/CD support
Monte Carlo          SaaS, ML-driven automatic anomaly detection
Soda Core            Open-source, declarative SodaCL syntax

Best Practice

Place quality gates at key checkpoints throughout the pipeline so that anomalous data is stopped before it flows downstream, avoiding the "garbage in, garbage out" problem.
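One way to picture a quality gate is as a function that either passes a batch through unchanged or raises an error, so later stages never run on bad data. The sketch below is an assumption-laden illustration: QualityGateError, quality_gate, and the shape of the checks dictionary are hypothetical names, not part of any of the tools listed above.

```python
class QualityGateError(Exception):
    """Raised when a batch fails one or more quality checks."""


def quality_gate(rows, checks):
    """Run every named check; return rows unchanged if all pass, else raise.

    checks maps a check name to a function that takes the rows and
    returns True when the batch is acceptable.
    """
    failures = [name for name, check in checks.items() if not check(rows)]
    if failures:
        # Stop the pipeline here so bad data does not reach downstream stages
        raise QualityGateError(f"quality gate failed: {', '.join(failures)}")
    return rows


# Example checks for a batch represented as a list of dicts
example_checks = {
    "non_empty": lambda rows: len(rows) > 0,
    "no_null_ids": lambda rows: all(r.get("id") is not None for r in rows),
}
```

A pipeline step can then wrap its input, for example transform(quality_gate(raw_rows, example_checks)), where transform and raw_rows stand in for the pipeline's own stage and data.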

Copyright © 2026 Wood All Rights Reserved · FE Interview Hub