What is a Data Catalog and what problems does it solve?

Data Catalog

A data catalog is a centralized system for managing metadata about enterprise data assets, enabling users to discover, understand, and trust their data.

Core Problems It Solves

Data silos: Data is scattered across departments, making it hard to know what data exists where.

Poor data understanding: Column names such as "status" or "type" are ambiguous, and without documentation no one can tell what their values mean.

Redundant work: Different teams build the same datasets or metrics independently, wasting effort and creating inconsistent definitions.

Compliance risk: Without a central inventory, there is no reliable way to track which tables contain personally identifiable information (PII) or other sensitive data.
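To make these problems concrete, here is a minimal sketch of the kind of metadata a catalog entry might record. The class names, fields, and sample tables (`TableEntry`, `Column`, `orders`, `page_views`) are hypothetical and chosen only for illustration; real catalogs use much richer models.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    description: str  # plain-language meaning, so "status" is no longer a guess
    tags: set[str] = field(default_factory=set)  # e.g. {"PII"}


@dataclass
class TableEntry:
    name: str
    owner: str  # owning team, so others can find and reuse the dataset
    columns: list[Column]


def tables_with_tag(entries: list[TableEntry], tag: str) -> list[str]:
    """Return names of tables that have at least one column carrying `tag`."""
    return [e.name for e in entries if any(tag in c.tags for c in e.columns)]


# Hypothetical sample entries (assumed data, not from any real system).
catalog = [
    TableEntry("orders", "sales-eng", [
        Column("status", "Order fulfilment state: NEW, SHIPPED, or CANCELLED"),
        Column("customer_email", "Email used at checkout", {"PII"}),
    ]),
    TableEntry("page_views", "web-analytics", [
        Column("type", "Page category: landing, product, or checkout"),
    ]),
]

print(tables_with_tag(catalog, "PII"))  # ['orders']
```

Even this small structure touches each problem: a single list answers "what data exists", descriptions document ambiguous columns, the owner field points to existing work before a team rebuilds it, and tags make sensitive columns queryable.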

Key Features

| Feature | Description |
| --- | --- |
| Data discovery | Search and find needed datasets or columns |
| Business glossary | Unified definitions for business terms, eliminating ambiguity |
| Data lineage | Visualize data flow and dependencies |
| Data quality scores | Display quality assessment results for each dataset |
| Sensitive data tagging | Mark locations of PII, financial, and other sensitive data |
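Two of these features, data discovery and data lineage, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not how any particular tool implements them: the `DATASETS` dictionary, its dataset names, and the `search` and `upstream_of` helpers are all hypothetical.

```python
# Hypothetical metadata: dataset name -> description and direct upstream sources.
DATASETS = {
    "raw_orders": {
        "description": "Raw order events from the checkout service",
        "upstream": [],
    },
    "raw_customers": {
        "description": "Customer profile export",
        "upstream": [],
    },
    "orders_clean": {
        "description": "Deduplicated orders joined with customers",
        "upstream": ["raw_orders", "raw_customers"],
    },
    "revenue_daily": {
        "description": "Daily revenue metric",
        "upstream": ["orders_clean"],
    },
}


def search(keyword: str) -> list[str]:
    """Data discovery: match the keyword against names and descriptions."""
    kw = keyword.lower()
    return sorted(
        name
        for name, meta in DATASETS.items()
        if kw in name.lower() or kw in meta["description"].lower()
    )


def upstream_of(name: str) -> set[str]:
    """Data lineage: collect every direct and indirect upstream dataset."""
    seen: set[str] = set()
    stack = list(DATASETS[name]["upstream"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DATASETS[current]["upstream"])
    return seen


print(search("revenue"))                     # ['revenue_daily']
print(sorted(upstream_of("revenue_daily")))  # ['orders_clean', 'raw_customers', 'raw_orders']
```

The lineage walk is a plain graph traversal over "upstream" edges. Production catalogs typically store the same dependency graph but populate it automatically from query logs or pipeline definitions rather than by hand.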

Common Tools

  • Open source: Apache Atlas, Amundsen, DataHub
  • Cloud-native: AWS Glue Data Catalog, Google Dataplex
  • Commercial: Alation, Collibra

Copyright © 2026 Wood All Rights Reserved · FE Interview Hub