Data Warehouse vs Data Lake vs Data Lakehouse
Compare data warehouses, data lakes, and data lakehouses.
Data Warehouse
Stores cleaned, structured data optimized for analytical queries (OLAP).
Examples: Snowflake, BigQuery, Redshift
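Pros: High query performance, strong governance. Cons: Limited support for unstructured data, higher storage and compute cost, less flexible because the schema must be defined before data is loaded (schema-on-write).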
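To make the warehouse model concrete, here is a minimal sketch of schema-on-write. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse (SQLite is not an OLAP engine), and the sales table, its columns, and the helper names are hypothetical. The point is only that the schema is enforced when data is loaded, so bad rows are rejected before any analytical query runs.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory database with a strictly typed table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sales (
            order_id INTEGER PRIMARY KEY,
            region   TEXT NOT NULL,
            amount   REAL NOT NULL CHECK (amount >= 0)
        )
        """
    )
    return conn


def load_rows(conn, rows):
    """Insert rows one by one and return the rows the schema rejected."""
    rejected = []
    for row in rows:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO sales (order_id, region, amount) VALUES (?, ?, ?)",
                    row,
                )
        except sqlite3.IntegrityError:
            # NOT NULL and CHECK violations are caught at load time.
            rejected.append(row)
    return rejected


def revenue_by_region(conn):
    """A typical analytical (aggregate) query over the cleaned data."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


conn = build_warehouse()
rejected = load_rows(
    conn,
    [(1, "EU", 120.0), (2, "US", 80.0), (3, None, 50.0), (4, "EU", -5.0), (5, "EU", 30.0)],
)
print(rejected)                 # rows with a missing region or negative amount
print(revenue_by_region(conn))  # totals computed only from accepted rows
```

The trade-off shown here is the one listed above: queries run against clean, typed data, but anything that does not fit the predefined schema never gets in.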
Data Lake
Stores all data in its raw format (Parquet, CSV, JSON, video) and applies schema-on-read: structure is defined at query time, not at load time.
Examples: S3 + Athena, Azure Data Lake Storage
Pros: Cheap storage, preserves all raw data. Cons: Can degrade into a "data swamp" without cataloging, hard to govern, queries are often slower than on a warehouse.
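Schema-on-read can be illustrated with a small sketch. The records, field names, and the read_with_schema helper below are all hypothetical, and a Python list stands in for files in object storage. Records are stored exactly as they arrived, and each reader chooses its own schema when it queries.

```python
import json

# Raw records kept as-is; note the inconsistent types and missing fields.
RAW_EVENTS = [
    '{"user": "a", "amount": "12.50", "ts": "2024-01-01"}',
    '{"user": "b", "amount": 7, "extra": {"device": "ios"}}',
    '{"user": "c"}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema`, a mapping of field name to caster.

    Fields absent from a record become None; fields not in the schema are ignored.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers read the same raw data with different schemas.
billing_view = read_with_schema(RAW_EVENTS, {"user": str, "amount": float})
timeline_view = read_with_schema(RAW_EVENTS, {"user": str, "ts": str})
print(billing_view)
print(timeline_view)
```

Nothing is rejected at write time, which is why raw data is preserved cheaply, and also why quality problems surface only when someone reads the data.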
Data Lakehouse
Combines both: stores data in low-cost object storage (such as S3) and adds a table-format layer that provides ACID transactions, schema management, and performance optimization.
Examples: Delta Lake (Databricks), Apache Iceberg, Apache Hudi
Features: ACID transactions, Time Travel (query historical versions), and Schema Evolution, at close to data lake storage cost.
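The idea behind atomic commits and time travel can be sketched with a toy model. This is not how Delta Lake, Iceberg, or Hudi are implemented; it is a simplified, in-memory illustration of the general pattern of immutable data files plus a commit log. The ToyTable class and the file names are hypothetical.

```python
class ToyTable:
    """In-memory stand-in for a lakehouse table: immutable files plus a commit log."""

    def __init__(self):
        self._files = {}  # file name -> tuple of rows, never modified once written
        self._log = []    # index i -> frozenset of file names visible at version i

    def commit(self, add=None, remove=()):
        """Atomically publish a new version and return its version number."""
        add = add or {}
        current = set(self._log[-1]) if self._log else set()
        # Validate everything before changing any state, so a failed
        # commit leaves the table exactly as it was.
        missing = set(remove) - current
        if missing:
            raise ValueError(f"cannot remove unknown files: {sorted(missing)}")
        clash = set(add) & set(self._files)
        if clash:
            raise ValueError(f"files already exist: {sorted(clash)}")
        # Stage the new files; readers cannot see them until the log entry is appended.
        for name, rows in add.items():
            self._files[name] = tuple(rows)
        self._log.append(frozenset((current - set(remove)) | set(add)))
        return len(self._log) - 1

    def read(self, version=None):
        """Return the rows as of `version` (the latest version if omitted)."""
        if not self._log:
            return []
        snapshot = self._log[-1 if version is None else version]
        return [row for name in sorted(snapshot) for row in self._files[name]]


table = ToyTable()
v0 = table.commit(add={"part-0": [("a", 1)]})
v1 = table.commit(add={"part-1": [("b", 2)]})
# An "update" rewrites a file: add the new one and retire the old one in a single commit.
v2 = table.commit(add={"part-2": [("a", 10)]}, remove=["part-0"])

print(table.read())            # latest snapshot
print(table.read(version=v0))  # time travel to the first version
```

Because old files are never overwritten, reading an earlier version only requires looking up an earlier log entry. Real table formats follow a broadly similar design, with the log and data files kept in object storage.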
