
Data Pipeline Orchestration: Apache Airflow

Explain the role of pipeline orchestration tools, using Airflow as an example.

What Is Pipeline Orchestration

Pipelines consist of multiple interdependent tasks. Orchestration tools manage task execution order, retries, monitoring, and alerting.

Apache Airflow

Airflow is the leading open-source orchestration tool. Pipelines are defined as DAGs (Directed Acyclic Graphs) in Python code.

DAG (Directed Acyclic Graph)

A DAG is composed of task nodes and dependency edges. Because the graph is acyclic, every task can be scheduled after all of its upstream dependencies have finished, and circular dependencies are rejected when the DAG is parsed.
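The ordering guarantee can be illustrated with Python's standard-library `graphlib` (a conceptual sketch of what an orchestrator does, not Airflow's internals; the task names are made up):

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps a task to the set of tasks it depends on (its upstream edges).
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

# static_order() yields tasks so that dependencies always precede dependents.
order = list(TopologicalSorter(dag).static_order())
print(order)

# A cycle makes the graph invalid -- orchestration tools reject it up front.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
except CycleError:
    print("cycle detected")
```

An orchestrator layers scheduling, retries, and monitoring on top of exactly this kind of dependency-ordered execution.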

Operator Types

  • PythonOperator: Execute a Python function.
  • BashOperator: Execute a shell command.
  • SQLExecuteQueryOperator (and database-specific variants such as PostgresOperator): Execute a SQL query.
  • Sensors (e.g., S3KeySensor, FileSensor): Wait for an external condition (e.g., a file arriving in S3).
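A minimal DAG file tying these pieces together might look like the sketch below (written against Airflow 2.x imports; the DAG id, schedule, and `extract()` function are illustrative, not from the original text):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder extraction step.
    print("extracting...")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = BashOperator(task_id="load", bash_command="echo load")

    # The >> operator declares a dependency edge: extract runs before load.
    extract_task >> load_task
```

Airflow discovers files like this in its DAGs folder and renders the resulting graph in the web UI.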

Core Features

  • Scheduling (cron expressions or timedelta intervals)
  • Task retry on failure (retries, retry_delay)
  • Backfill: Re-run tasks for historical dates
  • Web UI for visualizing DAG run status
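The retry behavior above can be modeled in plain Python (a conceptual sketch of the `retries`/`retry_delay` semantics, not Airflow code; all names are illustrative):

```python
import time


def run_with_retries(task, retries=3, retry_delay=1.0):
    """Run `task`, allowing up to `retries` extra attempts after a failure."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: the task is marked failed
            time.sleep(retry_delay)  # wait before the next attempt


# A flaky task that fails twice, then succeeds on the third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"


print(run_with_retries(flaky, retries=3, retry_delay=0.0))  # -> done
```

In Airflow these parameters are set per task (or as DAG-level `default_args`), and each attempt is logged and visible in the web UI.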

Modern Alternatives

Prefect and Dagster (more Pythonic, easier to test locally); Databricks Workflows (within the Databricks platform).

Copyright © 2026 Wood All Rights Reserved · FE Interview Hub