Apache Spark Core Architecture
Explain Apache Spark's core architecture and execution model.
Components
Driver: The main process of a Spark application. It runs the user code, builds the DAG of stages, schedules tasks on executors, and communicates with the Cluster Manager.
Executor: A process running on a worker node that performs the actual computation and caches data in memory or on disk. Each application gets its own set of executors.
Cluster Manager: Allocates cluster resources to applications (YARN, Kubernetes, Spark Standalone); a minimal configuration sketch follows this list.
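A minimal PySpark sketch of how these pieces are wired together, assuming local mode as a stand-in for a real cluster manager; the app name and executor settings are illustrative values, not recommendations:

```python
from pyspark.sql import SparkSession

# The driver is the process that runs this script; the builder below asks the
# cluster manager (here: local mode as a stand-in) to launch executors.
spark = (
    SparkSession.builder
    .appName("architecture-demo")           # hypothetical app name
    .master("local[4]")                      # stand-in for yarn / k8s / standalone
    .config("spark.executor.memory", "2g")   # per-executor memory (illustrative)
    .config("spark.executor.cores", "2")     # cores per executor (illustrative)
    .getOrCreate()
)

print(spark.sparkContext.master)  # shows which cluster manager / mode is in use
spark.stop()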
Core Abstractions
RDD (Resilient Distributed Dataset): Lowest-level distributed data abstraction — immutable, fault-tolerant, recomputable. Rarely used directly today.
DataFrame/Dataset: High-level APIs built on top of RDDs, with schema support and SQL queries, automatically optimized by the Catalyst Optimizer (see the sketch after this list).
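A small sketch contrasting the two abstractions, assuming a local PySpark session; the sample rows and column names are made up for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("abstractions-demo").master("local[2]").getOrCreate()
sc = spark.sparkContext

# Low-level RDD API: no schema, transformations are plain Python functions.
rdd = sc.parallelize([("alice", 34), ("bob", 29)])
adults_rdd = rdd.filter(lambda kv: kv[1] >= 30)

# DataFrame API: schema-aware, planned and optimized by Catalyst.
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])
adults_df = df.filter(df.age >= 30)

print(adults_rdd.collect())  # [('alice', 34)]
adults_df.show()
spark.stop()
```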
Lazy Evaluation
Transformations (map, filter, join) are not executed immediately; they only build up a DAG of the computation. Actions (collect, count, write) trigger actual execution, which lets Catalyst optimize the whole plan at once.
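A sketch of lazy evaluation in PySpark, assuming a local session; the numbers and expressions are arbitrary, the point is that nothing runs until the action at the end:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").master("local[2]").getOrCreate()

df = spark.range(1_000_000)                   # transformation: nothing runs yet
doubled = df.selectExpr("id * 2 AS doubled")  # transformation: still only a plan
filtered = doubled.where("doubled % 10 = 0")  # transformation: plan keeps growing

filtered.explain()       # prints the optimized physical plan Catalyst produced
print(filtered.count())  # action: only now are tasks scheduled on executors
spark.stop()
```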
Shuffle
Operations like groupBy and join require redistributing data across nodes over the network (a shuffle), which is typically the main performance bottleneck. Minimizing shuffles is the core of Spark tuning.
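A sketch of where a shuffle shows up and one common way to avoid it (broadcasting the small side of a join), assuming a local PySpark session with made-up sample data:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("shuffle-demo").master("local[2]").getOrCreate()

orders = spark.createDataFrame(
    [(1, "US", 10.0), (2, "DE", 20.0), (3, "US", 5.0)],
    ["order_id", "country", "amount"],
)
countries = spark.createDataFrame(
    [("US", "United States"), ("DE", "Germany")],
    ["country", "name"],
)

# groupBy repartitions rows by key across executors -> the plan shows an Exchange (shuffle).
orders.groupBy("country").sum("amount").explain()

# Broadcasting the small table avoids shuffling the large one -> BroadcastHashJoin in the plan.
orders.join(broadcast(countries), "country").explain()
spark.stop()
```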
