MapReduce Computing Paradigm

Explain how MapReduce works.

What Is MapReduce

MapReduce is a distributed computing paradigm proposed by Google and the core computation model of Hadoop. It decomposes large-scale data processing into two phases: Map and Reduce.

Map Phase

The Map phase applies a user-defined mapping function independently to each input record, producing intermediate key-value pairs. Map tasks run in parallel across many nodes.

Example: For word count, the mapper emits (word, 1) for each occurrence of a word.
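
A minimal Python sketch of such a mapper. The name map_words and the sample input are illustrative, not part of any Hadoop API:

    # Word-count mapper sketch: emit (word, 1) for each word in a line.
    def map_words(line):
        for word in line.split():
            yield (word.lower(), 1)

    print(list(map_words("the quick brown fox the")))
    # [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('the', 1)]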

Shuffle and Sort

The framework automatically groups and sorts the Map outputs by key and routes each group to the appropriate Reducer. This is typically the most expensive step, because the intermediate data must be transferred across the network.
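
A single-process sketch of what the shuffle does logically, assuming the (word, 1) pairs from the Map example above; in a real cluster this grouping happens across the network between nodes, and the function name shuffle is illustrative:

    # Logical view of shuffle: collect all values for each key,
    # then sort the groups by key, as the framework does between
    # Map and Reduce.
    from collections import defaultdict

    def shuffle(pairs):
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return sorted(groups.items())

    print(shuffle([("the", 1), ("fox", 1), ("the", 1)]))
    # [('fox', [1]), ('the', [1, 1])]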

Reduce Phase

The Reduce phase aggregates all values that share the same key into a final result.

Example: Sum all (word, 1) to produce (word, total_count).
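
Putting the phases together, a self-contained sketch of the full word-count pipeline (map_words and shuffle repeat the sketches above; all names are illustrative, not a real MapReduce API):

    # End-to-end word count: map -> shuffle -> reduce, in one process.
    from collections import defaultdict

    def map_words(line):
        for word in line.split():
            yield (word.lower(), 1)

    def shuffle(pairs):
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return sorted(groups.items())

    def reduce_counts(word, counts):
        return (word, sum(counts))  # aggregate all values for one key

    lines = ["the quick brown fox", "the lazy dog"]
    pairs = [kv for line in lines for kv in map_words(line)]
    print([reduce_counts(word, counts) for word, counts in shuffle(pairs)])
    # [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1),
    #  ('quick', 1), ('the', 2)]

Because each key's values are reduced independently, Reduce tasks can also run in parallel across nodes.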

Limitations of MapReduce

Every MapReduce job reads its input from disk and writes its output back to disk, so multi-step pipelines that chain several jobs are slow. Spark's in-memory computation removed this bottleneck and gradually replaced MapReduce for most workloads.

Historical Significance

MapReduce established the foundation for big data batch processing and inspired later frameworks such as Spark and Flink. Understanding it makes the design of modern distributed computing systems easier to grasp.
