True Streaming vs Micro-Batch Processing
Explain the difference between true streaming and micro-batch processing.
True Streaming
Each event is processed individually as soon as it arrives; end-to-end latency is typically milliseconds to a few seconds.
Examples: Apache Flink, Kafka Streams
Pros: Ultra-low latency.
Cons: More complex to implement and operate (state management, watermarks for out-of-order events); typically higher operational cost.
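To make "state management" and "watermarks" concrete, here is a minimal pure-Python sketch of per-event processing with event-time tumbling windows. It is not the Flink or Kafka Streams API; the class name TumblingWindowCounter and the parameters window_size and allowed_lateness are hypothetical names chosen for illustration, and the watermark rule (largest event time seen minus a fixed lateness) is one simple assumption among several possible strategies.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per event-time tumbling window, one event at a time.

    Illustrative only: names and the watermark rule are assumptions,
    not taken from any real streaming engine.
    """

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.counts = defaultdict(int)  # operator state: window start -> count
        self.max_event_time = 0

    def on_event(self, event_time):
        """Process one event on arrival; return the windows that just closed."""
        window_start = event_time - event_time % self.window_size
        watermark = self.max_event_time - self.allowed_lateness
        if window_start + self.window_size <= watermark:
            return []  # too late: this window was already emitted, so drop it

        self.counts[window_start] += 1
        self.max_event_time = max(self.max_event_time, event_time)

        # Advance the watermark and emit every window that ends at or before it.
        watermark = self.max_event_time - self.allowed_lateness
        closed = sorted(s for s in self.counts if s + self.window_size <= watermark)
        return [(s, self.counts.pop(s)) for s in closed]


counter = TumblingWindowCounter(window_size=10, allowed_lateness=2)
for t in [1, 4, 11, 9, 12, 8, 25]:
    print(t, counter.on_event(t))
```

In this run the event at time 9 arrives after the event at time 11 but is still counted, because the watermark has not yet passed the end of window [0, 10). The event at time 8 arrives after that window has been emitted and is dropped. Handling these cases correctly for every operator is a large part of the implementation complexity noted above.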
Micro-Batch
Collects incoming data over a short fixed interval (e.g., 1 s or 10 s) and processes each interval as a small batch. Latency is at least one batch interval, typically seconds to minutes.
Examples: Apache Spark Structured Streaming
Pros: Relatively simple to implement (similar to batch logic); high throughput.
Cons: Higher latency than true streaming; when windows follow batch (arrival-time) boundaries rather than event time, late or out-of-order records can be counted in the wrong window.
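The micro-batch model can be sketched in a few lines of plain Python. This is not Spark code; micro_batches and added_latency are hypothetical helpers, and records are assumed to be (arrival_time, value) tuples with times in seconds.

```python
from itertools import groupby


def micro_batches(records, interval):
    """Group (arrival_time, value) records into fixed-interval batches.

    Yields (batch_end, values). A batch is only processed once its
    interval has ended, which is where the extra latency comes from.
    """
    ordered = sorted(records, key=lambda r: r[0])
    for index, group in groupby(ordered, key=lambda r: int(r[0] // interval)):
        yield (index + 1) * interval, [value for _, value in group]


def added_latency(arrival_time, interval):
    """Time a record waits before its batch closes and processing starts."""
    batch_end = (int(arrival_time // interval) + 1) * interval
    return batch_end - arrival_time


records = [(0.25, "a"), (0.75, "b"), (1.5, "c"), (3.5, "d")]
for batch_end, values in micro_batches(records, interval=1):
    print(batch_end, values)
```

A record that arrives just after a batch opens waits almost the full interval before it is processed, so the interval sets a floor on latency. This sketch skips intervals with no records; a real engine may still trigger an empty batch.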
Selection Criteria
- Millisecond latency required (fraud detection, real-time recommendations): True streaming (Flink)
- Second-to-minute latency acceptable (real-time dashboards, monitoring alerts): Micro-batch (Spark Structured Streaming)
- Existing Spark stack: Prefer Spark Structured Streaming
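The criteria above can be written as a small decision helper. This is only a sketch: choose_engine is a hypothetical function, the 1-second cutoff is an assumed stand-in for "millisecond latency required", and letting the latency requirement take precedence over an existing Spark stack is also an assumption, since the list does not say which criterion wins.

```python
def choose_engine(max_latency_seconds, existing_spark_stack=False):
    """Suggest a processing model from a latency budget (illustrative only)."""
    # Assumed cutoff: budgets under 1 second are treated as needing true streaming.
    if max_latency_seconds < 1.0:
        return "true streaming"
    # Seconds to minutes are acceptable, so micro-batch is sufficient.
    if existing_spark_stack:
        return "micro-batch (reuse Spark Structured Streaming)"
    return "micro-batch"


print(choose_engine(0.05))                              # fraud detection budget
print(choose_engine(30))                                # dashboard budget
print(choose_engine(30, existing_spark_stack=True))     # team already on Spark
```

In practice the choice also depends on team experience, throughput, and operational cost, which a single threshold does not capture.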
Latency Comparison
Batch (hours/days) > Micro-batch (seconds/minutes) > True streaming (milliseconds/seconds)
