How does distributed tracing work? What are Span and Trace ID?
Why Distributed Tracing Is Needed
In a microservices architecture, a single user request might traverse ten or more services. When a request fails or slows down, traditional logs show only fragments from each service, which makes it hard to piece together the full path. Distributed tracing solves this by tying every hop of a request to one shared identifier.
Core Concepts
Trace: Represents the end-to-end path of a complete request, identified by a unique Trace ID.
Span: A unit of work within a trace, identified by its own Span ID and recording:
- Operation name
- Start/end time (to calculate latency)
- Status (success/failure)
- Tags (HTTP method, DB query, etc.)
- Parent Span ID (links the span to its caller so spans can be assembled into a tree; the root span has none)
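The fields above can be sketched as a small data structure. This is a minimal illustration, not the data model of any particular tracing library: the `Span` class, its field names, the `build_tree` helper, and the sample operations are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional
import secrets


@dataclass
class Span:
    trace_id: str                          # shared by every span in the same trace
    span_id: str                           # unique to this span
    name: str                              # operation name
    start_ms: float                        # start time
    end_ms: float                          # end time
    parent_span_id: Optional[str] = None   # None marks the root span
    status: str = "ok"                     # "ok" or "error"
    tags: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> float:
        # Latency is derived from the recorded start and end times.
        return self.end_ms - self.start_ms


def build_tree(spans):
    """Return (root_spans, children_by_parent_id) for a list of spans."""
    roots, children = [], {}
    for span in spans:
        if span.parent_span_id is None:
            roots.append(span)
        else:
            children.setdefault(span.parent_span_id, []).append(span)
    return roots, children


# Hypothetical two-span trace: an HTTP handler that issues one DB query.
trace_id = secrets.token_hex(16)
root = Span(trace_id, secrets.token_hex(8), "GET /checkout", 0.0, 120.0,
            tags={"http.method": "GET"})
db = Span(trace_id, secrets.token_hex(8), "SELECT orders", 10.0, 90.0,
          parent_span_id=root.span_id, tags={"db.statement": "SELECT ..."})
roots, children = build_tree([root, db])
```

Both spans carry the same `trace_id`, and the parent pointer alone is enough to rebuild the call tree after the spans have been collected from different services.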
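Context Propagation: The Trace ID and the caller's Span ID (which becomes the Parent Span ID downstream) are passed between services in HTTP headers. The W3C Trace Context standard defines the traceparent header for this, with four dash-separated fields: version, trace-id, parent-id, and trace-flags.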
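As a rough sketch of propagation, the code below parses an incoming traceparent header and builds the one to send downstream. The function names `parse_traceparent` and `outgoing_headers` are hypothetical, and the sketch is simplified: it accepts only the four-field layout and ignores the companion tracestate header.

```python
import re
import secrets

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Return the trace context as a dict, or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def outgoing_headers(incoming_headers):
    """Continue the caller's trace, or start a new one at the system edge."""
    ctx = parse_traceparent(incoming_headers.get("traceparent", ""))
    if ctx is None:
        trace_id, flags = secrets.token_hex(16), "01"   # new trace, sampled
    else:
        trace_id = ctx["trace_id"]                      # keep the same trace
        flags = "01" if ctx["sampled"] else "00"
    span_id = secrets.token_hex(8)                      # this service's own span
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
print(outgoing_headers(incoming))
```

The trace-id is preserved across the hop while the parent-id field is replaced with the current service's span ID, which is how the next service learns who its parent is.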
Trace Visualization
A request trace is typically displayed as a Gantt chart:
- The horizontal axis is time
- Each span appears as a horizontal bar, labeled with its service and operation
- Bar length is the span's duration, so the service that took the most time stands out
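To make the layout concrete, here is a small text-only rendering of such a chart. It is an illustrative sketch, not how any tracing UI is implemented; `render_gantt`, the service names, and the timings are invented for the example.

```python
def render_gantt(spans, width=40):
    """spans: list of (name, start_ms, end_ms). Returns one text line per span."""
    t0 = min(start for _, start, _ in spans)
    t1 = max(end for _, _, end in spans)
    total = (t1 - t0) or 1                 # avoid dividing by zero
    lines = []
    for name, start, end in spans:
        offset = int((start - t0) * width / total)           # where the bar begins
        length = max(1, int((end - start) * width / total))  # bar length ~ duration
        lines.append(f"{name:<12}|{' ' * offset}{'#' * length} {end - start:.0f} ms")
    return lines


# Hypothetical trace: the gateway calls orders, which calls payments and the DB.
sample = [
    ("gateway", 0, 100),
    ("orders", 10, 90),
    ("payments", 20, 40),
    ("database", 45, 85),
]
for line in render_gantt(sample):
    print(line)

# The slowest child span is the first place to look when a request is slow.
slowest = max(sample[1:], key=lambda s: s[2] - s[1])
```

Note that a parent's bar always covers its children, so the useful comparison is between sibling spans or between a span's total time and the time spent in its children.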
OpenTelemetry
OpenTelemetry is an open standard that provides a unified API and SDKs for collecting metrics, logs, and traces.
Benefits: It is vendor-neutral, so instrumentation is not tied to any specific backend; traces can be exported to Jaeger, Zipkin, or Grafana Tempo, among others.
Implementation Note
Correlation ID strategy: Generate a Trace ID when a request enters the system and include it as a structured field in every log line. Even without a dedicated tracing tool, you can then search logs by Trace ID to reconstruct a request's path.
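One way to apply this with only the Python standard library is sketched below. The `x-trace-id` header name, the `handle_request` function, and the JSON field names are assumptions for illustration; a real service would read whatever header its gateway or framework sets.

```python
import contextvars
import json
import logging
import secrets

# Holds the current request's Trace ID; "-" means no request is in flight.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copies the current Trace ID onto every log record."""
    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emits each record as one JSON object with trace_id as a structured field."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
        })


def make_logger(name="app"):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:                # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.addFilter(TraceIdFilter())
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


def handle_request(headers, logger):
    # Reuse the upstream ID if one arrived; otherwise this is the entry point.
    trace_id = headers.get("x-trace-id") or secrets.token_hex(16)
    token = trace_id_var.set(trace_id)
    try:
        logger.info("request received")
        logger.info("request completed")
    finally:
        trace_id_var.reset(token)          # do not leak the ID into the next request
    return trace_id


if __name__ == "__main__":
    handle_request({}, make_logger())
```

Because the ID lives in a context variable, code deeper in the call stack can log without having the Trace ID passed to it explicitly, and a log search on the `trace_id` field returns every line for one request.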
