Database Sharding Strategies: Types and Trade-offs
What Is Sharding?
Sharding splits a dataset horizontally across multiple database nodes (shards); each shard stores a subset of the rows.
Sharding Strategies
1. Range-based Sharding
Assign keys to shards by value range (e.g., user_id 1–100000 → shard1)
✅ Efficient for range queries
❌ Can create hotspots (e.g., new IDs always hitting the latest shard)
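A minimal sketch of range-based routing, assuming three shards of 100,000 IDs each. Only the first range (user_id 1–100000 → shard1) comes from the notes above; the other boundaries, `RANGE_UPPER_BOUNDS`, `shard_for_user`, and `shards_for_range` are hypothetical names chosen for illustration.

```python
from bisect import bisect_left

# Hypothetical layout: inclusive upper bound of each shard's user_id range.
RANGE_UPPER_BOUNDS = [100_000, 200_000, 300_000]
SHARD_NAMES = ["shard1", "shard2", "shard3"]


def shard_for_user(user_id: int) -> str:
    """Return the shard whose range contains user_id."""
    if user_id < 1:
        raise ValueError("user_id must be >= 1")
    # Index of the first upper bound that is >= user_id.
    idx = bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if idx == len(RANGE_UPPER_BOUNDS):
        raise ValueError("user_id is beyond the last configured range")
    return SHARD_NAMES[idx]


def shards_for_range(lo: int, hi: int) -> list[str]:
    """Return the shards a range query over lo..hi (inclusive) must touch."""
    if lo > hi:
        raise ValueError("lo must be <= hi")
    first = SHARD_NAMES.index(shard_for_user(lo))
    last = SHARD_NAMES.index(shard_for_user(hi))
    return SHARD_NAMES[first:last + 1]


print(shard_for_user(100_000))            # shard1
print(shard_for_user(100_001))            # shard2
print(shards_for_range(50_000, 150_000))  # ['shard1', 'shard2']
```

The range query touches only the shards that overlap the requested interval, which is the efficiency advantage. The hotspot is also visible here: with monotonically increasing IDs, every new user lands on the last shard.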
2. Hash-based Sharding
shard = hash(key) % N
✅ Even data distribution, avoids hotspots
❌ Range queries require scanning all shards; changing N remaps most keys, so scaling the shard count requires massive data migration
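The migration cost can be checked with a short sketch. It assumes MD5 as a stable hash (Python's built-in `hash()` is salted per process for strings, so it is unsuitable for routing) and a synthetic key set; `stable_hash`, `shard_for`, and `moved_fraction` are hypothetical helper names.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash that is identical across processes and machines."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(key: str, num_shards: int) -> int:
    """shard = hash(key) % N"""
    return stable_hash(key) % num_shards


def moved_fraction(keys: list[str], old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when N goes from old_n to new_n."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


# Synthetic keys, for illustration only.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_n=4, new_n=5)
print(f"{fraction:.0%} of keys change shard when going from 4 to 5 shards")
```

With a uniform hash, a key stays put only when `h % 4 == h % 5`, which holds for 4 of every 20 hash values, so roughly 80% of keys move when a fifth shard is added.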
3. Directory-based Sharding
Maintain a lookup table mapping keys to shards
✅ Flexible; can dynamically reassign
❌ Lookup table itself is a bottleneck; needs HA design
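A minimal in-memory sketch of the lookup-table idea. The `ShardDirectory` class, its methods, and the tenant keys are hypothetical; a real deployment would presumably keep the table in a replicated, highly available store rather than in a Python dict.

```python
class ShardDirectory:
    """Maps each key to a shard; unknown keys fall back to a default shard."""

    def __init__(self, default_shard: str) -> None:
        self._default = default_shard
        self._table: dict[str, str] = {}

    def assign(self, key: str, shard: str) -> None:
        """Create or overwrite the mapping for key."""
        self._table[key] = shard

    def lookup(self, key: str) -> str:
        """Return the shard for key (every read and write pays this lookup)."""
        return self._table.get(key, self._default)


directory = ShardDirectory(default_shard="shard1")
directory.assign("tenant:42", "shard2")
print(directory.lookup("tenant:42"))  # shard2
print(directory.lookup("tenant:7"))   # shard1 (default)

# Dynamic reassignment: after copying the tenant's data, flip the mapping.
directory.assign("tenant:42", "shard3")
print(directory.lookup("tenant:42"))  # shard3
```

Because every request goes through `lookup`, the directory sits on the critical path, which is why it becomes a bottleneck and a single point of failure unless it is cached and replicated.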
Challenges
- Cross-shard JOINs: Must aggregate in application layer; inefficient
- Distributed transactions: ACID is hard to guarantee across shards; systems usually fall back to eventual consistency
- Scaling: Adding shards requires resharding (data migration)
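The application-layer aggregation mentioned for cross-shard queries can be sketched as a scatter-gather: run the same query on every shard, then merge the partial results in code. This example uses in-memory SQLite databases as stand-ins for shards and assumes a hypothetical `orders` table sharded by order ID, so one user's rows can span shards.

```python
import sqlite3


def make_shard(rows: list[tuple[int, int]]) -> sqlite3.Connection:
    """Create an in-memory database standing in for one shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Illustrative data: user 1 has orders on both shards.
shards = [
    make_shard([(1, 10), (2, 25)]),
    make_shard([(1, 5), (3, 40)]),
]


def total_per_user(shard_conns: list[sqlite3.Connection]) -> dict[int, int]:
    """Scatter the query to every shard, then merge partial sums."""
    totals: dict[int, int] = {}
    for conn in shard_conns:
        query = "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id"
        for user_id, subtotal in conn.execute(query):
            totals[user_id] = totals.get(user_id, 0) + subtotal
    return totals


print(total_per_user(shards))  # {1: 15, 2: 25, 3: 40}
```

Each shard still does its own grouping, but the final merge, and any join against data on another shard, happens in the application, costing one round trip per shard.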
Alternatives
Consider vertical partitioning (splitting by business domain) or read replicas first; sharding is a last resort.
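Interview bonus: Consistent hashing eases hash sharding's resharding problem: adding or removing a shard remaps only about 1/N of the keys instead of most of them. Note that Redis Cluster does not use consistent hashing; it maps each key to one of 16384 fixed hash slots (CRC16(key) mod 16384) and moves slots between nodes when resharding.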
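A minimal consistent-hashing sketch, assuming MD5 for ring positions and 100 virtual nodes per shard. `HashRing`, `ring_hash`, the virtual-node count, and the key set are illustrative choices, not taken from any particular database.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Stable position on the ring for a key or virtual node."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place vnodes virtual points for this node on the ring."""
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def get_node(self, key: str) -> str:
        """Walk clockwise from the key's position to the next virtual point."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._ring, (ring_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap past the end


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["shard1", "shard2", "shard3", "shard4"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("shard5")
after = {k: ring.get_node(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding shard5")
```

Only keys that now fall just before one of the new shard's virtual points change owner, so about one fifth of the keys move, and all of them move to the new shard. Under `hash(key) % N`, the same change from 4 to 5 shards relocates roughly 80% of keys.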
