Sharding & Partitioning
Split one giant dataset across many machines when a single database can't keep up.
Replication copies the whole dataset to more machines.Sharding, also called horizontal partitioning, does the opposite: it splits one logical dataset into pieces and stores each piece on a different machine. You reach for it when one database can no longer hold the data or absorb the write traffic.
The problem: one primary cannot grow forever
Before sharding, squeeze the simple levers: indexes, query tuning, bigger machines, caching, and read replicas. Sharding is powerful, but it moves complexity from the database engine into your application, operations, and data model.
- Storage limit: the table, indexes, or backups no longer fit comfortably on one machine.
- Write throughput limit: replication helps reads, but every write still hits one primary.
- Maintenance limit: vacuuming, compaction, migrations, and restores become too slow on a single giant dataset.
- Blast radius: a shard failure can affect a fraction of users instead of the entire product, if the system is designed that way.
logical table: users
shard 0: users where hash(user_id) mod 4 = 0
shard 1: users where hash(user_id) mod 4 = 1
shard 2: users where hash(user_id) mod 4 = 2
shard 3: users where hash(user_id) mod 4 = 3
app router decides which physical database receives each queryRange, hash, and directory partitioning
A partitioning strategy maps a key to a shard. The strategy determines whether range scans are easy, whether load is balanced, and how painful it is to add capacity.
| Strategy | How it routes | Strength | Weakness |
|---|---|---|---|
| Range | Key ranges such as A-H, I-P, Q-Z or timestamps by month | Efficient range scans and easy human reasoning | Hot ranges form when traffic clusters at one end |
| Hash | Hash(key) maps records evenly across buckets | Excellent distribution for point lookups | Range queries scatter across shards |
| Directory | Lookup table maps each tenant/key/bucket to a shard | Flexible moves, custom placement, tenant isolation | Directory becomes critical metadata and must be highly available |
Range partitioning
Range partitioning is natural for time-series data and ordered keys. It shines when queries ask for contiguous ranges, such as logs from June. But if all writes go to "today", the newest range becomes a hotspot.
Hash partitioning
Hashing destroys order to gain even spread. It is a strong default for point lookups by user, account, or object id. The cost is that a query such as "all users created yesterday" may touch every shard unless you maintain another index or table.
Directory partitioning
A directory lets you place tenants deliberately. Big customers can get dedicated shards, small customers can share, and individual buckets can be moved during rebalancing. The directory itself must be cached, replicated, and updated safely.
Hotspots, celebrity keys, and skew
Sharding assumes load is spread. Real products love to break that assumption. A celebrity account, viral post, hot product launch, or current timestamp can concentrate most traffic on one shard even when the overall cluster has plenty of capacity.
- Hot partition: one shard receives disproportionate reads or writes and becomes the bottleneck.
- Celebrity key: one logical key is so popular that hashing cannot help because all its traffic maps to one place.
- Monotonic key: keys such as increasing ids or timestamps send new writes to the same range.
normal users:
shard = hash(user_id)
celebrity timeline reads:
shard = hash(user_id + ':' + bucket_id)
// duplicate or bucket the hot user's feed across N buckets
// readers merge buckets or hit a cached fanout resultRebalancing and resharding pain
The day you add shards, data must move. That move is calledrebalancing or resharding. It is operationally risky because the system must continue serving reads and writes while ownership changes.
1. Create new shard and update metadata with a planned move.
2. Backfill existing rows for selected buckets to the new shard.
3. Dual-write or capture changes during the backfill window.
4. Verify counts, checksums, and latest change position.
5. Flip routing for those buckets to the new shard.
6. Keep old copy briefly for rollback, then delete it.- Moving too much at once can saturate disks and networks.
- A stale router can send writes to the old owner after the move.
- Backfills must include writes that happened while the copy was running.
- Rollback plans need old data, routing metadata, and idempotent scripts.
Cross-shard queries, joins, and transactions
The easiest sharded systems make most operations single-shard. The hard cases are the ones relational databases usually handled for you: joins, uniqueness, foreign keys, and transactions across unrelated keys.
| Need | Single-shard version | Cross-shard complication | Common response |
|---|---|---|---|
| Join | Orders join users by user_id on one shard | Rows live on different databases | Denormalize, duplicate lookup data, or query service APIs |
| Transaction | Debit and credit same account shard | Two primaries must commit atomically | Avoid, use saga, or use two-phase commit carefully |
| Unique id | Unique email index on one table | Each shard sees only local rows | Central allocator, hash by email, or global uniqueness service |
| Search | Index local rows | Query all shards and merge ranking | Dedicated search index fed by change events |
- Sharding horizontally splits one logical dataset across machines to scale storage, writes, and maintenance work.
- The shard key is the central design choice: it should distribute load and keep common queries single-shard.
- Range partitioning helps range scans, hash partitioning balances point lookups, and directory partitioning gives flexible placement at metadata cost.
- Hotspots and celebrity keys can melt one shard even when average distribution looks healthy; cache, bucket, split, or isolate them.
- Cross-shard joins, transactions, uniqueness, and resharding are expensive, so shard after simpler scaling tools are exhausted.
Mark it complete to track your progress through the workbook.