DrawLintDrawLint.ai
🧩Core Building Blocks·6 min read

Sharding & Partitioning

Split one giant dataset across many machines when a single database can't keep up.

Replication copies the whole dataset to more machines.Sharding, also called horizontal partitioning, does the opposite: it splits one logical dataset into pieces and stores each piece on a different machine. You reach for it when one database can no longer hold the data or absorb the write traffic.

🔭Think of it like…
Imagine a national phone book. Printing ten identical copies helps ten people read at once, but each copy is still enormous. Sharding is printing separate volumes: A through H, I through P, and Q through Z. To find "Sharma", you know which volume to open. The letter range is the shard key rule.

The problem: one primary cannot grow forever

Before sharding, squeeze the simple levers: indexes, query tuning, bigger machines, caching, and read replicas. Sharding is powerful, but it moves complexity from the database engine into your application, operations, and data model.

  • Storage limit: the table, indexes, or backups no longer fit comfortably on one machine.
  • Write throughput limit: replication helps reads, but every write still hits one primary.
  • Maintenance limit: vacuuming, compaction, migrations, and restores become too slow on a single giant dataset.
  • Blast radius: a shard failure can affect a fraction of users instead of the entire product, if the system is designed that way.
horizontal partitioning by user id
logical table: users

shard 0: users where hash(user_id) mod 4 = 0
shard 1: users where hash(user_id) mod 4 = 1
shard 2: users where hash(user_id) mod 4 = 2
shard 3: users where hash(user_id) mod 4 = 3

app router decides which physical database receives each query
Sharding changes the contract
After sharding, "query the database" becomes "route to the right shard, maybe fan out, merge results, and handle partial failures". That is why teams shard late and deliberately.

Choosing a shard key

The shard key decides where each record lives. It is the most important decision because changing it later means moving lots of data and rewriting query paths. A good key has high cardinality, spreads traffic evenly, and keeps data that is read together on the same shard.

Candidate keyDistributionLocalityProblem
user_idUsually high and evenGood for per-user dataCelebrity users can still become hot
countryLow cardinality and unevenGood for regional rulesOne country can dominate a shard
created_atEasy for time windowsGood for recent scansAll new writes hit the latest shard
conversation_idGood if many conversationsKeeps chat messages togetherHuge group chats can become hot

Design from the common query

If the main query is "load all messages in this conversation", shard by conversation_id, not by random message_id. If the main query is "load a user profile and settings", shard byuser_id. The best key turns the hot path into a single-shard request.

routing with a shard key
function shardForUser(userId) {
  const bucket = hash(userId) % 1024;        // many logical buckets
  return shardMap[bucket];                  // bucket -> physical shard
}

SELECT * FROM orders WHERE user_id = $1;    // route only to that user's shard
The worst key is the one missing from queries
If most queries do not include the shard key, the system must ask every shard and merge the answers. Fan-out queries are slower, costlier, and harder to make reliable because one slow shard slows the whole response.

Range, hash, and directory partitioning

A partitioning strategy maps a key to a shard. The strategy determines whether range scans are easy, whether load is balanced, and how painful it is to add capacity.

StrategyHow it routesStrengthWeakness
RangeKey ranges such as A-H, I-P, Q-Z or timestamps by monthEfficient range scans and easy human reasoningHot ranges form when traffic clusters at one end
HashHash(key) maps records evenly across bucketsExcellent distribution for point lookupsRange queries scatter across shards
DirectoryLookup table maps each tenant/key/bucket to a shardFlexible moves, custom placement, tenant isolationDirectory becomes critical metadata and must be highly available

Range partitioning

Range partitioning is natural for time-series data and ordered keys. It shines when queries ask for contiguous ranges, such as logs from June. But if all writes go to "today", the newest range becomes a hotspot.

Hash partitioning

Hashing destroys order to gain even spread. It is a strong default for point lookups by user, account, or object id. The cost is that a query such as "all users created yesterday" may touch every shard unless you maintain another index or table.

Directory partitioning

A directory lets you place tenants deliberately. Big customers can get dedicated shards, small customers can share, and individual buckets can be moved during rebalancing. The directory itself must be cached, replicated, and updated safely.

Related pattern
Simple modulo hashing remaps many keys when shard count changes. Consistent hashingreduces how much data moves when nodes join or leave.

Hotspots, celebrity keys, and skew

Sharding assumes load is spread. Real products love to break that assumption. A celebrity account, viral post, hot product launch, or current timestamp can concentrate most traffic on one shard even when the overall cluster has plenty of capacity.

  • Hot partition: one shard receives disproportionate reads or writes and becomes the bottleneck.
  • Celebrity key: one logical key is so popular that hashing cannot help because all its traffic maps to one place.
  • Monotonic key: keys such as increasing ids or timestamps send new writes to the same range.
splitting a celebrity key
normal users:
  shard = hash(user_id)

celebrity timeline reads:
  shard = hash(user_id + ':' + bucket_id)
  // duplicate or bucket the hot user's feed across N buckets
  // readers merge buckets or hit a cached fanout result
Mitigation toolbox
Add caching for hot reads, split hot keys into buckets, precompute fanout results, use random suffixes for write-heavy counters, and isolate very large tenants onto their own shards.

Rebalancing and resharding pain

The day you add shards, data must move. That move is calledrebalancing or resharding. It is operationally risky because the system must continue serving reads and writes while ownership changes.

online resharding shape
1. Create new shard and update metadata with a planned move.
2. Backfill existing rows for selected buckets to the new shard.
3. Dual-write or capture changes during the backfill window.
4. Verify counts, checksums, and latest change position.
5. Flip routing for those buckets to the new shard.
6. Keep old copy briefly for rollback, then delete it.
  • Moving too much at once can saturate disks and networks.
  • A stale router can send writes to the old owner after the move.
  • Backfills must include writes that happened while the copy was running.
  • Rollback plans need old data, routing metadata, and idempotent scripts.

Cross-shard queries, joins, and transactions

The easiest sharded systems make most operations single-shard. The hard cases are the ones relational databases usually handled for you: joins, uniqueness, foreign keys, and transactions across unrelated keys.

NeedSingle-shard versionCross-shard complicationCommon response
JoinOrders join users by user_id on one shardRows live on different databasesDenormalize, duplicate lookup data, or query service APIs
TransactionDebit and credit same account shardTwo primaries must commit atomicallyAvoid, use saga, or use two-phase commit carefully
Unique idUnique email index on one tableEach shard sees only local rowsCentral allocator, hash by email, or global uniqueness service
SearchIndex local rowsQuery all shards and merge rankingDedicated search index fed by change events
Two-phase commit is not magic
Distributed transactions can preserve atomicity, but they add latency, coordinator failure modes, locks held across machines, and operational complexity. Many large systems redesign workflows as sagas instead.
Key takeaways
  • Sharding horizontally splits one logical dataset across machines to scale storage, writes, and maintenance work.
  • The shard key is the central design choice: it should distribute load and keep common queries single-shard.
  • Range partitioning helps range scans, hash partitioning balances point lookups, and directory partitioning gives flexible placement at metadata cost.
  • Hotspots and celebrity keys can melt one shard even when average distribution looks healthy; cache, bucket, split, or isolate them.
  • Cross-shard joins, transactions, uniqueness, and resharding are expensive, so shard after simpler scaling tools are exhausted.
A conversation's messages would scatter across many shards, so loading one thread becomes a fan-out query plus merge. Sharding by conversation_id keeps the common read local to one shard, at the cost of handling very large or hot conversations separately.
New writes usually target the newest time range. If all events for the current minute or day land on the same shard, that shard gets almost all write traffic while older shards sit idle.
You must copy old data, capture writes that happen during the copy, keep routers from sending traffic to the wrong owner, verify correctness, and leave a rollback path. All of that happens while users still expect the system to be online.
Finished this lesson?

Mark it complete to track your progress through the workbook.