Hot Key / Hot Partition Mitigation
Tame viral content and celebrity traffic with replication, local caches, and single-flight.
A hot key or hot partition is a single logical item that receives far more traffic than the rest of the dataset: a viral tweet, a flash-sale product, a celebrity profile, a live scoreboard, or a single counter everyone increments. The cluster may have plenty of total capacity, but the one shard that owns that key melts.
The failure mode: skew beats average capacity
Distributed systems are often sized by average load: total requests per second divided by number of nodes. Hot keys break that math. If one key receives 40 percent of all traffic and the hash function assigns it to one shard, that shard becomes the bottleneck while neighbors sit idle.
// 100 shards, uniform hash, one viral object.
owner = hash("tweet:9001") % 100
// Every request for the viral object hits the same owner.
GET tweet:9001 → shard-17
GET tweet:9001 → shard-17
GET tweet:9001 → shard-17
// The hash is working correctly; popularity is the problem.| Symptom | What it suggests | Typical metric |
|---|---|---|
| One shard has high CPU while others are quiet | Hot partition or uneven key popularity | Per-shard CPU, QPS, p99 latency |
| Cache expires and database spikes immediately | Thundering herd on one hot key | Backend calls per cache miss window |
| One counter or inventory row has lock waits | Write hot spot | Row lock time, conditional write failures |
Detecting hot keys before they page you
Detection needs per-key and per-partition visibility. Aggregate service QPS can look fine while one key is destroying tail latency. Track top keys, top partitions, cache miss bursts, lock contention, and request fan-in at each layer.
- Top-K key sampling: log or sketch the hottest keys with approximate algorithms such as count-min sketch so observability does not become its own bottleneck.
- Per-shard dashboards: graph QPS, CPU, memory, queue depth, p95/p99 latency, and error rate by shard or partition.
- Cache telemetry: alert on sudden miss storms for a single key, not only on overall hit ratio.
- Write contention: watch conditional-write retries, row lock waits, and optimistic concurrency failures for counters, inventory, and balances.
Mitigating hot reads: copies, caches, and CDN
Read-heavy hot keys are usually handled by creating more places that can answer the same request. The exact layer depends on freshness and audience: in-process cache for milliseconds, Redis replicas for shared cache, database read replicas for source-of-truth reads, or CDN edges for public static bytes.
// Instead of one cache key:
GET profile:celeb42
// Store N equivalent copies:
profile:celeb42:copy:0
profile:celeb42:copy:1
profile:celeb42:copy:2
profile:celeb42:copy:3
// Spread reads deterministically or randomly.
copy = hash(requestId or userId) % 4
GET profile:celeb42:copy:{copy}| Technique | Best for | Trade-off |
|---|---|---|
| Replicate hot key to N copies | Very high read QPS for one logical item | Copies can be briefly stale after updates |
| Local in-process cache | Tiny TTL reads on every app server | Invalidation is approximate; memory per process |
| Client-side cache | Mobile/web clients repeatedly showing same data | Must respect privacy, TTL, and logout behavior |
| CDN | Public hot images, videos, JS, downloads | Only works for cacheable HTTP responses |
Public viral content should often move all the way to the CDN. A viral image or video thumbnail should not hit Redis on every view; the edge should serve it. For data that stays inside the application, Redis is a common shared cache layer, but the hot-key principle applies to any cache or store.
Request coalescing: single-flight
Caches fail hardest when a hot key expires. Thousands of requests miss at the same moment and all try to rebuild the value from the database.Single-flight elects one request to do the rebuild while the others wait for the same promise or future.
inflight = Map() // key -> promise
async function getWithSingleFlight(key):
cached = cache.get(key)
if cached is not null:
return cached
if inflight.has(key):
return await inflight.get(key)
promise = (async () => {
try:
value = await database.load(key)
cache.set(key, value, ttlWithJitter())
return value
finally:
inflight.delete(key)
})()
inflight.set(key, promise)
return await promiseMake the herd smaller before it starts
- TTL jitter: add randomness so many hot keys do not expire in the same second.
- Stale-while-revalidate: serve a slightly stale value while one background refresh updates the cache.
- Negative caching: cache misses for missing objects briefly so attackers or bugs cannot hammer the database with absent keys.
Mitigating hot writes: sharded counters and key suffixing
Writes are harder than reads because copies must converge. Counters, likes, view counts, inventory reservations, and rate limits can all hot spot on one row or key. The usual trick is to split one logical write target into many physical buckets and aggregate later.
// Logical counter: likes:post:9001
// Physical counters:
likes:post:9001:shard:0
likes:post:9001:shard:1
...
likes:post:9001:shard:63
function incrementLike(postId, userId):
shard = hash(userId) % 64
INCR likes:post:{postId}:shard:{shard}
function readLikeCount(postId):
total = 0
for shard in 0..63:
total += GET likes:post:{postId}:shard:{shard}
return total| Hot write | Mitigation | Gotcha |
|---|---|---|
| Like or view counter | Shard the counter and sum shards | Reads are approximate or require aggregation |
| Flash-sale inventory | Partition reservations into buckets with a final reconciliation step | Must avoid oversell with careful leases or escrow |
| Rate-limit key | Shard by user or request bucket, then combine | Enforcement may become approximate |
Key suffixing is a form of manual sharding and partitioning. It spreads one logical hot spot across many physical keys. The price is more complex reads, aggregation, and sometimes approximate answers.
Examples and design gotchas
- Viral tweet: cache the hydrated tweet locally, replicate the shared cache key, serve media from CDN, and use sharded counters for likes and views.
- Flash sale: product page reads should be CDN or cache served, while purchase writes need inventory buckets, queues, or reservation tokens to avoid one database row becoming a lock hotspot.
- Celebrity profile: replicate read models and avoid rebuilding the profile from many services on every request.
- Correctness boundaries: stale replicas are fine for view counts and profile bios, but not for account balances or final inventory decrements.
- Automatic splitting is not magic: some databases split hot partitions, but a single logical key may still be serialized by locks, quorum writes, or leader ownership.
- A hot key concentrates traffic on one logical item or partition, so average cluster capacity becomes misleading.
- Detect hot keys with top-key sampling, per-shard metrics, cache-miss telemetry, and write-contention signals.
- Hot reads can be spread with replicated cache keys, local caches, client caches, read replicas, and CDN edges.
- Single-flight collapses many simultaneous cache misses into one backend load; pair it with TTL jitter and stale-while-revalidate.
- Hot writes often need sharded counters or key suffixing, trading simple reads for distributed write capacity.
Mark it complete to track your progress through the workbook.