DrawLintDrawLint.ai
🗺️Design Patterns·6 min read

Hot Key / Hot Partition Mitigation

Tame viral content and celebrity traffic with replication, local caches, and single-flight.

A hot key or hot partition is a single logical item that receives far more traffic than the rest of the dataset: a viral tweet, a flash-sale product, a celebrity profile, a live scoreboard, or a single counter everyone increments. The cluster may have plenty of total capacity, but the one shard that owns that key melts.

🔭Think of it like…
Picture a stadium with 100 concession stands, but everyone wants the same limited-edition jersey from one stand. The stadium is not out of workers; demand is concentrated in the wrong place. You fix it by putting that jersey at many stands, letting cashiers share one restock request, and keeping small piles near the entrances.

The failure mode: skew beats average capacity

Distributed systems are often sized by average load: total requests per second divided by number of nodes. Hot keys break that math. If one key receives 40 percent of all traffic and the hash function assigns it to one shard, that shard becomes the bottleneck while neighbors sit idle.

a balanced hash still has one owner for a hot key
// 100 shards, uniform hash, one viral object.
owner = hash("tweet:9001") % 100

// Every request for the viral object hits the same owner.
GET tweet:9001  → shard-17
GET tweet:9001  → shard-17
GET tweet:9001  → shard-17

// The hash is working correctly; popularity is the problem.
SymptomWhat it suggestsTypical metric
One shard has high CPU while others are quietHot partition or uneven key popularityPer-shard CPU, QPS, p99 latency
Cache expires and database spikes immediatelyThundering herd on one hot keyBackend calls per cache miss window
One counter or inventory row has lock waitsWrite hot spotRow lock time, conditional write failures
First detect, then spread or collapse
Hot-key work has two halves: detect the skew precisely, then either spread requests across copies or collapse duplicate work into one backend operation.

Detecting hot keys before they page you

Detection needs per-key and per-partition visibility. Aggregate service QPS can look fine while one key is destroying tail latency. Track top keys, top partitions, cache miss bursts, lock contention, and request fan-in at each layer.

  • Top-K key sampling: log or sketch the hottest keys with approximate algorithms such as count-min sketch so observability does not become its own bottleneck.
  • Per-shard dashboards: graph QPS, CPU, memory, queue depth, p95/p99 latency, and error rate by shard or partition.
  • Cache telemetry: alert on sudden miss storms for a single key, not only on overall hit ratio.
  • Write contention: watch conditional-write retries, row lock waits, and optimistic concurrency failures for counters, inventory, and balances.
Power-law traffic is normal
Many consumer systems have a power-law distribution: a tiny fraction of keys receives a huge fraction of traffic. Design assuming skew will happen, not as an exception.

Mitigating hot reads: copies, caches, and CDN

Read-heavy hot keys are usually handled by creating more places that can answer the same request. The exact layer depends on freshness and audience: in-process cache for milliseconds, Redis replicas for shared cache, database read replicas for source-of-truth reads, or CDN edges for public static bytes.

replicate one hot key into N cache copies
// Instead of one cache key:
GET profile:celeb42

// Store N equivalent copies:
profile:celeb42:copy:0
profile:celeb42:copy:1
profile:celeb42:copy:2
profile:celeb42:copy:3

// Spread reads deterministically or randomly.
copy = hash(requestId or userId) % 4
GET profile:celeb42:copy:{copy}
TechniqueBest forTrade-off
Replicate hot key to N copiesVery high read QPS for one logical itemCopies can be briefly stale after updates
Local in-process cacheTiny TTL reads on every app serverInvalidation is approximate; memory per process
Client-side cacheMobile/web clients repeatedly showing same dataMust respect privacy, TTL, and logout behavior
CDNPublic hot images, videos, JS, downloadsOnly works for cacheable HTTP responses

Public viral content should often move all the way to the CDN. A viral image or video thumbnail should not hit Redis on every view; the edge should serve it. For data that stays inside the application, Redis is a common shared cache layer, but the hot-key principle applies to any cache or store.

Request coalescing: single-flight

Caches fail hardest when a hot key expires. Thousands of requests miss at the same moment and all try to rebuild the value from the database.Single-flight elects one request to do the rebuild while the others wait for the same promise or future.

single-flight cache miss coalescing
inflight = Map()  // key -> promise

async function getWithSingleFlight(key):
  cached = cache.get(key)
  if cached is not null:
    return cached

  if inflight.has(key):
    return await inflight.get(key)

  promise = (async () => {
    try:
      value = await database.load(key)
      cache.set(key, value, ttlWithJitter())
      return value
    finally:
      inflight.delete(key)
  })()

  inflight.set(key, promise)
  return await promise

Make the herd smaller before it starts

  • TTL jitter: add randomness so many hot keys do not expire in the same second.
  • Stale-while-revalidate: serve a slightly stale value while one background refresh updates the cache.
  • Negative caching: cache misses for missing objects briefly so attackers or bugs cannot hammer the database with absent keys.
Single-flight scope matters
In-process single-flight collapses work only inside one app instance. If you run 200 instances, you may still get 200 backend calls. For very hot keys, combine local single-flight with a distributed lock, Redis lease, or stale-while-revalidate strategy.

Mitigating hot writes: sharded counters and key suffixing

Writes are harder than reads because copies must converge. Counters, likes, view counts, inventory reservations, and rate limits can all hot spot on one row or key. The usual trick is to split one logical write target into many physical buckets and aggregate later.

sharded counter with key suffixing
// Logical counter: likes:post:9001
// Physical counters:
likes:post:9001:shard:0
likes:post:9001:shard:1
...
likes:post:9001:shard:63

function incrementLike(postId, userId):
  shard = hash(userId) % 64
  INCR likes:post:{postId}:shard:{shard}

function readLikeCount(postId):
  total = 0
  for shard in 0..63:
    total += GET likes:post:{postId}:shard:{shard}
  return total
Hot writeMitigationGotcha
Like or view counterShard the counter and sum shardsReads are approximate or require aggregation
Flash-sale inventoryPartition reservations into buckets with a final reconciliation stepMust avoid oversell with careful leases or escrow
Rate-limit keyShard by user or request bucket, then combineEnforcement may become approximate

Key suffixing is a form of manual sharding and partitioning. It spreads one logical hot spot across many physical keys. The price is more complex reads, aggregation, and sometimes approximate answers.

Examples and design gotchas

  • Viral tweet: cache the hydrated tweet locally, replicate the shared cache key, serve media from CDN, and use sharded counters for likes and views.
  • Flash sale: product page reads should be CDN or cache served, while purchase writes need inventory buckets, queues, or reservation tokens to avoid one database row becoming a lock hotspot.
  • Celebrity profile: replicate read models and avoid rebuilding the profile from many services on every request.
  • Correctness boundaries: stale replicas are fine for view counts and profile bios, but not for account balances or final inventory decrements.
  • Automatic splitting is not magic: some databases split hot partitions, but a single logical key may still be serialized by locks, quorum writes, or leader ownership.
Key takeaways
  • A hot key concentrates traffic on one logical item or partition, so average cluster capacity becomes misleading.
  • Detect hot keys with top-key sampling, per-shard metrics, cache-miss telemetry, and write-contention signals.
  • Hot reads can be spread with replicated cache keys, local caches, client caches, read replicas, and CDN edges.
  • Single-flight collapses many simultaneous cache misses into one backend load; pair it with TTL jitter and stale-while-revalidate.
  • Hot writes often need sharded counters or key suffixing, trading simple reads for distributed write capacity.
Hashing assigns a single logical key to one owner. If that key receives a huge fraction of traffic, the owner shard saturates even though other shards have spare capacity. The bottleneck is skew, not total cluster size.
It lets one request rebuild the missing value while concurrent requests wait for that same result. That turns thousands of identical database calls into one call, then all waiters receive the refreshed value.
A single counter key serializes all increments on one partition. Suffixing splits increments across many physical keys, increasing write throughput. Reads then sum the shards or use an asynchronously aggregated total.
Finished this lesson?

Mark it complete to track your progress through the workbook.