Consistent Hashing
Map keys to nodes so that adding/removing a node only remaps a small fraction of keys.
Consistent hashing is a placement strategy for stateful clusters: caches, storage shards, CDN routing tables, and message partitions. It maps keys to nodes so adding or removing capacity moves only a small, predictable slice of data instead of forcing the entire cluster to reshuffle at once.
The problem: modulo hashing remaps everything
The first idea most engineers reach for is hash(key) % N: hash the key, divide by the number of nodes, and use the remainder as the node index. It is simple, fast, and works beautifully while N stays constant. The failure mode appears the moment you add or remove a node.
// Three cache nodes: A, B, C
owner(key) = hash(key) % 3
// Add one more cache node: A, B, C, D
owner(key) = hash(key) % 4
// A key whose hash is 42:
42 % 3 = 0 // A
42 % 4 = 2 // C
// A key whose hash is 43:
43 % 3 = 1 // B
43 % 4 = 3 // D
// Most keys change owners, so the cluster behaves like a cold cache.For random hashes, changing from N to N + 1 changes the remainder for almost every key. In a cache, that means a huge miss storm against the database. In a storage system, it means an expensive data migration. In a CDN routing layer, it can throw users at different edge nodes and destroy locality.
| Approach | When nodes change | Operational failure mode |
|---|---|---|
| Modulo hashing | Almost every key can map to a different node | Cache stampede, huge rebalance, high tail latency |
| Consistent hashing | Only the keys in the changed ring slice move | Small controlled migration; most keys stay warm |
N, move about 1/N of keys, not nearly all keys. That is the central promise of consistent hashing.The hash ring: keys walk clockwise
Consistent hashing treats the hash space as a circle, often from 0 to 2^32 - 1 or 2^64 - 1. Hash each physical node onto the ring. Hash each key onto the same ring. A key belongs to the first node encountered while walking clockwise from the key position.
ring = [
(10, "cache-a"),
(35, "cache-b"),
(72, "cache-c"),
(91, "cache-d"),
]
function ownerFor(key):
point = hash(key) % 100
for (nodePoint, nodeName) in ring sorted by nodePoint:
if point <= nodePoint:
return nodeName
// Wrap around to the first node on the ring.
return ring[0].nodeName
// hash("user:42") = 40 => first node clockwise is cache-c at 72
// hash("post:9") = 96 => wrap around to cache-a at 10Adding a node inserts a new point on the ring. The new node only takes ownership of keys between the previous node and itself. Removing a node gives its interval to the next node clockwise. That locality is what prevents whole-cluster reshuffles.
Why only a fraction moves
If the ring is reasonably balanced and there are N equal nodes, each node owns about 1/N of the hash space. Add one more node and it should take about 1/(N + 1) of the key space. In practical terms, if a 10-node cache cluster adds one node, you want roughly 9 percent to 10 percent of keys to move, not 90 percent.
Virtual nodes: balance the ring
Hashing one point per physical node is risky. Random placement can give one node a huge arc and another node a tiny arc, which means uneven memory, uneven CPU, and uneven request volume. Production systems solve this with virtual nodes, also called tokens or vnodes.
Instead of placing cache-a once, place cache-a#0, cache-a#1, cache-a#2, and so on. Each virtual point maps back to the same physical machine. With hundreds of points per machine, random gaps average out and load becomes much smoother.
| Design | Benefit | Cost |
|---|---|---|
| One token per node | Very simple lookup table | Poor balance if node positions are unlucky |
| Many virtual nodes per node | Smoother distribution and easier weighted capacity | Larger ring metadata and slightly more lookup work |
| Weighted virtual nodes | Bigger machines can own more ring slices | Requires capacity-aware token assignment |
Replication: continue along the ring
A single owner is not enough for durable storage or highly available reads. The standard ring trick is to choose the primary owner, then keep walking clockwise to choose replicas on distinct physical nodes. With a replication factor of three, each key is stored on three successive nodes in ring order.
function replicasFor(key, replicationFactor):
point = hash(key)
replicas = []
for vnode in ring.clockwiseFrom(point):
node = vnode.physicalNode
if node not in replicas:
replicas.append(node)
if len(replicas) == replicationFactor:
return replicas
// key k maps to vnode on node B
// replicas might be [B, D, A] after skipping duplicate vnodes for B- Failure tolerance: if one node dies, reads can fall back to another replica while repair or re-replication runs.
- Placement awareness: real systems avoid putting all replicas in the same rack, availability zone, or power domain.
- Read and write quorums: Dynamo-style systems can require acknowledgements from
Rread replicas andWwrite replicas, trading latency for consistency.
Bounded-load consistent hashing
Basic consistent hashing balances key count in expectation, but it does not guarantee request load. A few very popular keys can make one node hot even if the ring is balanced. Bounded-load variants add a guardrail: a node may receive assignments only up to some limit above the average load. If it is already too full, the lookup continues to the next acceptable node.
maxLoad = averageLoad * 1.25
function boundedOwnerFor(key):
point = hash(key)
for vnode in ring.clockwiseFrom(point):
node = vnode.physicalNode
if currentLoad(node) < maxLoad:
return node
// In overload, either relax the bound or shed traffic explicitly.
return leastLoadedNode()This is useful for request routing, CDN edge assignment, and cache clients where the system can observe load in real time. It is less straightforward for durable storage because moving ownership based on momentary load can fight replication, repair, and consistency rules.
Edge cases and design gotchas
- Hash quality matters: use a stable, uniform hash. A poor hash creates clumps and defeats the balance assumptions.
- Ring metadata must be consistent: clients need the same view of nodes and tokens, or different clients will route the same key to different owners during deployments.
- Membership changes should be staged: adding 100 nodes at once still moves a lot of data. Throttle rebalancing so background migration does not compete with foreground traffic.
- Hot keys still exist: consistent hashing spreads keys, not popularity. Pair it with caching, replication, and hot-key mitigation.
- Partitioning is broader: the ring is one way to shard. For range shards, directory-based shards, and split strategies, see sharding and partitioning.
- Modulo hashing is fragile for stateful clusters because changing N remaps almost every key.
- Consistent hashing places keys and nodes on a ring; a key walks clockwise to its owner.
- Adding or removing one node remaps only the affected ring slice, roughly K/N keys rather than nearly all K keys.
- Virtual nodes smooth balance, support weighted capacity, and reduce unlucky ring gaps.
- Replication walks farther around the ring; bounded-load variants add live load guardrails for routing-heavy systems.
Mark it complete to track your progress through the workbook.