Snowflake IDs + Base62

Generate time-ordered, globally unique IDs with no coordination, then shorten them.

Distributed systems need ids that are unique without asking one central database for permission. Snowflake IDs solve this by packing time, worker identity, and a per-millisecond sequence into one 64-bit integer. The result is globally unique, roughly time-sortable, compact, and generated locally at very high speed.

🔭Think of it like…
Imagine every factory line has a unique stamp. Each product gets stamped with the current millisecond, the line number, and the item's sequence number on that line for that millisecond. No central office hands out numbers, but the labels still sort mostly in production order.

The problem: unique ids without a single bottleneck

Auto-increment ids are wonderfully simple on one database. They become a bottleneck when writes are sharded across regions, partitions, or services. Random UUIDv4 values avoid coordination but scatter inserts across B-tree indexes and are long in URLs. Snowflake-style ids are a middle ground: local generation with enough structure to sort by time.

| Scheme | Coordination | Sortability | Size | Common pain |
| --- | --- | --- | --- | --- |
| Auto-increment integer | Central database or shard-specific allocator | Perfect within one sequence | Small | Hard to scale globally; leaks row counts |
| UUIDv4 | None | Random | 128 bits / 36 chars | Poor index locality and long URLs |
| ULID / UUIDv7 | None | Time-sortable | 128 bits | Larger than Snowflake, but very portable |
| Snowflake | Only worker id assignment | Roughly time-sortable | 64 bits | Requires clock-skew handling |

The core idea
Put enough information in the id itself that independent machines can generate ids safely: when, which machine, and which number on that machine during that millisecond.

The 64-bit layout: timestamp | worker | sequence

The classic Twitter Snowflake layout uses 41 bits for milliseconds since a custom epoch, 10 bits for machine or worker id, and 12 bits for a sequence counter. Many companies tune the bit split, but the mechanical idea is the same.

classic Snowflake bit layout
0 | 41-bit timestamp | 10-bit worker id | 12-bit sequence
  |                 |                  |
  |                 |                  └─ 0..4095 ids per millisecond per worker
  |                 └─ 0..1023 workers
  └─ sign bit kept 0 so the id stays positive

id = ((timestampMillis - customEpoch) << 22)
   | (workerId << 12)
   | sequence
  • 41-bit timestamp: 2^41 milliseconds, roughly 69 years, counted from a custom epoch. Choosing an epoch close to your launch date, rather than 1970, means none of that range is spent on years before the system existed.
  • 10-bit worker id: 1024 unique generators. You can split this further into region bits, rack bits, and process bits.
  • 12-bit sequence: 4096 ids per millisecond per worker, or about 4 million ids per second per worker if the clock advances normally.
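
The layout above can be sketched in Python as a pair of pack and unpack helpers. This is a minimal illustration rather than a reference implementation: the names `pack_id` and `unpack_id` and the `EPOCH_MS` value (midnight UTC on 1 January 2020) are assumptions made for this example.

```python
from typing import Tuple

TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12

WORKER_SHIFT = SEQUENCE_BITS                    # worker id sits above the sequence
TIMESTAMP_SHIFT = WORKER_BITS + SEQUENCE_BITS   # timestamp sits above both (22)

MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1
MAX_WORKER = (1 << WORKER_BITS) - 1             # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1         # 4095

# Assumed custom epoch for this example: 2020-01-01T00:00:00Z in milliseconds.
EPOCH_MS = 1_577_836_800_000


def pack_id(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Combine the three fields into one positive 64-bit integer."""
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed <= MAX_TIMESTAMP:
        raise ValueError("timestamp is outside the 41-bit range")
    if not 0 <= worker_id <= MAX_WORKER:
        raise ValueError("worker id is outside the 10-bit range")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence is outside the 12-bit range")
    return (elapsed << TIMESTAMP_SHIFT) | (worker_id << WORKER_SHIFT) | sequence


def unpack_id(snowflake: int) -> Tuple[int, int, int]:
    """Recover (timestamp_ms, worker_id, sequence) from an id."""
    sequence = snowflake & MAX_SEQUENCE
    worker_id = (snowflake >> WORKER_SHIFT) & MAX_WORKER
    timestamp_ms = (snowflake >> TIMESTAMP_SHIFT) + EPOCH_MS
    return timestamp_ms, worker_id, sequence


example = pack_id(EPOCH_MS + 1000, worker_id=5, sequence=7)
print(example, unpack_id(example))
```

Because the fields occupy fixed bit ranges, unpacking is just the same shifts in reverse, which is also how the approximate creation time can be read back out of an id.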

Generation algorithm

local id generator
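generateId():
  # lastTimestamp and sequence persist between calls; guard them with a lock
  now = currentTimeMillis()

  if now < lastTimestamp:
    # must return a timestamp >= lastTimestamp, or raise
    now = handleClockMovedBackwards(lastTimestamp - now)

  if now == lastTimestamp:
    sequence = (sequence + 1) & 4095   # wrap at 12 bits
    if sequence == 0:
      # this millisecond is exhausted; block until the clock ticks
      now = waitUntilNextMillis(lastTimestamp)
  else:
    sequence = 0

  lastTimestamp = now
  return ((now - epoch) << 22) | (workerId << 12) | sequence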
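
The pseudocode above might translate to Python roughly as follows. This is a sketch under assumptions, not a production generator: the class name `SnowflakeGenerator`, the `EPOCH_MS` value, and the injectable `clock_ms` parameter are all invented for this example, and a backward clock simply raises here instead of applying a waiting policy.

```python
import threading
import time
from typing import Callable

EPOCH_MS = 1_577_836_800_000   # assumed custom epoch: 2020-01-01T00:00:00Z
SEQUENCE_MASK = 0xFFF          # 12 bits, so 0..4095
MAX_WORKER = 1023              # 10 bits


class ClockMovedBackwards(RuntimeError):
    """Raised when the clock reads earlier than the last issued timestamp."""


class SnowflakeGenerator:
    def __init__(self, worker_id: int,
                 clock_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker id must fit in 10 bits")
        self._worker_id = worker_id
        self._clock_ms = clock_ms        # injectable so tests can fake time
        self._lock = threading.Lock()    # sequence and last timestamp are shared state
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise ClockMovedBackwards(f"clock is {self._last_ms - now} ms behind")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # 4096 ids already issued this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self._worker_id << 12) | self._sequence


generator = SnowflakeGenerator(worker_id=1)
ids = [generator.next_id() for _ in range(10_000)]
print(len(set(ids)) == len(ids), ids == sorted(ids))
```

Passing the clock in as a function is a design choice for the sketch: it lets a test pin time to one millisecond and exercise the sequence-overflow branch without generating millions of real ids.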

Properties: time-sortable and globally unique

Snowflake ids are unique as long as two workers never use the same worker id at the same time and a worker never emits the same sequence for the same millisecond. Because the timestamp occupies the high bits, numeric ordering is also roughly chronological.

  • No per-id coordination: the generator does not call a database, Redis, or central service for every id. It only needs a unique worker id assignment at startup or deployment time.
  • Index locality: new ids tend to append near the end of a B-tree primary key rather than landing randomly across pages like UUIDv4.
  • Readable timing: you can decode approximate creation time from the id, which helps debugging but may be a privacy concern.
Roughly ordered does not mean causally ordered
Two regions with slightly different clocks can produce ids that sort in surprising order. Snowflake gives practical time locality, not a proof that event A happened before event B across the whole world.
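
A small worked example of that caveat, with made-up numbers: event B really happens 2 ms after event A, but worker 2's clock is assumed to run 3 ms slow, so B's id sorts first. The helper `make_id` and all values here are hypothetical.

```python
def make_id(elapsed_ms: int, worker_id: int, sequence: int) -> int:
    """Classic layout: timestamp in the high bits, then worker, then sequence."""
    return (elapsed_ms << 22) | (worker_id << 12) | sequence


true_time_a = 1_000   # real time of event A, in ms since the epoch
true_time_b = 1_002   # event B happens 2 ms later in real time
skew_b = -3           # assumption: worker 2's clock runs 3 ms slow

id_a = make_id(true_time_a, worker_id=1, sequence=0)
id_b = make_id(true_time_b + skew_b, worker_id=2, sequence=0)

# B happened after A, yet its id is numerically smaller.
print(id_b < id_a)
```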

Clock skew and worker id assignment

Clock movement is the sharp edge. If a worker generated ids at timestamp 10,000 and the machine clock jumps back to 9,990, reusing the same sequence range could create duplicates. Production generators must define an explicit policy.

| Problem | Typical handling | Trade-off |
| --- | --- | --- |
| Small backward jump | Wait until the clock catches up with the previous timestamp | Adds latency but preserves monotonicity |
| Large backward jump | Refuse to generate ids and alert | Availability hit, but avoids duplicates |
| Worker id collision | Lease worker ids through config, Zookeeper, etcd, or Kubernetes identity | Operational dependency at startup |
| Sequence overflow in one ms | Block until next millisecond | Very brief pause under extreme local burst |

safe response to clock moving backward
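if nowMillis < lastTimestamp:
  drift = lastTimestamp - nowMillis

  if drift <= 5:
    # small jump: wait it out, re-reading the clock until it has caught up
    while nowMillis < lastTimestamp:
      sleep(lastTimestamp - nowMillis)   # milliseconds
      nowMillis = currentTimeMillis()
  else:
    raise FatalGeneratorError("clock moved backwards; refusing ids")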
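
The same policy could be written in Python along these lines. The function name `wait_out_backward_clock`, the exception name, and the injectable `clock_ms` and `sleep` parameters are assumptions for this sketch; the 5 ms threshold mirrors the pseudocode and is a tunable, not a recommendation.

```python
import time
from typing import Callable


class ClockMovedBackwardsError(RuntimeError):
    """The clock jumped back further than the generator is willing to wait."""


def wait_out_backward_clock(
    last_ms: int,
    now_ms: int,
    clock_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000,
    sleep: Callable[[float], None] = time.sleep,
    max_drift_ms: int = 5,
) -> int:
    """Return a timestamp >= last_ms, or raise if the backward jump is too large."""
    drift = last_ms - now_ms
    if drift <= 0:
        return now_ms                      # clock did not move backwards
    if drift > max_drift_ms:
        raise ClockMovedBackwardsError(
            f"clock moved back {drift} ms; refusing to generate ids"
        )
    # Small jump: sleep, then re-read the clock until it has caught up.
    # A single sleep is not enough, because sleeps can return early or late.
    while now_ms < last_ms:
        sleep((last_ms - now_ms) / 1000.0)
        now_ms = clock_ms()
    return now_ms
```

Returning the fresh timestamp, instead of only sleeping, lets the caller continue with a value that is guaranteed not to be earlier than the last one it issued.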

Worker ids also deserve discipline. Static config is fine for a small fleet; large fleets usually allocate worker ids as leases so a restarted process cannot accidentally share an id with an old process still alive.
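
As one illustration of the "Kubernetes identity" option, a process running in a StatefulSet could derive its worker id from the pod name, since StatefulSet pods are named `<name>-<ordinal>`. This is a hedged sketch: the function `worker_id_from_hostname` and the example hostname `idgen-7` are hypothetical, and it assumes the StatefulSet is the only source of generators and never exceeds 1024 replicas. It is a static identity, not a lease, so it does not by itself protect against a stale process outside the StatefulSet.

```python
import re
import socket
from typing import Optional

MAX_WORKER = 1023  # 10-bit worker id field


def worker_id_from_hostname(hostname: Optional[str] = None) -> int:
    """Use the trailing ordinal of a pod name such as 'idgen-7' as the worker id."""
    name = hostname if hostname is not None else socket.gethostname()
    match = re.search(r"-(\d+)$", name)
    if match is None:
        raise ValueError(f"hostname {name!r} has no trailing ordinal")
    ordinal = int(match.group(1))
    if ordinal > MAX_WORKER:
        raise ValueError(f"ordinal {ordinal} does not fit in the worker id field")
    return ordinal


print(worker_id_from_hostname("idgen-7"))
```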

Base62: shortening ids for URLs

A positive 64-bit integer needs up to 19 decimal digits, which is longer than necessary for a URL. Base62 uses digits, lowercase letters, and uppercase letters, so each character carries more information and the same id fits in at most 11 characters. It turns a large numeric id into a shorter URL-safe string without changing the underlying id.

decimal to Base62 shape
Snowflake decimal:  1992462857987428352
Base62 encoded:     2nbv3X1lz4Q

alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
if id == 0:
  output = alphabet[0]   # the loop below would otherwise emit nothing
while id > 0:
  output.prepend(alphabet[id % 62])
  id = floor(id / 62)
  • Good for: short URLs, paste ids, invite codes, ticket ids, and public resource ids that should remain compact.
  • Not encryption: Base62 is just representation. Anyone can decode it back to the numeric id unless you add separate obfuscation.
  • Case sensitivity: Base62 relies on uppercase and lowercase being distinct. Avoid it in channels that fold case.
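
A runnable version of the encoder, plus the matching decoder, might look like this in Python. The function names `base62_encode` and `base62_decode` are chosen for this sketch; the alphabet order is the one shown above, and a different order would produce different strings for the same id.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def base62_encode(value: int) -> str:
    """Encode a non-negative integer using the alphabet above."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    if value == 0:
        return ALPHABET[0]
    chars = []
    while value > 0:
        value, remainder = divmod(value, BASE)
        chars.append(ALPHABET[remainder])   # least significant digit first
    return "".join(reversed(chars))


def base62_decode(text: str) -> int:
    """Inverse of base62_encode; rejects characters outside the alphabet."""
    if not text:
        raise ValueError("cannot decode an empty string")
    value = 0
    for ch in text:
        if ch not in INDEX:
            raise ValueError(f"invalid Base62 character: {ch!r}")
        value = value * BASE + INDEX[ch]
    return value


snowflake = 1992462857987428352
short = base62_encode(snowflake)
print(short, len(short), base62_decode(short) == snowflake)
```

The round trip through `base62_decode` is the point of the last line: the short string is only a different spelling of the same number, which is why it offers no secrecy.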

Gotchas and real-world examples

Snowflake-style ids show up in social posts, chat messages, orders, analytics events, and URL shorteners. They are especially useful when many write nodes need to create ids before a database write. They are not automatically the right choice for secret tokens, password resets, or anything that must be unpredictable.

  • Predictability: time-sortable ids leak approximate volume and creation time. Use random tokens for security-sensitive links.
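  • JavaScript precision: a JavaScript number is a 64-bit float that represents integers exactly only up to 2^53 - 1 (Number.MAX_SAFE_INTEGER), so larger Snowflake ids are silently rounded. Send them to browsers as strings.
  • Custom epoch rollover: 41 bits of milliseconds lasts about 69 years from the epoch, not forever. Document the epoch and a migration plan.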
  • Shard hints: do not overload worker bits as your only routing scheme unless you are comfortable exposing infrastructure details in public ids.
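
The precision gotcha can be reproduced from Python, because a Python `float` is the same IEEE 754 double that JavaScript uses for its numbers. The id below is a made-up value built with the classic layout, and the JSON field name `id` is an assumption for this sketch.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1   # same value as JavaScript's Number.MAX_SAFE_INTEGER

# Hypothetical id: some elapsed milliseconds, worker 5, sequence 7.
snowflake = (123_456_789_012 << 22) | (5 << 12) | 7

# Round-tripping through a double loses the low bits of a large id.
as_double = float(snowflake)
print(snowflake > MAX_SAFE_INTEGER, int(as_double) == snowflake)

# Sending the id as a JSON string keeps every digit intact.
payload = json.dumps({"id": str(snowflake)})
restored = int(json.loads(payload)["id"])
print(restored == snowflake)
```

Here the low bits that get rounded away are the sequence and part of the worker id, so two distinct ids can collapse into the same value once a browser parses them as numbers.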
Key takeaways
  • Snowflake ids pack timestamp, worker id, and sequence into a compact 64-bit integer generated without per-id coordination.
  • The timestamp in high bits makes ids roughly time-sortable, improving database index locality compared with UUIDv4.
  • Uniqueness depends on unique worker ids and safe handling of sequence overflow and backward clock jumps.
  • Base62 shortens the decimal representation for URLs, but it is encoding, not encryption or unpredictability.
  • Use Snowflake for high-volume entities and events; use random tokens when secrecy or non-guessability matters.