
Caching Basics

The single highest-leverage trick in system design — plus the traps that bite beginners.

A cache is a small, fast store that keeps copies of data or computation results so the system can avoid repeating expensive work. Caching is one of the highest-leverage tools in system design: it cuts latency, reduces database load, and absorbs traffic spikes. It also creates hard correctness questions because cached data can become stale.

🔭Think of it like…
A cache is the snack bowl on your desk. The pantry is larger and more authoritative, but walking there for every handful is slow. You keep the snacks you reach for most often close by, refill the bowl when it is empty, and throw snacks away when they go stale.

The problem caching solves

Real traffic is rarely uniform. A small set of hot objects receives a large fraction of reads: the home feed, a trending video, a celebrity profile, a product page during a sale, or the same configuration record every request needs. Without caching, every read repeats the slow path.

without a cache
every request
  -> app server
  -> database query or expensive computation
  -> response

if one hot key receives 50,000 requests/second:
  the database sees 50,000 reads/second for the same answer

The failure mode is not only latency. Repeated reads can exhaust DB CPU, connection pools, disk I/O, or downstream API quotas. Once the source of truth is overloaded, even writes and admin operations that never touch the cache slow down.

A cache is not the source of truth
Design the system so it remains correct when the cache is empty, expired, or restarted. The database or durable store must still be able to answer authoritatively, even if more slowly.
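
One way to apply this rule is to treat every cache call as optional. The sketch below is a minimal illustration, not a production client: `FlakyCache`, `DATABASE`, and `read_user` are hypothetical names, and the cache outage is simulated with a flag.

```python
class FlakyCache:
    """Hypothetical in-memory cache that can be switched 'down' for the demo."""

    def __init__(self):
        self.data = {}
        self.down = False

    def get(self, key):
        if self.down:
            raise ConnectionError("cache unavailable")
        return self.data.get(key)

    def set(self, key, value):
        if self.down:
            raise ConnectionError("cache unavailable")
        self.data[key] = value


DATABASE = {"user:1": {"name": "Ada"}}  # stand-in for the durable store


def read_user(cache, key):
    """Return the value for key, treating the cache as an optional accelerator."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except ConnectionError:
        pass  # cache outage: fall through to the source of truth

    value = DATABASE.get(key)  # authoritative, slower path
    try:
        if value is not None:
            cache.set(key, value)
    except ConnectionError:
        pass  # failing to populate the cache must not fail the read
    return value
```

With an empty cache, a restarted cache, or `cache.down = True`, `read_user` still returns the database row; only the speed changes.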

Where caches live

Caches appear at many layers. Each layer avoids a different amount of work, and each layer has a different owner and invalidation strategy.

| Location | What it stores | What it saves | Common gotcha |
| --- | --- | --- | --- |
| Browser | HTML, CSS, JS, images, API responses | Network round trips | Hard refresh and versioning behavior |
| CDN edge | Static assets, public pages, media | Origin bandwidth and global latency | Purging and private-data leaks |
| Application cache | Hot objects, query results, sessions, rate limits | Database and computation work | Stale values and stampedes |
| Database cache | Recently read pages and indexes | Disk reads inside the DB | Limited direct control from the app |

A full system often uses several layers at once. A product image might be cached in the browser, at the CDN, and in object-storage edge caches. A user profile might be cached in Redis, while the database also caches the index pages used to find it.

Hit ratio dominates performance

Cache hit ratio is the percentage of requests answered by the cache. It matters because the average latency is a weighted blend of the fast cache path and the slow miss path. Because misses are so much slower than hits, even a modest drop in hit ratio shifts a large share of the total work onto the slow path.

average latency from hit ratio
cache latency = 2 ms
database latency = 80 ms
hit ratio = 95%

average = 0.95 * 2ms + 0.05 * 80ms
        = 1.9ms + 4ms
        = 5.9ms

if hit ratio drops to 80%:
average = 0.80 * 2ms + 0.20 * 80ms = 17.6ms
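
The same arithmetic can be wrapped in two small helpers. This is a sketch using the numbers above; the function names are illustrative, and the 50,000 requests/second figure is borrowed from the hot-key example earlier in the lesson.

```python
def average_latency_ms(hit_ratio, cache_ms, db_ms):
    """Weighted average of the hit path and the miss path."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * db_ms


def db_reads_per_second(hit_ratio, requests_per_second):
    """Only misses reach the database."""
    return (1.0 - hit_ratio) * requests_per_second


for ratio in (0.95, 0.80):
    latency = average_latency_ms(ratio, cache_ms=2, db_ms=80)
    db_load = db_reads_per_second(ratio, requests_per_second=50_000)
    print(f"hit ratio {ratio:.0%}: {latency:.1f} ms average, {db_load:,.0f} DB reads/s")

# hit ratio 95%: 5.9 ms average, 2,500 DB reads/s
# hit ratio 80%: 17.6 ms average, 10,000 DB reads/s
```

Dropping from 95% to 80% hits roughly triples average latency and quadruples the reads that reach the database.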

Three factors most often determine the hit ratio:
  • Hot-set size: if the cache is too small to hold the data users repeatedly request, hit ratio collapses.
  • Key design: overly specific keys, such as keys that include a timestamp or random token, prevent reuse because no two requests ever share a key.
  • TTL choice: very short TTLs may create constant misses; very long TTLs can make stale data unacceptable.
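
To illustrate the key-design point, the sketch below builds a key only from parameters that affect the result. `VOLATILE_PARAMS` and `cache_key` are hypothetical names; which parameters count as volatile depends on the application.

```python
from urllib.parse import urlencode

# Hypothetical list of parameters that change per request but not the result.
VOLATILE_PARAMS = {"timestamp", "request_id", "nonce"}


def cache_key(resource, params):
    """Build a deterministic key from the parameters that affect the result."""
    stable = {k: v for k, v in params.items() if k not in VOLATILE_PARAMS}
    # Sorting makes the key independent of the order the caller used.
    query = urlencode(sorted(stable.items()))
    return f"{resource}?{query}" if query else resource
```

Two requests for the same product and locale now map to the same key, whatever their timestamps or parameter order, so the second request can reuse the first one's entry.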

Cache-aside read path

The most common application pattern is cache-aside. The app checks the cache first. On a miss, it reads the source of truth, populates the cache, and returns the value. The application owns the policy.

cache-aside read path
def get_user(user_id):
    key = f"user:{user_id}"

    value = cache.get(key)
    if value is not None:
        return value              # cache hit

    value = db.query_user(user_id) # cache miss
    cache.set(key, value, ttl=300)
    return value

Why cache-aside is popular

  • Lazy population: only data that is actually requested is cached.
  • Simple recovery: if Redis restarts, misses refill the cache naturally from the database.
  • Application control: the app chooses keys, TTLs, serialization, and when to invalidate.
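
The properties above can be exercised with in-memory stand-ins. This is a sketch under stated assumptions: `TTLCache` and `FakeDB` are hypothetical classes that replace Redis and a real database, and the `get_user` variant here takes them as arguments and skips caching missing rows.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache such as Redis (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)

    def clear(self):
        self._entries.clear()  # simulates a cache restart


class FakeDB:
    """Hypothetical source of truth that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def query_user(self, user_id):
        self.queries += 1
        return self.rows.get(user_id)


def get_user(cache, db, user_id, ttl=300):
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:
        return value  # cache hit
    value = db.query_user(user_id)  # cache miss: go to the source of truth
    if value is not None:
        cache.set(key, value, ttl=ttl)  # lazy population
    return value
```

Calling `get_user` twice for the same ID queries `FakeDB` once. After `cache.clear()`, the next call misses and refills the entry from the database, which is the simple-recovery behavior described above.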

Related lesson
This lesson introduces the core vocabulary. For larger designs, see Caching Systems for multi-layer caches, consistency, and operational patterns.

Eviction: deciding what leaves the cache

Caches are intentionally smaller than the source of truth. When the cache is full, it needs an eviction policy: a rule for deciding which entry to remove.

| Policy | Meaning | Best fit | Gotcha |
| --- | --- | --- | --- |
| TTL | Expire after a fixed time | Bounding staleness and simple freshness | Many keys can expire together |
| LRU | Evict least recently used | Workloads where recent use predicts reuse | One-time scans can evict useful entries |
| LFU | Evict least frequently used | Stable popularity patterns | Can keep formerly popular items too long |
| Size-aware | Prefer evicting large or costly entries | Mixed object sizes | Requires accurate sizing |

Real systems often combine policies: TTL to bound staleness and LRU or LFU to manage memory. For example, Redis can be configured with a maxmemory eviction policy (such as allkeys-lru) that removes keys when memory is full, while individual keys can also carry their own expiration time.
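
A minimal sketch of combining the two policies is shown here, assuming a single-threaded caller. `LRUTTLCache` is a hypothetical class built on `collections.OrderedDict`; it illustrates the idea only and is not how Redis implements eviction internally.

```python
import time
from collections import OrderedDict


class LRUTTLCache:
    """Bounded cache: TTL limits staleness, LRU limits memory."""

    def __init__(self, capacity, clock=time.monotonic):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._clock = clock
        self._entries = OrderedDict()  # least recently used first

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # TTL expiry: too old to serve
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

With `capacity=2`, setting `a` and `b`, reading `a`, and then setting `c` evicts `b`, because `b` is the entry that was used least recently.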

Write strategies and invalidation

Reads are only half of caching. The hard part is what happens after a write. You must decide whether to update the cache, delete the cache entry, or let it expire naturally.

| Strategy | Write path | Benefit | Risk |
| --- | --- | --- | --- |
| Write-through | Write cache and database synchronously | Reads immediately see the new value | Writes are slower and both systems must succeed |
| Write-back | Write cache first, flush to DB later | Very fast writes | Data loss if the cache fails before flush |
| Write-around | Write DB only; cache fills on next read | Avoids caching data that may never be read | First read after write is a miss |
| Invalidate-on-write | Write DB, then delete cache key | Simple with cache-aside | Race conditions can repopulate stale values |

invalidate after a database write
def update_user(user_id, patch):
    db.update_user(user_id, patch)
    cache.delete(f"user:{user_id}")

# next read misses cache, reads fresh DB row, and fills cache again

Invalidation is where correctness bugs live
If you update the database but forget to invalidate the cache, users can see old data until TTL expiry. If you invalidate before the database commit, another request may refill the cache with the old row. Order, retries, and idempotency matter.
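
The ordering and retry concerns can be sketched as follows. `DictDB` and `UnreliableCache` are hypothetical stand-ins, and the retry loop is deliberately simple; a real system would likely add backoff and logging, and would still rely on the TTL as a final bound on staleness.

```python
class DictDB:
    """Hypothetical source of truth backed by a dict."""

    def __init__(self):
        self.rows = {}

    def update_user(self, user_id, patch):
        self.rows.setdefault(user_id, {}).update(patch)


class UnreliableCache:
    """Hypothetical cache whose first `failures` delete calls raise."""

    def __init__(self, failures=0):
        self.data = {}
        self.failures = failures

    def delete(self, key):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("cache unavailable")
        self.data.pop(key, None)  # deleting a missing key is harmless


def update_user(db, cache, user_id, patch, max_attempts=3):
    """Commit to the database first, then invalidate the cache entry."""
    db.update_user(user_id, patch)  # 1. the source of truth changes first
    key = f"user:{user_id}"
    for _ in range(max_attempts):
        try:
            cache.delete(key)  # 2. idempotent, so retrying is safe
            return True
        except ConnectionError:
            continue  # try again, up to max_attempts
    return False  # invalidation failed; the entry stays stale until its TTL
```

The function returns `False` rather than raising when invalidation fails, because the database write has already succeeded; the caller can decide whether to log, alert, or enqueue a later delete.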

Thundering herd and cache stampede

A cache stampede happens when many requests miss the same hot key at the same time. A common cause is synchronized expiration: a popular key reaches its TTL, thousands of requests miss together, and they all rebuild the value by hammering the database.

stampede failure mode
12:00:00 key product:123 expires
12:00:01 10,000 requests miss Redis
12:00:01 all 10,000 query the database
12:00:02 database CPU spikes, latency rises, retries begin
12:00:03 the outage amplifies itself

Common mitigations:
  • Single-flight: allow one request to rebuild the value while others wait for the result.
  • Jittered TTLs: add randomness so many hot keys do not expire at the same instant.
  • Stale-while-revalidate: serve a slightly stale value while a background refresh computes the new one.
  • Pre-warming: refresh known hot keys before launch events, sales, or traffic spikes.
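
Two of these mitigations fit in a short sketch: jittered TTLs and an in-process single-flight guard. `jittered_ttl`, `SingleFlight`, and `_Call` are hypothetical names, and the guard only coordinates threads inside one process; coordinating across servers would need something like a distributed lock, which is not shown.

```python
import random
import threading


def jittered_ttl(base_seconds, spread=0.1):
    """Return base_seconds adjusted by up to +/- spread (10% by default)."""
    return base_seconds * (1 + random.uniform(-spread, spread))


class _Call:
    """Shared state for one in-progress rebuild."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Let one caller per key run the loader; concurrent callers share its result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> _Call

    def do(self, key, loader):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call

        if leader:
            try:
                call.result = loader()  # only the leader hits the database
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()  # wake every waiting follower
        else:
            call.done.wait()  # followers block until the leader finishes

        if call.error is not None:
            raise call.error
        return call.result
```

A caller would wrap the miss path, for example `flight.do(key, lambda: db.query_product(123))` with a hypothetical `db`, and store the result with `ttl=jittered_ttl(300)` so hot keys written at the same moment do not all expire together.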

Edge cases and real-world examples

  • Negative caching: cache "not found" briefly for missing objects so repeated bad IDs do not hit the database.
  • Personalization: do not put private user data behind a public CDN key. Include the right Vary headers or avoid shared caches.
  • Large objects: one huge item can evict many useful small items. Track memory, not just key count.
  • Hot keys: a single celebrity profile or flash-sale SKU can overload one cache shard even with a high overall hit ratio.
  • Real examples: browsers cache bundles, CDNs cache images and video segments, Redis caches sessions and feeds, and databases cache index pages in memory.
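
Negative caching from the list above can be sketched with a sentinel value and a shorter TTL for misses. `NegativeCachingReader` and its `load` callback are hypothetical; the 30-second negative TTL is an arbitrary illustration.

```python
import time

_MISSING = object()  # sentinel meaning "the source of truth had no such row"


class NegativeCachingReader:
    """Cache found rows normally and 'not found' results for a shorter time."""

    def __init__(self, load, ttl=300, negative_ttl=30, clock=time.monotonic):
        self._load = load  # hypothetical loader: returns a row or None
        self._ttl = ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._entries = {}  # key -> (row_or_sentinel, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and self._clock() < entry[1]:
            cached = entry[0]
            return None if cached is _MISSING else cached

        row = self._load(key)  # miss or expired: ask the source of truth
        if row is None:
            expires_at = self._clock() + self._negative_ttl
            self._entries[key] = (_MISSING, expires_at)
        else:
            self._entries[key] = (row, self._clock() + self._ttl)
        return row
```

The negative TTL is kept short so that an object created shortly after a failed lookup becomes visible quickly.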

Key takeaways
  • Caching stores hot data close to the caller so repeated reads avoid slow databases, APIs, or computations.
  • Hit ratio dominates average latency and backend load; small drops in hit ratio can create large system-wide effects.
  • Caches live at many layers: browser, CDN, application cache, and database buffer cache.
  • Eviction policies such as TTL, LRU, and LFU decide what leaves when space or freshness runs out.
  • Invalidation and stampede control are the hard parts: use safe write ordering, single-flight, jittered TTLs, and stale-while-revalidate when needed.
The miss path is usually much slower and more expensive than the hit path. If misses go to an 80 ms database and hits take 2 ms, moving from 95% hits to 80% hits quadruples the amount of database work and can nearly triple average latency.
The application experiences misses, reads the database, and repopulates Redis lazily. The system should remain correct because the database is the source of truth; it is just slower until the hot set warms back up.
Use single-flight so only one request rebuilds the value, add TTL jitter, serve stale data while refreshing in the background, or proactively warm hot keys before expected spikes.