Caching Basics
The single highest-leverage trick in system design — plus the traps that bite beginners.
A cache is a small, fast store that keeps copies of data or computation results so the system can avoid repeating expensive work. Caching is one of the highest-leverage tools in system design: it cuts latency, reduces database load, and absorbs traffic spikes. It also creates hard correctness questions because cached data can become stale.
The problem caching solves
Real traffic is rarely uniform. A small set of hot objects receives a large fraction of reads: the home feed, a trending video, a celebrity profile, a product page during a sale, or the same configuration record every request needs. Without caching, every read repeats the slow path.
    every request
      -> app server
      -> database query or expensive computation
      -> response

    if one hot key receives 50,000 requests/second:
      the database sees 50,000 reads/second for the same answer

The failure mode is not only latency. Repeated reads can exhaust DB CPU, connection pools, disk I/O, or downstream API quotas. Once the source of truth is overloaded, even uncached writes and admin operations can slow down.
Where caches live
Caches appear at many layers. Each layer avoids a different amount of work, and each layer has a different owner and invalidation strategy.
| Location | What it stores | What it saves | Common gotcha |
|---|---|---|---|
| Browser | HTML, CSS, JS, images, API responses | Network round trips | Hard refresh and versioning behavior |
| CDN edge | Static assets, public pages, media | Origin bandwidth and global latency | Purging and private-data leaks |
| Application cache | Hot objects, query results, sessions, rate limits | Database and computation work | Stale values and stampedes |
| Database cache | Recently read pages and indexes | Disk reads inside the DB | Limited direct control from the app |
A full system often uses several layers at once. A product image might be cached in the browser, at the CDN, and in object-storage edge caches. A user profile might be cached in Redis, while the database also caches the index pages used to find it.
Hit ratio dominates performance
Cache hit ratio is the percentage of requests answered by the cache. It matters because the average latency is a weighted blend of the fast cache path and the slow miss path. A small hit-ratio change can dominate system behavior.
    cache latency    = 2 ms
    database latency = 80 ms
    hit ratio        = 95%

    average = 0.95 * 2 ms + 0.05 * 80 ms
            = 1.9 ms + 4 ms
            = 5.9 ms

    if hit ratio drops to 80%:
    average = 0.80 * 2 ms + 0.20 * 80 ms = 17.6 ms

Three factors drive hit ratio:
- Hot-set size: if the cache is too small to hold the data users repeatedly request, hit ratio collapses.
- Key design: overly specific keys such as including a timestamp or random token can prevent reuse.
- TTL choice: very short TTLs may create constant misses; very long TTLs can make stale data unacceptable.
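The weighted-average arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the 50,000 requests/second figure is borrowed from the hot-key example earlier in this section.

```python
def average_latency_ms(hit_ratio, cache_ms, miss_ms):
    """Weighted blend of the cache-hit path and the miss path."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * miss_ms


def backend_reads_per_second(hit_ratio, requests_per_second):
    """Only misses reach the source of truth."""
    return (1.0 - hit_ratio) * requests_per_second


if __name__ == "__main__":
    for ratio in (0.99, 0.95, 0.80):
        latency = average_latency_ms(ratio, cache_ms=2, miss_ms=80)
        reads = backend_reads_per_second(ratio, requests_per_second=50_000)
        print(f"hit ratio {ratio:.0%}: {latency:.1f} ms average, {reads:,.0f} DB reads/s")
```

The second function shows why hit ratio matters for load as well as latency: at 50,000 requests/second, falling from a 95% to an 80% hit ratio raises database reads from 2,500 to 10,000 per second.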
Cache-aside read path
The most common application pattern is cache-aside. The app checks the cache first. On a miss, it reads the source of truth, populates the cache, and returns the value. The application owns the policy.
    def get_user(user_id):
        key = f"user:{user_id}"
        value = cache.get(key)
        if value is not None:
            return value  # cache hit
        value = db.query_user(user_id)  # cache miss
        cache.set(key, value, ttl=300)
        return value

Why cache-aside is popular
- Lazy population: only data that is actually requested is cached.
- Simple recovery: if Redis restarts, misses refill the cache naturally from the database.
- Application control: the app chooses keys, TTLs, serialization, and when to invalidate.
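To try the cache-aside read path without a real cache server, the sketch below swaps in two hypothetical stand-ins: `TTLCache`, a tiny in-memory cache, and `FakeDB`, which counts queries. Neither is a real library API. Unlike the snippet above, this version passes the cache and database in as arguments and skips caching a missing row.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for Redis or Memcached (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)


class FakeDB:
    """Hypothetical source of truth that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def query_user(self, user_id):
        self.queries += 1
        return self.rows.get(user_id)


def get_user(cache, db, user_id, ttl=300):
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:
        return value                    # cache hit
    value = db.query_user(user_id)      # cache miss: take the slow path
    if value is not None:
        cache.set(key, value, ttl=ttl)  # populate for later readers
    return value


if __name__ == "__main__":
    cache, db = TTLCache(), FakeDB({42: {"name": "Ada"}})
    get_user(cache, db, 42)
    get_user(cache, db, 42)
    print("database queries:", db.queries)
```

Two reads of the same user produce a single database query. Once the TTL passes, the next read misses and refills the entry, which is the "simple recovery" property listed above.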
Eviction: deciding what leaves the cache
Caches are intentionally smaller than the source of truth. When the cache is full, it needs an eviction policy: a rule for deciding which entry to remove.
| Policy | Meaning | Best fit | Gotcha |
|---|---|---|---|
| TTL | Expire after a fixed time | Bounding staleness and simple freshness | Many keys can expire together |
| LRU | Evict least recently used | Workloads where recent use predicts reuse | One-time scans can evict useful entries |
| LFU | Evict least frequently used | Stable popularity patterns | Can keep formerly popular items too long |
| Size-aware | Prefer evicting large or costly entries | Mixed object sizes | Requires accurate sizing |
Real systems often combine policies: TTL to bound staleness and LRU or LFU to manage memory. For example, Redis can evict keys according to its configured `maxmemory-policy` (such as `allkeys-lru`) when memory is full, while individual keys can also carry their own expiration time.
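One way to picture TTL and LRU working together is a small in-process cache built on `collections.OrderedDict`. This is a sketch under simplifying assumptions (single-threaded, capacity counted in entries rather than bytes), and the class name `LRUCacheWithTTL` is made up for illustration. It is not how any particular cache server implements eviction.

```python
import time
from collections import OrderedDict


class LRUCacheWithTTL:
    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, expires_at), oldest use first

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]      # TTL bounds staleness
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

Reads refresh an entry's position, so a key that keeps getting hit survives memory pressure, while the per-entry expiry still forces a refresh from the source of truth eventually.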
Write strategies and invalidation
Reads are only half of caching. The hard part is what happens after a write. You must decide whether to update the cache, delete the cache entry, or let it expire naturally.
| Strategy | Write path | Benefit | Risk |
|---|---|---|---|
| Write-through | Write cache and database synchronously | Reads immediately see the new value | Writes are slower and both systems must succeed |
| Write-back | Write cache first, flush to DB later | Very fast writes | Data loss if the cache fails before flush |
| Write-around | Write DB only; cache fills on next read | Avoids caching data that may never be read | First read after write is a miss |
| Invalidate-on-write | Write DB, then delete cache key | Simple with cache-aside | Race conditions can repopulate stale values |
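The race mentioned in the invalidate-on-write row can be traced by hand. The sketch below replays one unlucky interleaving in a single thread, using plain dictionaries as hypothetical stand-ins for the database and the cache. No real concurrency or client library is involved.

```python
db = {"user:1": "old"}  # source of truth
cache = {}              # cache-aside cache, initially empty

# Reader A misses the cache and reads the database.
reader_value = db["user:1"]  # A sees "old"

# Writer B runs entirely between A's database read and A's cache fill.
db["user:1"] = "new"         # B writes the database
cache.pop("user:1", None)    # B deletes the cache key (nothing is there yet)

# Reader A resumes and fills the cache with the value it read earlier.
cache["user:1"] = reader_value

print("cache:", cache["user:1"], "| database:", db["user:1"])
```

The cache ends up holding the old value while the database holds the new one. A TTL on the entry bounds how long that stale value can survive, which is one reason invalidate-on-write is usually paired with expiration.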
    def update_user(user_id, patch):
        db.update_user(user_id, patch)
        cache.delete(f"user:{user_id}")
        # next read misses cache, reads fresh DB row, and fills cache again

Thundering herd and cache stampede
A cache stampede happens when many requests miss the same hot key at the same time. A common cause is synchronized expiration: a popular key reaches its TTL, thousands of requests miss together, and they all rebuild the value by hammering the database.
    12:00:00 key product:123 expires
    12:00:01 10,000 requests miss Redis
    12:00:01 all 10,000 query the database
    12:00:02 database CPU spikes, latency rises, retries begin
    12:00:03 the outage amplifies itself

Common defenses:
- Single-flight: allow one request to rebuild the value while others wait for the result.
- Jittered TTLs: add randomness so many hot keys do not expire at the same instant.
- Stale-while-revalidate: serve a slightly stale value while a background refresh computes the new one.
- Pre-warming: refresh known hot keys before launch events, sales, or traffic spikes.
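Two of the defenses above, single-flight and jittered TTLs, fit in a short sketch. `SingleFlightCache` and `jittered_ttl` are hypothetical names, and the code assumes a single process using threads. Coordinating rebuilds across several app servers would need a shared lock or a similar mechanism, which is not shown here.

```python
import random
import threading


class SingleFlightCache:
    """In-memory cache where only one caller per key rebuilds a missing value."""

    def __init__(self):
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key, loader):
        with self._guard:
            if key in self._values:
                return self._values[key]  # fast path: cache hit
        with self._lock_for(key):         # one rebuilder per key at a time
            with self._guard:
                if key in self._values:   # filled while this caller waited
                    return self._values[key]
            value = loader(key)           # slow path runs once per miss
            with self._guard:
                self._values[key] = value
            return value


def jittered_ttl(base_seconds, spread=0.1, rng=random):
    """Scale base_seconds by a random factor in [1 - spread, 1 + spread]."""
    return base_seconds * (1.0 + rng.uniform(-spread, spread))
```

With this shape, callers that arrive during a rebuild block on the per-key lock and then read the freshly stored value instead of querying the database themselves. Passing `jittered_ttl(300)` instead of a fixed 300 seconds spreads expirations over roughly 270 to 330 seconds, so keys written together do not all expire together.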
Edge cases and real-world examples
- Negative caching: cache "not found" briefly for missing objects so repeated bad IDs do not hit the database.
- Personalization: do not put private user data behind a public CDN key. Include the right `Vary` headers, mark the response `Cache-Control: private`, or avoid shared caches altogether.
- Large objects: one huge item can evict many useful small items. Track memory, not just key count.
- Hot keys: a single celebrity profile or flash-sale SKU can overload one cache shard even with a high overall hit ratio.
- Real examples: browsers cache bundles, CDNs cache images and video segments, Redis caches sessions and feeds, and databases cache index pages in memory.
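Negative caching, the first item in the list above, is easy to get subtly wrong because "not found" must be distinguishable from "not cached". The sketch below uses a sentinel object for that. `NegativeCache`, `get_or_load`, and the 30-second `negative_ttl` default are illustrative assumptions, not a recommendation for any specific system.

```python
import time

_MISSING = object()  # sentinel meaning "the source of truth has no such row"


class NegativeCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value or _MISSING, expires_at)

    def get_or_load(self, key, loader, ttl=300, negative_ttl=30):
        entry = self._entries.get(key)
        if entry is not None and self._clock() < entry[1]:
            value = entry[0]
            return None if value is _MISSING else value  # hit, possibly negative
        value = loader(key)  # loader returns None when the object is missing
        if value is None:
            # Remember the miss briefly so repeated bad IDs skip the database.
            self._entries[key] = (_MISSING, self._clock() + negative_ttl)
        else:
            self._entries[key] = (value, self._clock() + ttl)
        return value
```

The negative TTL is kept much shorter than the normal TTL so that an object created shortly after a failed lookup becomes visible quickly.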
Key takeaways
- Caching stores hot data close to the caller so repeated reads avoid slow databases, APIs, or computations.
- Hit ratio dominates average latency and backend load; small drops in hit ratio can create large system-wide effects.
- Caches live at many layers: browser, CDN, application cache, and database buffer cache.
- Eviction policies such as TTL, LRU, and LFU decide what leaves when space or freshness runs out.
- Invalidation and stampede control are the hard parts: use safe write ordering, single-flight, jittered TTLs, and stale-while-revalidate when needed.