Back-of-the-Envelope Estimation
Turn 'a billion users' into QPS, storage, and server counts in under five minutes.
Back-of-the-envelope estimation turns vague scale into useful engineering numbers: requests per second, storage growth, bandwidth, and server count. The goal is not perfect precision. The goal is to land in the right order of magnitude so the architecture fits the problem before you draw boxes.
Why estimate before designing
The numbers tell you which designs are plausible. A service doing 50 requests per second can run on a small fleet. A service doing 5 million writes per second needs partitioning, queues, careful storage choices, and failure planning. If you skip the math, you may design a bicycle for highway traffic.
small scale:
10 QPS, 5 GB storage
-> one app server and one database may be fine
large scale:
500,000 QPS, 3 PB/year
-> load balancing, caching, partitioning, queues, and distributed storage
Powers of two and storage units
Computers count in binary units (KiB, MiB, GiB), but system design interviews usually accept rounded decimal math. Memorize the ladder so you can move from item counts to bytes quickly.
| Unit | Rough size | Useful mental anchor |
|---|---|---|
| 1 KB | 1 thousand bytes | A small text message or JSON object |
| 1 MB | 1 thousand KB | A compressed image or small bundle |
| 1 GB | 1 thousand MB | 1 KB multiplied by 1 million items |
| 1 TB | 1 thousand GB | 1 KB multiplied by 1 billion items |
| 1 PB | 1 thousand TB | 1 KB multiplied by 1 trillion items |
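As a quick check on the ladder in the table, here is a minimal Python sketch that multiplies item counts by item sizes and measures how far the rounded decimal units drift from the exact binary ones. The names `DECIMAL`, `BINARY`, `total_bytes`, and `rounding_gap` are made up for this illustration.

```python
# Decimal (SI) and binary (IEC) unit ladders, expressed in bytes.
DECIMAL = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}
BINARY = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50}


def total_bytes(item_count: int, bytes_per_item: int) -> int:
    """Total size in bytes for item_count items of a fixed size."""
    return item_count * bytes_per_item


def rounding_gap(decimal_unit: str, binary_unit: str) -> float:
    """Fraction by which the binary unit exceeds its decimal counterpart."""
    return BINARY[binary_unit] / DECIMAL[decimal_unit] - 1


# 1 KB x 1 billion items lands on 1 TB, matching the table row.
print(total_bytes(10**9, DECIMAL["KB"]) / DECIMAL["TB"])  # 1.0

# The gap widens at each step of the ladder but stays far below 10x.
for dec, bin_ in zip(DECIMAL, BINARY):
    print(f"{bin_} is {rounding_gap(dec, bin_):.1%} larger than {dec}")
```

The gap is a few percent at the KB level and a little over ten percent at the PB level, which is why decimal rounding is usually acceptable for order-of-magnitude work.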
2^10 = 1,024 ~ 1 thousand
2^20 = 1,048,576 ~ 1 million
2^30 = 1,073,741,824 ~ 1 billion
exact binary units, 1 KiB -> 1 MiB -> 1 GiB -> 1 TiB -> 1 PiB by 1,024x
for quick estimates, KB -> MB -> GB -> TB -> PB by 1000x
Latency numbers to keep in your head
Exact numbers vary by hardware, cloud provider, language, and workload, but the ordering is stable. Memory is much faster than disk; local calls are much faster than cross-region calls; network distance matters.
These numbers shape design choices. A browser request that calls five services serially pays network latency five times. A cache hit can save a database round trip. A cross-region synchronous write can dominate the whole request budget.
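To make the serial-call point concrete, the sketch below adds up latencies for a request path. The millisecond values in `ASSUMED_LATENCY_MS` are illustrative round numbers chosen for this example, not measurements and not figures from this lesson; only their relative ordering is meant to be realistic. The function names are likewise hypothetical.

```python
# Illustrative latency anchors in milliseconds. These are assumed round
# numbers for the sketch, not measurements; only their ordering matters.
ASSUMED_LATENCY_MS = {
    "cache_hit": 0.5,
    "same_region_service_call": 2.0,
    "database_round_trip": 5.0,
    "cross_region_round_trip": 100.0,
}


def serial_latency_ms(steps: list[str]) -> float:
    """Total latency when each step waits for the previous one."""
    return sum(ASSUMED_LATENCY_MS[s] for s in steps)


def parallel_latency_ms(steps: list[str]) -> float:
    """Total latency when all steps run concurrently: the slowest one wins."""
    return max(ASSUMED_LATENCY_MS[s] for s in steps)


five_calls = ["same_region_service_call"] * 5
print(serial_latency_ms(five_calls))    # 10.0
print(parallel_latency_ms(five_calls))  # 2.0

# One synchronous cross-region write dwarfs the rest of the request.
with_remote_write = five_calls + ["cross_region_round_trip"]
print(serial_latency_ms(with_remote_write))  # 110.0
```

Swapping in measured values for your own environment changes the totals but rarely the conclusion: serial hops add up, and the slowest tier dominates.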
The core formulas
Most capacity estimates are built from a few reusable formulas. Keep the units visible and convert one step at a time.
requests/day = DAU * actions per user per day
average QPS = requests/day / 86,400
peak QPS = average QPS * peak factor
common peak factor: 3x to 5x for consumer systems
higher for spiky events such as sports, sales, or breaking news
storage = items per day * bytes per item * replication factor * retention in days
write bandwidth = writes/second * bytes per write
read bandwidth = reads/second * bytes per read
server count = peak QPS / safe per-node throughput
then add headroom for failures, deploys, and uneven traffic
| Estimate | Formula | Why it matters |
|---|---|---|
| QPS | DAU x actions/day / 86,400 | Sizes app servers, caches, queues, and DB reads |
| Peak QPS | average QPS x peak factor | Systems must survive peaks, not averages |
| Storage | items/day x size x replication x retention days | Sizes databases, object stores, and backups |
| Bandwidth | QPS x response size | Sizes network links, CDN, and egress cost |
| Server count | peak QPS / per-node throughput | Turns demand into fleet size |
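The formulas in the table can be written as small Python helpers, which keeps the units visible in the parameter names. This is a minimal sketch: the function names, the `headroom` parameter, and the sample inputs (1 million DAU, 10 actions per user, 100 QPS per node, 30% headroom) are hypothetical values picked for illustration.

```python
import math

SECONDS_PER_DAY = 86_400


def average_qps(dau: int, actions_per_user_per_day: float) -> float:
    """requests/day divided by seconds/day."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float) -> float:
    """Scale the average by a peak factor (often 3x to 5x for consumer systems)."""
    return avg_qps * peak_factor


def storage_bytes(items_per_day: int, bytes_per_item: int,
                  replication_factor: int, retention_days: int) -> int:
    """Total bytes kept over the retention window, including replicas."""
    return items_per_day * bytes_per_item * replication_factor * retention_days


def bandwidth_bytes_per_second(ops_per_second: float, bytes_per_op: int) -> float:
    """Bytes moved per second for reads or writes."""
    return ops_per_second * bytes_per_op


def server_count(peak: float, safe_qps_per_node: float, headroom: float = 0.0) -> int:
    """Nodes needed at peak, rounded up, with optional fractional headroom."""
    return math.ceil(peak / safe_qps_per_node * (1 + headroom))


# Hypothetical inputs, chosen only to exercise the helpers.
avg = average_qps(dau=1_000_000, actions_per_user_per_day=10)
peak = peak_qps(avg, peak_factor=3)
print(round(avg), round(peak))                                  # 116 347
print(server_count(peak, safe_qps_per_node=100, headroom=0.3))  # 5
```

Rounding the server count up rather than to the nearest integer is deliberate: a fractional node still has to be a whole machine.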
Fully worked example: photo sharing feed
Suppose you are designing a photo-sharing feed. Use round numbers and state assumptions before calculating.
DAU = 20 million users
feed opens = 12 per user per day
photos uploaded = 2 per user per day
average feed response = 60 KB
average stored photo after compression = 500 KB
metadata per photo = 2 KB
replication factor = 3
retention = 5 years
peak factor = 4x
one app server safely handles 800 QPS
Read QPS
- Feed reads/day: 20M users x 12 opens = 240M feed reads per day.
- Average read QPS: 240M / 86,400 is about 2,800 QPS.
- Peak read QPS: 2,800 x 4 is about 11,200 QPS.
Write QPS and storage
- Uploads/day: 20M users x 2 photos = 40M photos per day.
- Average upload QPS: 40M / 86,400 is about 460 uploads per second; peak is about 1,850 uploads per second.
- Raw photo storage/day: 40M x 500 KB = 20 TB per day.
- Replicated photo storage/day: 20 TB x 3 = 60 TB per day.
- Five-year replicated photo storage: 60 TB/day x 365 x 5 is about 110 PB.
- Metadata/day: 40M x 2 KB = 80 GB raw, or 240 GB with 3x replication. Metadata is much smaller than media but still large enough to require partitioning over time.
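The upload and storage arithmetic above can be reproduced with a short script. It is a sketch under the example's stated assumptions, using decimal units (1 KB = 1,000 bytes); the variable names are chosen for this illustration.

```python
# Decimal units, as used throughout the worked example.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15
SECONDS_PER_DAY = 86_400

# Assumptions stated in the worked example.
dau = 20_000_000
photos_per_user_per_day = 2
photo_bytes = 500 * KB
metadata_bytes = 2 * KB
replication_factor = 3
retention_days = 365 * 5
peak_factor = 4

# Upload rate.
uploads_per_day = dau * photos_per_user_per_day
avg_upload_qps = uploads_per_day / SECONDS_PER_DAY
peak_upload_qps = avg_upload_qps * peak_factor

# Photo storage, raw and replicated.
photo_bytes_per_day = uploads_per_day * photo_bytes
replicated_photo_bytes_per_day = photo_bytes_per_day * replication_factor
five_year_photo_bytes = replicated_photo_bytes_per_day * retention_days

# Metadata storage, raw and replicated.
metadata_bytes_per_day = uploads_per_day * metadata_bytes
replicated_metadata_bytes_per_day = metadata_bytes_per_day * replication_factor

print(f"uploads/s: avg {avg_upload_qps:.0f}, peak {peak_upload_qps:.0f}")
print(f"photos/day: {photo_bytes_per_day / TB:.0f} TB raw, "
      f"{replicated_photo_bytes_per_day / TB:.0f} TB replicated")
print(f"photos over 5 years: {five_year_photo_bytes / PB:.1f} PB")
print(f"metadata/day: {metadata_bytes_per_day / GB:.0f} GB raw, "
      f"{replicated_metadata_bytes_per_day / GB:.0f} GB replicated")
```

The script gives unrounded rates of roughly 463 and 1,852 uploads per second and 109.5 PB over five years, which the bullets round to 460, 1,850, and 110 PB.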
Bandwidth and servers
- Peak feed bandwidth: 11,200 QPS x 60 KB is about 672 MB/s before compression and CDN effects.
- Peak upload bandwidth: 1,850 uploads/s x 500 KB is about 925 MB/s entering object storage.
- App server count: 11,200 peak QPS / 800 safe QPS per node = 14 nodes. Add headroom for deploys and failures, so start with roughly 20 to 25 app servers.
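The bandwidth and fleet-size steps can be checked the same way. The sketch below starts from the rounded peak rates in the bullets; the 50% `headroom` value is an assumption chosen for illustration because it lands inside the 20 to 25 server range suggested above, not a rule.

```python
import math

KB, MB = 10**3, 10**6

# Rounded peak rates and sizes from the worked example.
peak_read_qps = 11_200
peak_upload_qps = 1_850
feed_response_bytes = 60 * KB
photo_bytes = 500 * KB
safe_qps_per_node = 800

# Assumed headroom for deploys, failures, and uneven load (illustrative).
headroom = 0.5

peak_feed_bandwidth = peak_read_qps * feed_response_bytes
peak_upload_bandwidth = peak_upload_qps * photo_bytes

# Round up: a fractional node is still a whole machine.
base_nodes = math.ceil(peak_read_qps / safe_qps_per_node)
nodes_with_headroom = math.ceil(peak_read_qps / safe_qps_per_node * (1 + headroom))

print(f"peak feed egress: {peak_feed_bandwidth / MB:.0f} MB/s")      # 672 MB/s
print(f"peak upload ingress: {peak_upload_bandwidth / MB:.0f} MB/s")  # 925 MB/s
print(f"app servers: {base_nodes} base, {nodes_with_headroom} with headroom")
```

With these inputs the script reports 14 base nodes and 21 with headroom. An alternative to a percentage is to size for N nodes plus a fixed number of spares; either way the point is that the base count alone leaves no room for a failed node during peak.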
Gotchas and practical habits
- Averages hide peaks: traffic follows time zones, notifications, launches, and special events. Always multiply by a peak factor.
- Replication and backups count: a 1 TB logical dataset may consume 3 TB replicated plus backup, index, and log overhead.
- Per-node throughput should be a conservative number: use measured sustainable throughput, not a perfect benchmark from an empty lab.
- Reads and writes differ: reads may be cacheable; writes often require durability, ordering, validation, and replication.
- Units prevent mistakes: write KB, MB, seconds, days, and years in every line so you do not multiply incompatible quantities.
Key takeaways
- Back-of-the-envelope estimation converts vague scale into QPS, storage, bandwidth, and server counts.
- Memorize unit ladders and latency anchors: KB to PB, 86,400 seconds per day, and the rough cost of memory, disk, network, and cross-region calls.
- QPS comes from DAU x actions per day divided by 86,400, then multiplied by a peak factor.
- Storage comes from items per day x item size x replication x retention days, with extra room for indexes, backups, logs, and growth.
- Server count is peak QPS divided by safe per-node throughput, plus headroom for failures, deploys, and uneven load.