Scaling: Vertical vs Horizontal
Bigger machine vs more machines — the first fork in the road for any growing system.
Your system is getting more traffic than one server can comfortably handle. You have two fundamental moves: make the server bigger or add more servers. The first keeps the design simple; the second spreads work and failure across replaceable machines.
The problem: one box becomes the choke point
A single server is a great starting point because everything is local: sessions in memory, temporary files on disk, background work in a local queue, and one process to deploy. The failure mode is concentration. If that machine is slow, every user is slow. If that machine dies, the service dies with it.
clients
-> app-server-1
-> local memory: sessions, in-process cache
-> local disk: temp files
-> database
<- response

- CPU saturation: requests wait because all cores are busy rendering, serializing JSON, encrypting TLS, or running business logic.
- Memory pressure: local sessions, caches, queues, and request buffers compete for RAM. Latency jumps when the process swaps or garbage collection dominates.
- I/O bottlenecks: disk, network, database connections, or kernel socket limits become the narrowest pipe.
- Single point of failure: a reboot, bad deploy, hardware fault, or runaway process can take the entire application offline.
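
As a rough illustration, the four symptoms above could be turned into a first-pass check over host metrics. This is only a sketch: the `HostMetrics` fields, the `likely_constraints` helper, and the thresholds are hypothetical and would need tuning against real measurements.

```python
from dataclasses import dataclass


@dataclass
class HostMetrics:
    cpu_utilization: float     # 0.0-1.0, averaged across cores
    memory_utilization: float  # 0.0-1.0 of RAM in use
    io_wait: float             # 0.0-1.0 fraction of time spent waiting on I/O
    healthy_instances: int     # app servers currently serving traffic


def likely_constraints(m: HostMetrics, threshold: float = 0.85) -> list[str]:
    """Return the symptoms from the list above that the metrics suggest.

    The thresholds are illustrative assumptions, not recommendations.
    """
    findings = []
    if m.cpu_utilization >= threshold:
        findings.append("cpu saturation")
    if m.memory_utilization >= threshold:
        findings.append("memory pressure")
    if m.io_wait >= 0.20:
        findings.append("i/o bottleneck")
    if m.healthy_instances <= 1:
        findings.append("single point of failure")
    return findings


print(likely_constraints(HostMetrics(0.95, 0.40, 0.05, 1)))
```

Knowing which constraint is active matters because a bigger machine relieves the first three but does nothing for the fourth.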
Vertical scaling: make the box bigger
Vertical scaling, or scaling up, means running the same software on a larger machine: more CPU, more RAM, faster disks, a larger network interface, or a bigger managed database instance. The topology barely changes, so it is often the fastest early win.
before: app-1 = 4 vCPU, 16 GB RAM
after: app-1 = 32 vCPU, 128 GB RAM
same app process
same endpoint
same local assumptions

Why teams scale up first
- Low engineering cost: no load balancer, service discovery, fleet deploys, or cross-node debugging is required.
- Existing state still works: local sessions, local caches, and local temp files keep working because there is still one app server.
- Operationally quick: resizing a VM or database tier can be much faster than changing the application to be distributed.
The cost curve and ceiling
Vertical scaling is cheap and rational at small sizes, then becomes expensive near the top. The largest machines carry premium pricing, and eventually there is simply no larger box to buy. You can purchase more headroom, but not infinite headroom.
| Stage | Cost curve | Design signal |
|---|---|---|
| Small to medium | Usually predictable | Scale up is a sensible first move |
| Large | More expensive per unit of capacity | Measure carefully before resizing again |
| Largest practical | Very costly and capped | Prepare to scale out or reduce work |
| Failed node | Capacity becomes zero | The single point of failure remains |
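
The cost curve and the ceiling can be made concrete with a little arithmetic. The sketch below uses an invented price list; the `TIERS` values are hypothetical and are not real vendor pricing. It computes the price per vCPU for each size and shows what happens when demand exceeds the largest tier.

```python
# Hypothetical price list: (vCPUs, monthly price in USD). Not real vendor pricing.
TIERS = [(4, 100.0), (16, 420.0), (32, 900.0), (64, 2200.0)]


def cost_per_vcpu(tiers):
    """Return (vCPUs, price per vCPU) for each tier."""
    return [(vcpus, price / vcpus) for vcpus, price in tiers]


def pick_tier(required_vcpus, tiers=TIERS):
    """Return the smallest tier that fits, or None when no box is big enough."""
    for vcpus, price in sorted(tiers):
        if vcpus >= required_vcpus:
            return (vcpus, price)
    return None  # the ceiling: there is no larger machine to buy


for vcpus, unit_cost in cost_per_vcpu(TIERS):
    print(f"{vcpus:>3} vCPU: ${unit_cost:.2f} per vCPU")

print(pick_tier(20))   # fits in the 32 vCPU tier
print(pick_tier(128))  # exceeds the largest tier
```

With these made-up numbers the price per vCPU rises at every step, and a requirement of 128 vCPUs has no answer at all, which is the point where scaling out or reducing work becomes the only option.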
Horizontal scaling: add more boxes
Horizontal scaling, or scaling out, means running multiple copies of the service and distributing requests across them. One node no longer has to carry all traffic, and one node failure should reduce capacity rather than end the service.
clients
-> load balancer
-> app-1
-> app-2
-> app-3
if app-2 fails:
health check marks it unhealthy
new requests go to app-1 and app-3

- Capacity grows by adding nodes: ten app servers can handle much more traffic than one, until a shared dependency such as the database becomes the bottleneck.
- Availability improves: deploys can roll through the fleet, and failed nodes can be removed from rotation.
- Complexity increases: you now need load balancing, health checks, autoscaling, centralized logs, metrics, and safe fleet deploys.
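
The routing behavior in the diagram above can be sketched as a tiny round-robin balancer that skips nodes marked unhealthy. This is a minimal in-process illustration, assuming some external health check calls `mark_unhealthy`; the `RoundRobinBalancer` class is hypothetical and is not how any particular load balancer is implemented.

```python
import itertools


class RoundRobinBalancer:
    """Rotate through nodes, skipping any that failed a health check."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._cycle = itertools.cycle(self.nodes)

    def mark_unhealthy(self, node):
        self.healthy.discard(node)

    def mark_healthy(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        # The cycle is infinite, but at least one node is healthy,
        # so this loop always returns.
        for node in self._cycle:
            if node in self.healthy:
                return node


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_unhealthy("app-2")
print([lb.pick() for _ in range(4)])
```

After `app-2` is marked unhealthy, new requests alternate between `app-1` and `app-3`: capacity drops by a third, but the service keeps answering.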
Stateless services unlock scale-out
Horizontal scaling works best when app servers are stateless. A stateless server keeps no important per-user or per-job data in local memory between requests. Any healthy node can handle the next request because durable state lives in shared systems.
problem:
request 1 -> app-1 stores session in memory
request 2 -> app-2 cannot find the session
fix:
sessions -> Redis or signed cookies
records -> database
files -> object storage
jobs -> shared queue

| Concern | Stateful app server | Stateless app server |
|---|---|---|
| Sessions | Stored in one process | Stored in Redis, DB, or signed token |
| Uploads | Temp file on one node | Object storage plus DB metadata |
| Jobs | Local in-memory queue | Shared durable queue |
| Node failure | Local users or jobs are lost | Another node can continue |
| Load balancing | May need sticky sessions | Any node can serve any request |
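
One of the options in the Sessions row, a signed token, can be sketched with the Python standard library. The example assumes every app server loads the same `SECRET_KEY`; the key value and the `sign_session` and `verify_session` helpers are hypothetical names for illustration. The token is signed, not encrypted, and this sketch omits expiry.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-me"  # hypothetical; every node would load the same secret


def sign_session(data: dict) -> str:
    """Encode session data and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_session(token: str):
    """Return the session data, or None if the token is malformed or tampered with."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


# "app-1" issues the token; "app-2" can verify it without any shared memory.
token = sign_session({"user_id": 42})
print(verify_session(token))
```

Because verification needs only the shared secret, any node can handle the next request, which is exactly the property the stateless column describes.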
The usual path: vertical first, then horizontal
Real teams usually follow a gradual path. Start simple, measure the bottleneck, buy easy headroom, then remove local-state assumptions before adding a fleet. This avoids paying distributed-system complexity before the product needs it.
1. one app server + one database
2. scale up app or database when metrics show saturation
3. move sessions, files, caches, and jobs out of app memory
4. add a load balancer and a second app server
5. autoscale stateless app servers horizontally
6. scale shared dependencies when they become the new bottleneck

| Dimension | Vertical: scale up | Horizontal: scale out |
|---|---|---|
| Basic move | Use a bigger machine | Use more machines |
| Engineering effort | Low; same topology | Higher; distributed operations |
| Capacity limit | Largest affordable box | Add nodes until another dependency limits you |
| Cost curve | Can get steep near the top | More linear, plus operational overhead |
| Failure model | One node can still take everything down | Node loss reduces capacity |
| Best fit | Early growth and quick relief | High traffic and fault tolerance |
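
The "node loss reduces capacity" row suggests a simple sizing rule when scaling out: provision enough nodes for peak traffic, plus spares so that losing one does not push the rest past their limit. The sketch below is a back-of-the-envelope calculation; `nodes_needed`, its parameters, and the default 70% target utilization are illustrative assumptions, and it ignores shared dependencies such as the database.

```python
import math


def nodes_needed(peak_rps, per_node_rps, target_utilization=0.7, spare_nodes=1):
    """Estimate fleet size for a peak request rate.

    per_node_rps is the measured capacity of one node; target_utilization
    leaves headroom; spare_nodes covers node failures or rolling deploys.
    """
    if peak_rps < 0 or per_node_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid capacity inputs")
    base = math.ceil(peak_rps / (per_node_rps * target_utilization))
    return base + spare_nodes


# Hypothetical numbers: 3000 requests/s at peak, 500 requests/s per node.
print(nodes_needed(3000, 500))
```

The same arithmetic applied to a single large machine has no spare term to add, which is why a scaled-up service remains a single point of failure.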
Edge cases and gotchas
- The database may become the next bottleneck: after app servers scale out, shared storage often needs indexing, caching, read replicas, partitioning, or sharding.
- Sticky sessions hide state problems: pinning a user to one server can work temporarily, but that server is now special for that user and load can become uneven.
- Autoscaling is not instant: new nodes need time to boot, warm caches, pass health checks, and receive traffic.
- Local caches can disagree: each node may have a different in-memory view. Use short TTLs or a shared cache for correctness-sensitive data.
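
The local-cache gotcha can be illustrated with a small per-node cache that expires entries after a TTL. This is a sketch, not a production cache: the `TTLCache` class is hypothetical, and the injectable `clock` parameter exists only so the example can advance time without sleeping.

```python
import time


class TTLCache:
    """Per-node cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (value, time stored)

    def get(self, key, load):
        """Return a fresh cached value, or call load(key) and cache the result."""
        now = self.clock()
        entry = self._items.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]
        value = load(key)
        self._items[key] = (value, now)
        return value


# Simulated clock and a dict standing in for the shared source of truth.
now = [0.0]
source = {"price": 10}
cache = TTLCache(ttl_seconds=5, clock=lambda: now[0])

print(cache.get("price", source.get))  # loads from the source
source["price"] = 12
print(cache.get("price", source.get))  # still the old value until the TTL expires
now[0] = 6.0
print(cache.get("price", source.get))  # reloaded after expiry
```

The TTL bounds how long nodes can disagree; it does not eliminate the window. Data that must never be stale belongs in a shared cache or the database.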
Key takeaways
- Vertical scaling means a bigger machine: simple, fast, and useful early, but capped by hardware and cost.
- A vertically scaled service can still be a single point of failure; capacity and availability are different concerns.
- Horizontal scaling means more machines behind routing; it improves capacity and resilience but adds distributed operations.
- Stateless app servers make horizontal scaling practical because any node can handle any request.
- The normal path is scale up first, externalize state, then scale out when traffic or availability requires it.