DrawLint.ai
🧱Fundamentals·5 min read

Scaling: Vertical vs Horizontal

Bigger machine vs more machines — the first fork in the road for any growing system.

Your system is getting more traffic than one server can comfortably handle. You have two fundamental moves: make the server bigger, or add more servers. The first keeps the design simple; the second spreads work and failure across replaceable machines.

🔭Think of it like…
Vertical scaling is upgrading one restaurant kitchen with more ovens, more prep space, and faster equipment. Horizontal scaling is opening several kitchens and sending each order to whichever kitchen is ready. More kitchens create far more total capacity, but now recipes, inventory, and order state must live somewhere shared.

The problem: one box becomes the choke point

A single server is a great starting point because everything is local: sessions in memory, temporary files on disk, background work in a local queue, and one process to deploy. The failure mode is concentration. If that machine is slow, every user is slow. If that machine dies, the service dies with it.

single-server shape
clients
  -> app-server-1
      -> local memory: sessions, in-process cache
      -> local disk: temp files
      -> database
  <- response

  • CPU saturation: requests wait because all cores are busy rendering, serializing JSON, encrypting TLS, or running business logic.
  • Memory pressure: local sessions, caches, queues, and request buffers compete for RAM. Latency jumps when the process swaps or garbage collection dominates.
  • I/O bottlenecks: disk, network, database connections, or kernel socket limits become the narrowest pipe.
  • Single point of failure: a reboot, bad deploy, hardware fault, or runaway process can take the entire application offline.

Scaling is also about failure isolation

A design with more capacity but the same single fatal node is only half improved. Staff engineers ask both "Can it handle the load?" and "What disappears when this node disappears?"
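The second question can be made concrete with a little arithmetic. The sketch below is a simplification that assumes every node carries an equal share of traffic; the function name `remaining_capacity` is hypothetical.

```python
def remaining_capacity(total_nodes: int, failed_nodes: int) -> float:
    """Fraction of capacity left after failures, assuming identical nodes."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    return (total_nodes - failed_nodes) / total_nodes


# One big server: losing it leaves nothing.
print(remaining_capacity(1, 1))  # 0.0
# Four smaller servers: losing one leaves three quarters.
print(remaining_capacity(4, 1))  # 0.75
```

Real fleets are rarely this uniform, but the shape of the answer holds: with one node, a failure removes everything; with several, it removes a slice.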

Vertical scaling: make the box bigger

Vertical scaling, or scaling up, means running the same software on a larger machine: more CPU, more RAM, faster disks, a larger network interface, or a bigger managed database instance. The topology barely changes, so it is often the fastest early win.

scale up without changing the architecture
before: app-1 = 4 vCPU, 16 GB RAM
after:  app-1 = 32 vCPU, 128 GB RAM

same app process
same endpoint
same local assumptions

Why teams scale up first

  • Low engineering cost: no load balancer, service discovery, fleet deploys, or cross-node debugging is required.
  • Existing state still works: local sessions, local caches, and local temp files keep working because there is still one app server.
  • Operationally quick: resizing a VM or database tier can be much faster than changing the application to be distributed.

The cost curve and ceiling

Vertical scaling is cheap and rational at small sizes, then becomes expensive near the top. The largest machines have premium pricing and eventually there is simply no larger box to buy. You can purchase more headroom, but not infinite headroom.

| Stage | Cost curve | Design signal |
| --- | --- | --- |
| Small to medium | Usually predictable | Scale up is a sensible first move |
| Large | More expensive per unit of capacity | Measure carefully before resizing again |
| Largest practical | Very costly and capped | Prepare to scale out or reduce work |
| Failed node | Capacity becomes zero | The single point of failure remains |

Horizontal scaling: add more boxes

Horizontal scaling, or scaling out, means running multiple copies of the service and distributing requests across them. One node no longer has to carry all traffic, and one node failure should reduce capacity rather than end the service.

scale out behind a load balancer
clients
  -> load balancer
      -> app-1
      -> app-2
      -> app-3

if app-2 fails:
  health check marks it unhealthy
  new requests go to app-1 and app-3

  • Capacity grows by adding nodes: ten app servers can handle much more traffic than one, until a shared dependency such as the database becomes the bottleneck.
  • Availability improves: deploys can roll through the fleet, and failed nodes can be removed from rotation.
  • Complexity increases: you now need load balancing, health checks, autoscaling, centralized logs, metrics, and safe fleet deploys.
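The routing behavior in the diagram above can be sketched as a toy round-robin balancer. This is an illustration only, not how any particular load balancer is implemented; the class `RoundRobinBalancer` and its methods are hypothetical names, and a real balancer would run health checks itself rather than being told which nodes are down.

```python
from itertools import count


class RoundRobinBalancer:
    """Toy load balancer: rotates over nodes and skips unhealthy ones."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._counter = count()

    def mark_unhealthy(self, node):
        # In a real system a periodic health check would trigger this.
        self.healthy.discard(node)

    def mark_healthy(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        # Try each node at most once per call, starting from the next slot.
        for _ in range(len(self.nodes)):
            node = self.nodes[next(self._counter) % len(self.nodes)]
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_unhealthy("app-2")
print([lb.pick() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

With app-2 out of rotation, the two remaining nodes absorb its share, which is why a fleet needs enough spare capacity to survive a node loss.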

Stateless services unlock scale-out

Horizontal scaling works best when app servers are stateless. A stateless server keeps no important per-user or per-job data in local memory between requests. Any healthy node can handle the next request because durable state lives in shared systems.

externalize local state
problem:
  request 1 -> app-1 stores session in memory
  request 2 -> app-2 cannot find the session

fix:
  sessions -> Redis or signed cookies
  records  -> database
  files    -> object storage
  jobs     -> shared queue
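The problem and fix above can be reproduced in a few lines. In this sketch a plain dict stands in for the shared store (Redis in the diagram), and `AppServer` is a hypothetical class invented for illustration.

```python
class AppServer:
    """Toy app server; `session_store` is any dict-like mapping."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        # Returns None when this server cannot find the session.
        return self.sessions.get(session_id)


# Stateful: each server keeps its own private dict.
app1, app2 = AppServer("app-1", {}), AppServer("app-2", {})
app1.login("s1", "alice")
print(app2.whoami("s1"))  # None: app-2 never saw the login

# Stateless: both servers read and write one external store.
shared = {}
app1, app2 = AppServer("app-1", shared), AppServer("app-2", shared)
app1.login("s1", "alice")
print(app2.whoami("s1"))  # alice
```

The server code is identical in both cases. Only the location of the state changes, which is what makes the nodes interchangeable.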

| Concern | Stateful app server | Stateless app server |
| --- | --- | --- |
| Sessions | Stored in one process | Stored in Redis, DB, or signed token |
| Uploads | Temp file on one node | Object storage plus DB metadata |
| Jobs | Local in-memory queue | Shared durable queue |
| Node failure | Local users or jobs are lost | Another node can continue |
| Load balancing | May need sticky sessions | Any node can serve any request |

Stateful systems are still necessary

Databases, queues, caches, and search indexes are intentionally stateful. The goal is to keep app servers replaceable and put important state in systems designed for persistence, replication, and recovery.
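The signed-token option for sessions mentioned earlier moves the state into the request itself, so no node has to remember anything. The sketch below uses only Python's standard library; the `SECRET` value and the function names are hypothetical, and the token format is made up for illustration rather than taken from any framework.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-change-me"  # hypothetical key; every node must share it


def sign_session(data: dict) -> str:
    """Encode the session and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"


def verify_session(token: str):
    """Return the session dict, or None if the token is malformed or tampered with."""
    try:
        payload, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


token = sign_session({"user": "alice"})
print(verify_session(token))        # {'user': 'alice'}
print(verify_session("x" + token))  # None
```

Any node holding the same key can verify the token. Note that the payload is signed, not encrypted, and this sketch has no expiry, so it should not carry secrets or be used as-is.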

The usual path: vertical first, then horizontal

Real teams usually follow a gradual path. Start simple, measure the bottleneck, buy easy headroom, then remove local-state assumptions before adding a fleet. This avoids paying distributed-system complexity before the product needs it.

common migration path
1. one app server + one database
2. scale up app or database when metrics show saturation
3. add a load balancer and a second app server
4. move sessions, files, caches, and jobs out of app memory
5. autoscale stateless app servers horizontally
6. scale shared dependencies when they become the new bottleneck

| Dimension | Vertical: scale up | Horizontal: scale out |
| --- | --- | --- |
| Basic move | Use a bigger machine | Use more machines |
| Engineering effort | Low; same topology | Higher; distributed operations |
| Capacity limit | Largest affordable box | Add nodes until another dependency limits you |
| Cost curve | Can get steep near the top | More linear, plus operational overhead |
| Failure model | One node can still take everything down | Node loss reduces capacity |
| Best fit | Early growth and quick relief | High traffic and fault tolerance |
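The "add nodes until another dependency limits you" limit can be expressed as a one-line model. This is a deliberately crude sketch: it assumes app nodes scale linearly and that a single shared dependency has a fixed ceiling. All numbers are hypothetical.

```python
def max_throughput(app_nodes: int, per_node_rps: float, shared_limit_rps: float) -> float:
    """Throughput is capped by the slower of the app tier and the shared dependency."""
    return min(app_nodes * per_node_rps, shared_limit_rps)


# Hypothetical numbers: each app node handles 500 req/s, the database 2000 req/s.
for n in (1, 2, 4, 8):
    print(n, max_throughput(n, 500, 2000))
```

Under these assumptions, going from 4 to 8 nodes adds nothing, because the database is already saturated at 4.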

Edge cases and gotchas

  • The database may become the next bottleneck: after app servers scale out, shared storage often needs indexing, caching, read replicas, partitioning, or sharding.
  • Sticky sessions hide state problems: pinning a user to one server can work temporarily, but that server is now special for that user and load can become uneven.
  • Autoscaling is not instant: new nodes need time to boot, warm caches, pass health checks, and receive traffic.
  • Local caches can disagree: each node may have a different in-memory view. Use short TTLs or a shared cache for correctness-sensitive data.
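The short-TTL option from the last bullet can be sketched as a small per-node cache. `TTLCache` is a hypothetical class written for this example; the clock is injectable only so the demo can fake the passage of time.

```python
import time


class TTLCache:
    """Per-node in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the example can fake time
        self._items = {}

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # stale: caller should reload from the source of truth
            return default
        return value


now = [0.0]
cache = TTLCache(ttl_seconds=5, clock=lambda: now[0])
cache.set("price", 100)
print(cache.get("price"))  # 100
now[0] = 6.0
print(cache.get("price"))  # None
```

A TTL bounds how long two nodes can disagree; it does not remove the disagreement. Data that must never be stale belongs in a shared cache or the database.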
Key takeaways

  • Vertical scaling means a bigger machine: simple, fast, and useful early, but capped by hardware and cost.
  • A vertically scaled service can still be a single point of failure; capacity and availability are different concerns.
  • Horizontal scaling means more machines behind routing; it improves capacity and resilience but adds distributed operations.
  • Stateless app servers make horizontal scaling practical because any node can handle any request.
  • The normal path is scale up first, externalize state, then scale out when traffic or availability requires it.
Scaling up buys capacity without changing the architecture. The team can keep the same code, local assumptions, and deployment model while gaining more CPU, memory, or I/O. The trade-off is that the bigger machine is still capped and still a single point of failure.
With multiple servers, consecutive requests from the same user may land on different nodes. If the session or job state lives on only one node, the next node cannot continue correctly. Shared stores make the nodes interchangeable.
When adding app servers stops improving throughput, the bottleneck has probably moved to a shared dependency: database queries, cache misses, a queue, the network, or an external API. Scaling the app tier only helps if the app tier was the limiting resource.