
Load Balancing Basics

How traffic gets spread across servers, and the algorithms that decide who handles what.

Once you have more than one server, clients need one stable place to send traffic. A load balancer is that front door: it accepts incoming connections, chooses a healthy backend, and keeps users away from servers that are overloaded, deploying, or broken.

🔭Think of it like…

A load balancer is the host stand at a busy restaurant. Guests do not choose a random table or inspect the kitchen; they talk to the host. The host knows which tables are open, which sections are overwhelmed, and which tables are unavailable because a glass just broke.

The problem: clients cannot manage your fleet

Without a load balancer, every client would need to know every server, retry failed machines, avoid overloaded ones, and learn when new servers appear. That pushes operational complexity to browsers, mobile apps, partner integrations, and old clients you cannot quickly update.

why the fleet needs a front door

bad shape:
  client -> app-1 or app-2 or app-3
  client must know which nodes exist
  client may keep calling a dead node

good shape:
  client -> load balancer -> healthy app node
  clients keep one stable DNS name

Performance: spread requests so one node does not melt while another sits idle.
Availability: stop routing to nodes that fail health checks.
Elasticity: add or remove servers without changing the public endpoint clients use.
Operational safety: drain traffic before a deploy, then return the node after it passes health checks.

Layer 4 vs Layer 7 load balancing

Load balancers operate at different layers. The layer determines what the balancer can see and therefore what decisions it can make.

Dimension	Layer 4: transport	Layer 7: application
What it sees	IP addresses, ports, TCP or UDP	HTTP method, host, path, headers, cookies
Routing style	Connection forwarding	Request-aware routing
Strength	Very fast and protocol-agnostic	Smart policies and HTTP features
Examples	TCP database proxy, game traffic, raw TLS pass-through	Route /api and /images to different pools
Trade-off	Less context	More CPU and more behavior to configure

How the mechanics differ

same request, different visibility

Layer 4 sees:
  source IP, destination IP, destination port 443, TCP state
  decision: choose a backend connection

Layer 7 sees, after TLS termination and HTTP parsing:
  GET /checkout
  Host: shop.example.com
  Cookie: session=...
  decision: route checkout traffic to checkout-service

Use L4 when you need speed, simplicity, or non-HTTP traffic. Use L7 when routing depends on HTTP meaning: hostnames, paths, headers, cookies, authentication, redirects, rate limits, or canary releases.
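The Layer 7 decision above can be sketched as a small path-prefix router. This is a minimal illustration, not any particular balancer's configuration: the `Request` type, the `ROUTES` table, the pool names other than checkout-service, and the longest-prefix rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical view of a request after HTTP parsing."""
    method: str
    host: str
    path: str


# Hypothetical routing table: (path prefix, backend pool name).
ROUTES = [
    ("/checkout", "checkout-service"),
    ("/api", "api-pool"),
    ("/images", "image-pool"),
]
DEFAULT_POOL = "web-pool"


def choose_pool(request: Request) -> str:
    """Return the pool whose prefix matches the path; longest prefix wins."""
    matches = [
        (prefix, pool)
        for prefix, pool in ROUTES
        # Match "/api" and "/api/...", but not "/apiary".
        if request.path == prefix or request.path.startswith(prefix + "/")
    ]
    if not matches:
        return DEFAULT_POOL
    return max(matches, key=lambda match: len(match[0]))[1]


pool = choose_pool(Request("GET", "shop.example.com", "/checkout"))
```

A Layer 4 balancer could not make this choice, because the path is only visible once the HTTP request has been parsed.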

How a load balancer chooses a backend

The algorithm is the policy for picking a server. There is no universal best choice; the right policy depends on whether requests have similar cost, whether servers have different sizes, and whether requests need affinity to cached or local state.

Algorithm	How it works	Best fit	Gotcha
Round-robin	Send request 1 to A, 2 to B, 3 to C, then repeat	Similar servers and similar request cost	Long requests can pile up unevenly
Least connections	Pick the backend with the fewest active connections	Mixed request durations, WebSockets, slow clients	Needs accurate connection accounting
Weighted	Give larger servers a larger share	Heterogeneous fleets or gradual migration	Bad weights overload weak nodes
Hashing	Hash a key such as IP, user ID, or cookie to choose a node	Sticky routing, cache locality, session affinity	Hot keys and node changes can skew load
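To make the first two rows of the table concrete, here is a rough sketch of round-robin and least-connections selection. The class and method names are hypothetical, and a real balancer would also handle health state and concurrency, which this sketch ignores.

```python
import itertools


class RoundRobin:
    """Hand out backends in a fixed repeating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest active connections."""

    def __init__(self, backends):
        # Active connection count per backend.
        self.active = {backend: 0 for backend in backends}

    def acquire(self):
        # Ties go to the backend listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Must be called when a connection closes, or the counts drift.
        self.active[backend] -= 1


rr = RoundRobin(["A", "B", "C"])
order = [rr.pick() for _ in range(4)]  # A, B, C, then A again
```

The `release` call is where the "accurate connection accounting" gotcha lives: if a close is missed, that backend looks busier than it is and receives less traffic.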

weighted round-robin intuition

servers:
  app-a weight 1
  app-b weight 1
  app-c weight 2

schedule:
  a, b, c, c, a, b, c, c...

app-c receives 2 of every 4 requests (50%) because its weight is twice that of app-a or app-b
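The schedule above can be reproduced with a very simple weighted round-robin: repeat each server name as many times as its weight, then cycle through the list. This is a sketch of the idea only. The function name is hypothetical, and production balancers often use smoother interleaving so a heavy server does not receive its requests back to back.

```python
import itertools
from collections import Counter


def weighted_schedule(weights):
    """Cycle through backends, repeating each one `weight` times per round."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


weights = {"app-a": 1, "app-b": 1, "app-c": 2}
schedule = weighted_schedule(weights)

first_eight = [next(schedule) for _ in range(8)]
# app-a, app-b, app-c, app-c, app-a, app-b, app-c, app-c

share = Counter(first_eight)
# app-c gets 4 of 8 requests; app-a and app-b get 2 each
```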

Sticky sessions are a trade-off

Hashing or cookie affinity can keep a user on the same server, which is useful for local caches or legacy in-memory sessions. Prefer making app servers stateless instead. Sticky routing can hide state problems and makes a node failure painful for the users pinned to it.
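A minimal sketch of hash-based affinity, assuming the key is a user ID string and the pool is a plain list. The function name and the choice of SHA-256 with a modulo are illustrative assumptions, not a description of any specific product.

```python
import hashlib


def pick_backend(key: str, backends: list[str]) -> str:
    """Map a key to a backend; the same key and pool always give the same node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


pool = ["app-1", "app-2", "app-3"]
first = pick_backend("user-42", pool)
second = pick_backend("user-42", pool)  # same backend as `first`
```

The modulo is also why node changes hurt: adding or removing a backend changes `len(backends)`, so many keys land on a different node and lose their local state. Consistent hashing is a common way to reduce that remapping.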

Health checks and connection draining

A load balancer is only useful if it knows which backends are safe to use. Health checks are repeated probes from the load balancer to each server. When a server fails enough checks, the balancer removes it from rotation; when it recovers, the balancer can add it back.

health-check loop

every 5 seconds:
  GET /healthz on each backend

if 2 consecutive checks fail:
  mark backend unhealthy
  stop sending new requests

if 3 consecutive checks pass:
  mark backend healthy
  slowly return traffic
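The loop above can be sketched as a small state tracker per backend. The class name and fields are hypothetical, the thresholds are the 2 and 3 from the example, and the sketch assumes the thresholds count consecutive results. The probing itself and the gradual return of traffic are left out.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Track one backend's health from a stream of probe results."""
    fail_threshold: int = 2   # consecutive failures before removal
    pass_threshold: int = 3   # consecutive passes before return
    healthy: bool = True
    _fails: int = 0
    _passes: int = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result and return the current health state."""
        if ok:
            self._passes += 1
            self._fails = 0  # a pass breaks any failure streak
            if not self.healthy and self._passes >= self.pass_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._passes = 0  # a failure breaks any pass streak
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False
        return self.healthy


backend = BackendHealth()
states = [backend.record(ok) for ok in (False, False, True, True, True)]
# healthy, unhealthy, unhealthy, unhealthy, healthy
```

Requiring more passes to return than failures to leave is a deliberate asymmetry: it keeps a flapping node out of rotation until it has been stable for a while.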

Liveness: is the process alive enough to respond at all?
Readiness: is it ready to receive real traffic after boot, warmup, migrations, or dependency checks?
Connection draining: stop sending new requests to a node while allowing existing requests to finish before deploy or shutdown.
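Connection draining can be sketched as two pieces of state per node: a draining flag and a count of in-flight requests. The class and method names here are hypothetical, and a real implementation would usually add a drain timeout so one stuck request cannot block a deploy forever.

```python
class DrainableBackend:
    """Minimal model of a node that can be drained before shutdown."""

    def __init__(self, name: str):
        self.name = name
        self.draining = False
        self.in_flight = 0

    def try_start_request(self) -> bool:
        """Accept a new request unless the node is draining."""
        if self.draining:
            return False
        self.in_flight += 1
        return True

    def finish_request(self) -> None:
        self.in_flight -= 1

    def start_draining(self) -> None:
        """Stop accepting new requests; existing ones keep running."""
        self.draining = True

    def safe_to_stop(self) -> bool:
        """True once draining has begun and all requests have finished."""
        return self.draining and self.in_flight == 0


node = DrainableBackend("app-1")
node.try_start_request()          # one request in flight
node.start_draining()
accepted = node.try_start_request()  # rejected: node is draining
```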

A health check can lie

A shallow endpoint that always returns 200 can mark a broken node as healthy. A deep check that calls every dependency can create cascading failure. Good readiness checks validate the local process and only the dependencies truly required to serve traffic.

The load balancer can be a single point of failure

The load balancer removes single-server risk from the app tier, but the balancer itself must not become the new fatal box. Production systems run redundant load balancers and give clients a way to reach a healthy one.

Redundancy pattern	How it works	Trade-off
Active-passive	One balancer serves traffic; a standby takes over on failure	Simple, but failover can take seconds
Active-active	Multiple balancers serve traffic at once	Better utilization, more coordination
DNS failover	DNS answers shift away from unhealthy endpoints	Easy globally, limited by DNS caching
Anycast	Same IP announced from many locations; routing finds a nearby healthy site	Excellent global failover, operationally advanced

redundant front door

users
  -> DNS name api.example.com
      -> load-balancer-a
      -> load-balancer-b
          -> app pool in zone 1
          -> app pool in zone 2

Managed cloud load balancers usually hide much of this machinery, but the design question remains: if one balancer, one zone, or one IP path fails, where does traffic go?

Real-world examples and gotchas

E-commerce: an L7 balancer can route /checkout to a smaller protected pool while static images go through a CDN.
Chat or gaming: long-lived TCP or WebSocket connections often use least-connections so one node does not collect all slow clients.
Canary releases: weighted routing can send 1% of traffic to a new version, then 10%, then 50%, while metrics are watched.
Hot users: hashing by user ID can overload one backend if a celebrity account or giant tenant produces far more traffic than normal users.
Retries: automatic retries can multiply traffic during an outage. Pair load balancing with timeouts, budgets, and backoff.
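The retry advice in the last item can be sketched with two small helpers: exponential backoff with jitter, and a budget that caps retries at a fraction of normal requests. Both names, the default values, and the specific budget rule are assumptions for illustration, not a standard API.

```python
import random


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


class RetryBudget:
    """Allow retries only while they stay within `ratio` of requests seen."""

    def __init__(self, ratio: float = 0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_retry(self) -> bool:
        """Spend one retry from the budget, or refuse if it is exhausted."""
        if self.retries + 1 > self.requests * self.ratio:
            return False
        self.retries += 1
        return True


budget = RetryBudget(ratio=0.5)
for _ in range(4):
    budget.record_request()
allowed = [budget.try_retry() for _ in range(3)]  # two allowed, third refused
```

The budget is what stops an outage from multiplying traffic: once retries exceed the allowed share, callers fail fast instead of adding load to backends that are already struggling.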

Key takeaways

A load balancer gives clients one stable front door while distributing traffic across healthy backends.
L4 balancing is fast and transport-level; L7 balancing understands HTTP and enables path, host, header, and cookie policies.
Algorithms include round-robin, least-connections, weighted routing, and hashing or sticky sessions; each optimizes for a different workload.
Health checks and connection draining turn node failures and deploys into routine events instead of visible outages.
The load balancer must also be redundant through active-passive, active-active, DNS failover, anycast, or a managed multi-zone service.

Choose L7 when the routing decision needs HTTP context: host, path, method, header, cookie, redirects, rate limits, canaries, or TLS termination. If the balancer only needs to forward TCP or UDP quickly, L4 may be simpler.

Sticky sessions make one server special for a user. If that server fails, the user can lose the local session or cached state, and load may become uneven. Shared session storage keeps app servers interchangeable.

During a deploy, a load balancer can mark a node not ready, stop sending it new requests, let existing connections drain, deploy the new version, verify health checks, and then return the node to rotation without exposing users to the intermediate state.
