Load Balancing Basics
How traffic gets spread across servers, and the algorithms that decide who handles what.
Once you have more than one server, clients need one stable place to send traffic. A load balancer is that front door: it accepts incoming connections, chooses a healthy backend, and keeps users away from servers that are overloaded, deploying, or broken.
The problem: clients cannot manage your fleet
Without a load balancer, every client would need to know every server, retry failed machines, avoid overloaded ones, and learn when new servers appear. That pushes operational complexity to browsers, mobile apps, partner integrations, and old clients you cannot quickly update.
bad shape:
  client -> app-1 or app-2 or app-3
  client must know which nodes exist
  client may keep calling a dead node
good shape:
  client -> load balancer -> healthy app node
  clients keep one stable DNS name
- Performance: spread requests so one node does not melt while another sits idle.
- Availability: stop routing to nodes that fail health checks.
- Elasticity: add or remove servers without changing the public endpoint clients use.
- Operational safety: drain traffic before a deploy, then return the node after it passes health checks.
Layer 4 vs Layer 7 load balancing
Load balancers operate at different layers. The layer determines what the balancer can see and therefore what decisions it can make.
| Dimension | Layer 4: transport | Layer 7: application |
|---|---|---|
| What it sees | IP addresses, ports, TCP or UDP | HTTP method, host, path, headers, cookies |
| Routing style | Connection forwarding | Request-aware routing |
| Strength | Very fast and protocol-agnostic | Smart policies and HTTP features |
| Examples | TCP database proxy, game traffic, raw TLS pass-through | Route /api and /images to different pools |
| Trade-off | Less context | More CPU and more behavior to configure |
How the mechanics differ
Layer 4 sees:
  source IP, destination IP, destination port 443, TCP state
  decision: choose a backend connection
Layer 7 sees, after TLS termination and HTTP parsing:
  GET /checkout
  Host: shop.example.com
  Cookie: session=...
  decision: route checkout traffic to checkout-service
Use L4 when you need speed, simplicity, or non-HTTP traffic. Use L7 when routing depends on HTTP meaning: hostnames, paths, headers, cookies, authentication, redirects, rate limits, or canary releases.
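To make the L7 decision concrete, here is a minimal Python sketch of request-aware routing. The `Request` type, the `ROUTES` table, and the pool names other than `checkout-service` are hypothetical, chosen only to mirror the examples in this section; a real balancer would express the same idea in its own configuration language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """The HTTP fields an L7 balancer can inspect after parsing."""
    method: str
    host: str
    path: str


# Hypothetical routing table: (host, path prefix, backend pool name).
# The first matching row wins, so list more specific prefixes first.
ROUTES = [
    ("shop.example.com", "/checkout", "checkout-service"),
    ("shop.example.com", "/api", "api-pool"),
    ("shop.example.com", "/images", "image-pool"),
]
DEFAULT_POOL = "web-pool"  # assumed fallback pool


def path_matches(path: str, prefix: str) -> bool:
    """Match whole path segments so /apiary does not match /api."""
    return path == prefix or path.startswith(prefix + "/")


def choose_pool(request: Request) -> str:
    """Return the pool whose host and path prefix match the request."""
    for host, prefix, pool in ROUTES:
        if request.host == host and path_matches(request.path, prefix):
            return pool
    return DEFAULT_POOL
```

An L4 balancer could not make this choice, because the host and path only become visible once the request has been parsed.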
How a load balancer chooses a backend
The algorithm is the policy for picking a server. There is no universal best choice; the right policy depends on whether requests have similar cost, whether servers have different sizes, and whether requests need affinity to cached or local state.
| Algorithm | How it works | Best fit | Gotcha |
|---|---|---|---|
| Round-robin | Send request 1 to A, 2 to B, 3 to C, then repeat | Similar servers and similar request cost | Long requests can pile up unevenly |
| Least connections | Pick the backend with the fewest active connections | Mixed request durations, WebSockets, slow clients | Needs accurate connection accounting |
| Weighted | Give larger servers a larger share | Heterogeneous fleets or gradual migration | Bad weights overload weak nodes |
| Hashing | Hash a key such as IP, user ID, or cookie to choose a node | Sticky routing, cache locality, session affinity | Hot keys and node changes can skew load |
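The four policies in the table can be sketched in a few lines of Python each. This is an illustration under simplifying assumptions, not how any particular balancer implements them: the function names are made up for this example, weights are small integers, and connection counts are supplied by the caller.

```python
import hashlib
import itertools


def round_robin(backends):
    """Return an iterator that yields backends in a fixed repeating order."""
    return itertools.cycle(backends)


def least_connections(active):
    """Pick the backend with the fewest active connections.

    `active` maps backend name -> current connection count.
    Ties go to the backend that appears first in the dict.
    """
    return min(active, key=active.get)


def weighted_round_robin(weights):
    """Return an iterator that yields backends in proportion to their weights.

    `weights` maps backend name -> positive integer weight.
    """
    schedule = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(schedule)


def hash_pick(key, backends):
    """Map a key (IP, user ID, cookie value) to a backend with a stable hash.

    hashlib is used instead of the built-in hash() because Python salts
    string hashes per process, which would break stickiness across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

Note that `hash_pick` uses plain modulo hashing, so changing the number of backends remaps most keys. Consistent hashing is the usual way to limit that reshuffling.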
servers:
  app-a weight 1
  app-b weight 1
  app-c weight 2
schedule:
  a, b, c, c, a, b, c, c...
  app-c receives about 50% of traffic because it has twice the weight
Health checks and connection draining
A load balancer is only useful if it knows which backends are safe to use. Health checks are repeated probes from the load balancer to each server. When a server fails enough checks, the balancer removes it from rotation; when it recovers, the balancer can add it back.
every 5 seconds:
  GET /healthz on each backend
if 2 checks fail:
  mark backend unhealthy
  stop sending new requests
if 3 checks pass:
  mark backend healthy
  slowly return traffic
- Liveness: is the process alive enough to respond at all?
- Readiness: is it ready to receive real traffic after boot, warmup, migrations, or dependency checks?
- Connection draining: stop sending new requests to a node while allowing existing requests to finish before deploy or shutdown.
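The probe loop and draining behavior described above can be sketched as a small state tracker. The class and field names are hypothetical, and the sketch assumes the thresholds count consecutive results (2 failures to remove, 3 passes to restore). It does not model the gradual ramp-up of returning traffic.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks one backend's state from a stream of probe results."""
    fail_threshold: int = 2   # consecutive failures before removal (assumed)
    pass_threshold: int = 3   # consecutive passes before return (assumed)
    healthy: bool = True
    draining: bool = False    # set by the operator before deploy or shutdown
    _fails: int = 0
    _passes: int = 0

    def record_probe(self, ok: bool) -> None:
        """Update the counters and flip state when a threshold is reached."""
        if ok:
            self._passes += 1
            self._fails = 0
            if not self.healthy and self._passes >= self.pass_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._passes = 0
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False

    def accepts_new_requests(self) -> bool:
        """A draining node finishes in-flight work but takes nothing new."""
        return self.healthy and not self.draining
```

Requiring several results before changing state keeps one slow probe from removing a good node, and keeps one lucky probe from restoring a bad one.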
A shallow check that always returns 200 can mark a broken node as healthy. A deep check that calls every dependency can create cascading failure. Good readiness checks validate the local process and only the dependencies truly required to serve traffic.
The load balancer can be a single point of failure
The load balancer removes single-server risk from the app tier, but the balancer itself must not become the new fatal box. Production systems run redundant load balancers and give clients a way to reach a healthy one.
| Redundancy pattern | How it works | Trade-off |
|---|---|---|
| Active-passive | One balancer serves traffic; a standby takes over on failure | Simple, but failover can take seconds |
| Active-active | Multiple balancers serve traffic at once | Better utilization, more coordination |
| DNS failover | DNS answers shift away from unhealthy endpoints | Easy globally, limited by DNS caching |
| Anycast | Same IP announced from many locations; routing finds a nearby healthy site | Excellent global failover, operationally advanced |
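The difference between the first two patterns in the table comes down to how many balancers are in service at once. A rough sketch, assuming a hypothetical `is_healthy` callback and a priority-ordered list of balancer names:

```python
def active_passive(balancers, is_healthy):
    """Return the single balancer that should serve traffic, or None.

    `balancers` is ordered by priority: primary first, standbys after.
    """
    for name in balancers:
        if is_healthy(name):
            return name
    return None


def active_active(balancers, is_healthy):
    """Return every healthy balancer; all of them serve traffic at once."""
    return [name for name in balancers if is_healthy(name)]
```

In practice the takeover in active-passive setups is usually done by moving a shared IP address or changing a route rather than by calling a function, which is where the seconds of failover time come from.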
users
  -> DNS name api.example.com
    -> load-balancer-a
    -> load-balancer-b
      -> app pool in zone 1
      -> app pool in zone 2
Managed cloud load balancers usually hide much of this machinery, but the design question remains: if one balancer, one zone, or one IP path fails, where does traffic go?
Real-world examples and gotchas
- E-commerce: an L7 balancer can route /checkout to a smaller protected pool while static images go through a CDN.
- Chat or gaming: long-lived TCP or WebSocket connections often use least-connections so one node does not collect all slow clients.
- Canary releases: weighted routing can send 1% of traffic to a new version, then 10%, then 50%, while metrics are watched.
- Hot users: hashing by user ID can overload one backend if a celebrity account or giant tenant produces far more traffic than normal users.
- Retries: automatic retries can multiply traffic during an outage. Pair load balancing with timeouts, budgets, and backoff.
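The retry advice in the last bullet can be sketched as two small pieces: exponential backoff with jitter, and a budget that caps retries as a fraction of normal traffic. The names, default values, and the 10% ratio are assumptions for illustration only.

```python
import random


def backoff_delays(max_retries, base=0.1, cap=5.0, rng=random.random):
    """Return the sleep (in seconds) before each retry.

    Each delay is a random fraction of an exponentially growing ceiling,
    so clients that failed together do not all retry at the same moment.
    """
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


class RetryBudget:
    """Allow retries only while they stay under a fraction of total requests."""

    def __init__(self, ratio=0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        """Count one original (non-retry) request."""
        self.requests += 1

    def try_retry(self):
        """Return True and spend budget if one more retry is allowed."""
        if self.retries + 1 <= self.ratio * self.requests:
            self.retries += 1
            return True
        return False
```

With a budget in place, an outage that fails every request adds at most `ratio` extra load instead of multiplying traffic by the retry count.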
Key takeaways
- A load balancer gives clients one stable front door while distributing traffic across healthy backends.
- L4 balancing is fast and transport-level; L7 balancing understands HTTP and enables path, host, header, and cookie policies.
- Algorithms include round-robin, least-connections, weighted routing, and hashing or sticky sessions; each optimizes for a different workload.
- Health checks and connection draining turn node failures and deploys into routine events instead of visible outages.
- The load balancer must also be redundant through active-passive, active-active, DNS failover, anycast, or a managed multi-zone service.