🧱Fundamentals·5 min read

Load Balancing Basics

How traffic gets spread across servers, and the algorithms that decide who handles what.

Once you have more than one server, clients need one stable place to send traffic. A load balancer is that front door: it accepts incoming connections, chooses a healthy backend, and keeps users away from servers that are overloaded, deploying, or broken.

🔭Think of it like…
A load balancer is the host stand at a busy restaurant. Guests do not choose a random table or inspect the kitchen; they talk to the host. The host knows which tables are open, which sections are overwhelmed, and which tables are unavailable because a glass just broke.

The problem: clients cannot manage your fleet

Without a load balancer, every client would need to know every server, retry failed machines, avoid overloaded ones, and learn when new servers appear. That pushes operational complexity to browsers, mobile apps, partner integrations, and old clients you cannot quickly update.

why the fleet needs a front door
bad shape:
  client -> app-1 or app-2 or app-3
  client must know which nodes exist
  client may keep calling a dead node

good shape:
  client -> load balancer -> healthy app node
  clients keep one stable DNS name
  • Performance: spread requests so one node does not melt while another sits idle.
  • Availability: stop routing to nodes that fail health checks.
  • Elasticity: add or remove servers without changing the public endpoint clients use.
  • Operational safety: drain traffic before a deploy, then return the node after it passes health checks.

Layer 4 vs Layer 7 load balancing

Load balancers operate at different layers. The layer determines what the balancer can see and therefore what decisions it can make.

| Dimension | Layer 4: transport | Layer 7: application |
| --- | --- | --- |
| What it sees | IP addresses, ports, TCP or UDP | HTTP method, host, path, headers, cookies |
| Routing style | Connection forwarding | Request-aware routing |
| Strength | Very fast and protocol-agnostic | Smart policies and HTTP features |
| Examples | TCP database proxy, game traffic, raw TLS pass-through | Route /api and /images to different pools |
| Trade-off | Less context | More CPU and more behavior to configure |

How the mechanics differ

same request, different visibility
Layer 4 sees:
  source IP, destination IP, destination port 443, TCP state
  decision: choose a backend connection

Layer 7 sees, after TLS termination and HTTP parsing:
  GET /checkout
  Host: shop.example.com
  Cookie: session=...
  decision: route checkout traffic to checkout-service

Use L4 when you need speed, simplicity, or non-HTTP traffic. Use L7 when routing depends on HTTP meaning: hostnames, paths, headers, cookies, authentication, redirects, rate limits, or canary releases.
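As a rough illustration of request-aware routing, the sketch below picks a backend pool from the host and path of a request. The `Request` class, the `ROUTES` table, and the pool names other than `checkout-service` are hypothetical and exist only for this example; a real L7 balancer expresses the same idea in its own configuration language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    method: str
    host: str
    path: str


# Hypothetical routing table: (host, path prefix, pool name).
# The first matching rule wins.
ROUTES = [
    ("shop.example.com", "/checkout", "checkout-service"),
    ("shop.example.com", "/api", "api-pool"),
    ("shop.example.com", "/images", "image-pool"),
]
DEFAULT_POOL = "web-pool"  # assumed fallback pool


def choose_pool(req: Request) -> str:
    """Pick a pool from HTTP-level fields that an L4 balancer never sees."""
    for host, prefix, pool in ROUTES:
        # Match the prefix on a path-segment boundary so that
        # /checkout-old does not count as /checkout.
        on_prefix = req.path == prefix or req.path.startswith(prefix + "/")
        if req.host == host and on_prefix:
            return pool
    return DEFAULT_POOL


print(choose_pool(Request("GET", "shop.example.com", "/checkout")))
```

An L4 balancer could not make this decision, because the host and path only become visible once the HTTP request has been parsed.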

How a load balancer chooses a backend

The algorithm is the policy for picking a server. There is no universal best choice; the right policy depends on whether requests have similar cost, whether servers have different sizes, and whether requests need affinity to cached or local state.

| Algorithm | How it works | Best fit | Gotcha |
| --- | --- | --- | --- |
| Round-robin | Send request 1 to A, 2 to B, 3 to C, then repeat | Similar servers and similar request cost | Long requests can pile up unevenly |
| Least connections | Pick the backend with the fewest active connections | Mixed request durations, WebSockets, slow clients | Needs accurate connection accounting |
| Weighted | Give larger servers a larger share | Heterogeneous fleets or gradual migration | Bad weights overload weak nodes |
| Hashing | Hash a key such as IP, user ID, or cookie to choose a node | Sticky routing, cache locality, session affinity | Hot keys and node changes can skew load |

weighted round-robin intuition
servers:
  app-a weight 1
  app-b weight 1
  app-c weight 2

schedule:
  a, b, c, c, a, b, c, c...

app-c receives 50% of traffic: its weight of 2 is half of the total weight of 4
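The schedule above can be sketched in a few lines of Python, along with a least-connections picker for comparison. The function names `weighted_round_robin` and `least_connections` are illustrative, and the connection counts are made-up inputs; this is a minimal sketch, not how any particular balancer implements these policies.

```python
import itertools
from typing import Dict, Iterator


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield backend names forever, repeating each one `weight` times per cycle."""
    cycle = [name for name, weight in weights.items() for _ in range(weight)]
    if not cycle:
        raise ValueError("at least one backend needs a positive weight")
    return itertools.cycle(cycle)


def least_connections(active: Dict[str, int]) -> str:
    """Return the backend with the fewest active connections (ties: first listed)."""
    return min(active, key=active.get)


picker = weighted_round_robin({"app-a": 1, "app-b": 1, "app-c": 2})
print(list(itertools.islice(picker, 8)))
# ['app-a', 'app-b', 'app-c', 'app-c', 'app-a', 'app-b', 'app-c', 'app-c']

print(least_connections({"app-a": 12, "app-b": 3, "app-c": 7}))
# app-b
```

This simple expansion sends back-to-back requests to the heavier node. Some implementations interleave the picks more evenly while keeping the same overall proportions.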
Sticky sessions are a trade-off
Hashing or cookie affinity can keep a user on the same server, which is useful for local caches or legacy in-memory sessions. Prefer making app servers stateless instead. Sticky routing can hide state problems and makes a node failure painful for the users pinned to it.
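To make the affinity trade-off concrete, here is a minimal sketch of hash-based selection using plain modulo hashing. The helper `pick_by_hash`, the backend names, and the synthetic user IDs are assumptions for illustration only. It also counts how many users land on a different backend when a fourth node joins, which is the "node changes" gotcha from the table.

```python
import hashlib
from typing import List


def pick_by_hash(key: str, backends: List[str]) -> str:
    """Map a key (user ID, client IP, cookie value) to one backend deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["app-a", "app-b", "app-c"]
users = [f"user-{i}" for i in range(1000)]  # synthetic keys

before = {u: pick_by_hash(u, backends) for u in users}
after = {u: pick_by_hash(u, backends + ["app-d"]) for u in users}
moved = sum(1 for u in users if before[u] != after[u])
print(f"{moved} of {len(users)} users changed backend after adding app-d")
```

With modulo hashing, most keys move whenever the pool size changes, so every moved user loses any state pinned to the old node. Consistent hashing is a common way to reduce that churn, but it does not remove the hot-key problem.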

Health checks and connection draining

A load balancer is only useful if it knows which backends are safe to use. Health checks are repeated probes from the load balancer to each server. When a server fails enough checks, the balancer removes it from rotation; when it recovers, the balancer can add it back.

health-check loop
every 5 seconds:
  GET /healthz on each backend

if 2 checks fail:
  mark backend unhealthy
  stop sending new requests

if 3 checks pass:
  mark backend healthy
  slowly return traffic
  • Liveness: is the process alive enough to respond at all?
  • Readiness: is it ready to receive real traffic after boot, warmup, migrations, or dependency checks?
  • Connection draining: stop sending new requests to a node while allowing existing requests to finish before deploy or shutdown.
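The health-check loop above can be modeled as a small state machine per backend. The sketch below uses the thresholds from the example (2 failures to remove, 3 passes to restore) and assumes the checks must be consecutive, which the loop does not state explicitly. The class name `BackendHealth` is hypothetical, and the probe itself (the GET /healthz call) is left out.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks consecutive probe results for one backend."""
    fail_threshold: int = 2   # failures in a row before removal from rotation
    pass_threshold: int = 3   # passes in a row before returning to rotation
    healthy: bool = True
    _fails: int = 0
    _passes: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Feed in one probe result and return the updated healthy flag."""
        if probe_ok:
            self._passes += 1
            self._fails = 0
            if not self.healthy and self._passes >= self.pass_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._passes = 0
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False
        return self.healthy


backend = BackendHealth()
for result in [False, False, True, True, True]:
    print(result, "->", backend.record(result))
```

Requiring several results in a row in each direction keeps one slow probe from removing a node and one lucky probe from restoring it.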
A health check can lie
A shallow endpoint that always returns 200 can mark a broken node as healthy. A deep check that calls every dependency can create cascading failure. Good readiness checks validate the local process and only the dependencies truly required to serve traffic.
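One way to keep a readiness check honest without making it too deep is to run only the checks the node truly needs. The sketch below is an assumption-laden example: the `readiness` function, the `warmed_up` flag, and the check names are hypothetical. It returns 200 only when every required check passes and 503 otherwise, a common convention for "not ready".

```python
from typing import Callable, Dict, Tuple


def readiness(required: Dict[str, Callable[[], bool]],
              warmed_up: bool) -> Tuple[int, Dict[str, bool]]:
    """Return an HTTP-style status code plus per-check results."""
    results = {"warmed_up": warmed_up}
    for name, check in required.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failure, not as a crash.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results


# Hypothetical usage: only the primary database is required to serve traffic.
status, detail = readiness({"primary_db": lambda: True}, warmed_up=True)
print(status, detail)
```

Optional dependencies, such as a recommendations service the page can render without, are deliberately left out of `required` so that their outage does not pull every node out of rotation.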

The load balancer can be a single point of failure

The load balancer removes single-server risk from the app tier, but the balancer itself must not become the new fatal box. Production systems run redundant load balancers and give clients a way to reach a healthy one.

| Redundancy pattern | How it works | Trade-off |
| --- | --- | --- |
| Active-passive | One balancer serves traffic; a standby takes over on failure | Simple, but failover can take seconds |
| Active-active | Multiple balancers serve traffic at once | Better utilization, more coordination |
| DNS failover | DNS answers shift away from unhealthy endpoints | Easy globally, limited by DNS caching |
| Anycast | Same IP announced from many locations; routing finds a nearby healthy site | Excellent global failover, operationally advanced |

redundant front door
users
  -> DNS name api.example.com
      -> load-balancer-a or load-balancer-b
          -> app pool in zone 1
          -> app pool in zone 2

Managed cloud load balancers usually hide much of this machinery, but the design question remains: if one balancer, one zone, or one IP path fails, where does traffic go?

Real-world examples and gotchas

  • E-commerce: an L7 balancer can route /checkout to a smaller protected pool while static images go through a CDN.
  • Chat or gaming: long-lived TCP or WebSocket connections often use least-connections so one node does not collect all slow clients.
  • Canary releases: weighted routing can send 1% of traffic to a new version, then 10%, then 50%, while metrics are watched.
  • Hot users: hashing by user ID can overload one backend if a celebrity account or giant tenant produces far more traffic than normal users.
  • Retries: automatic retries can multiply traffic during an outage. Pair load balancing with timeouts, budgets, and backoff.
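The retry advice in the last bullet can be sketched as two small pieces: jittered exponential backoff, and a budget that caps retries as a fraction of normal requests. All names and default values here (`backoff_delays`, `RetryBudget`, a 10% ratio, a 5 second cap) are illustrative assumptions, not recommendations from any specific library.

```python
import random
from typing import List, Optional


def backoff_delays(max_retries: int, base: float = 0.1, cap: float = 5.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Exponential backoff with full jitter.

    Delay for attempt n is drawn uniformly from [0, min(cap, base * 2**n)].
    """
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** attempt))
            for attempt in range(max_retries)]


class RetryBudget:
    """Allow retries only while they stay under a fraction of requests seen."""

    def __init__(self, ratio: float = 0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_acquire(self) -> bool:
        """Spend one retry from the budget, or return False if it is exhausted."""
        if self.retries + 1 > self.requests * self.ratio:
            return False
        self.retries += 1
        return True


budget = RetryBudget(ratio=0.1)
for _ in range(20):
    budget.record_request()
print([budget.try_acquire() for _ in range(3)])  # [True, True, False]
```

During an outage the budget runs out quickly, so clients stop adding retry traffic on top of the original load while the backends recover.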
Key takeaways
  • A load balancer gives clients one stable front door while distributing traffic across healthy backends.
  • L4 balancing is fast and transport-level; L7 balancing understands HTTP and enables path, host, header, and cookie policies.
  • Algorithms include round-robin, least-connections, weighted routing, and hashing or sticky sessions; each optimizes for a different workload.
  • Health checks and connection draining turn node failures and deploys into routine events instead of visible outages.
  • The load balancer must also be redundant through active-passive, active-active, DNS failover, anycast, or a managed multi-zone service.
Choose L7 when the routing decision needs HTTP context: host, path, method, header, cookie, redirects, rate limits, canaries, or TLS termination. If the balancer only needs to forward TCP or UDP quickly, L4 may be simpler.
Sticky sessions make one server special for a user. If that server fails, the user can lose the local session or cached state, and load may become uneven. Shared session storage keeps app servers interchangeable.
With a load balancer in front, a deploy can mark a node not ready, stop sending it new requests, let existing connections drain, roll out the new version, wait for health checks to pass, and then return the node to rotation without exposing users to the intermediate state.