DrawLintDrawLint.ai
🧱Fundamentals·5 min read

The CAP Theorem (and PACELC)

Why a distributed system can't have it all when the network breaks — explained plainly.

The CAP theorem explains a hard choice in replicated distributed systems. When the network splits and replicas cannot talk to each other, the system must choose between Consistency and Availability. It cannot guarantee both for the same data at the same time during that partition.

🔭Think of it like…
Imagine two bank branches that normally share a live ledger. The phone line between them is cut. If both branches keep accepting withdrawals, they may overdraw the same account. If one branch refuses service until the line returns, balances stay correct but customers are turned away. That is CAP: during a communication break, serve everyone or preserve one correct truth.

The problem: replicas can lose contact

Replication is how systems scale reads, survive machine failures, and place data near users. But replicas communicate over networks, and networks sometimes drop, delay, duplicate, or reorder messages. A network partition is a failure where some nodes can still run but cannot communicate with other nodes.

partition creates two realities
Before partition:
  client writes x = 2 to Node A
  Node A replicates x = 2 to Node B
  reads from A or B return 2

During partition:
  Node A receives write x = 3
  A cannot send the update to B

        write x=3
client ─────────▶ Node A   ✂ network partition ✂   Node B ─────────▶ client
                  x = 3                              x = 2

Now a read on B cannot both be available and guaranteed latest.

The failure mode of ignoring CAP is promising impossible behavior: "all replicas are always up, every request always succeeds, and every read always sees the latest write." In a partitioned system, that sentence contradicts itself.

The theorem in practical words
If a partition prevents replicas from coordinating, a read or write on the isolated side must either wait/fail to preserve consistency, or succeed using local state and risk being stale or conflicting.

The three properties

CAP uses precise meanings. In system design interviews, define the words before applying them.

PropertyMeaning in CAPWhat users observe
Consistency (C)Every read sees the latest successful write, as if there is one copy of the dataNo stale reads; all replicas agree before answering
Availability (A)Every request to a non-failing node receives a non-error responseThe service keeps answering even if some nodes cannot coordinate
Partition tolerance (P)The system continues operating despite lost or delayed messages between nodesThe design acknowledges that network splits happen

Partition tolerance is not a feature toggle

In a real distributed system, you do not get to choose whether the network can fail. It can. That is why the common slogan "pick two of three" is misleading. The meaningful choice is what you do when P happens: favor C or favor A.

CP vs AP during a partition

During a partition, a replicated data system usually behaves like a CP system or an AP system for a given operation.

ModePartition behaviorGood fitExamples
CPPreserve consistency by rejecting, blocking, or redirecting requests that cannot be safely coordinatedLocks, metadata, bank ledgers, inventory reservations, leader electionZooKeeper, etcd, HBase, many strongly consistent SQL configurations
APPreserve availability by accepting local reads/writes and reconciling conflicts laterFeeds, likes, carts, presence, metrics, DNS-style dataCassandra, Amazon Dynamo-style systems, Riak, many DynamoDB single-region eventually consistent reads
CP behavior
partition occurs

write arrives at minority replica
replica cannot reach quorum/leader
replica returns error or timeout

Result:
  availability is reduced
  accepted writes remain consistent
AP behavior
partition occurs

write arrives at isolated replica
replica accepts write locally
later, partition heals
system reconciles versions using timestamps, vector clocks, CRDTs, or application logic

Result:
  availability is preserved
  readers may see stale or conflicting values temporarily

Real systems can mix choices. A shopping app might make payment authorization CP, product recommendations AP, and inventory reservation somewhere in between. CAP is not a label for the entire company; it is a way to reason about a data operation under partition.

The common misconception: do not say pick two always

The beginner version of CAP says you can pick any two of consistency, availability, and partition tolerance. That wording is memorable but wrong enough to cause bad designs.

  • You cannot simply "pick CA" for a distributed system and ignore partitions. If nodes communicate over a network, partitions are part of reality.
  • CAP does not say a CP system is always unavailable. It says availability may be sacrificed for affected operations during a partition.
  • CAP does not say an AP system is always inconsistent. It may be perfectly consistent during normal operation and only allow divergence when coordination is impossible or too expensive.
  • CAP is about a specific consistency guarantee. There are many weaker models such as causal, read-your-writes, monotonic reads, and eventual consistency. Those are covered in consistency models.
Beware marketing labels
Vendors may describe a database as CP, AP, globally consistent, highly available, or serverless. Always ask: for which operation, in which region setup, at what isolation level, and during what failure?

PACELC: Else, latency vs consistency

CAP focuses on partitions, but partitions are rare compared with normal operation. PACELC extends the reasoning: if there is a Partition, choose Availability or Consistency; Else, when the network is healthy, choose Latency or Consistency.

PACELC in one line
P A / E L  = if Partition, choose Availability; Else choose Latency
P C / E C  = if Partition, choose Consistency;  Else choose Consistency

The first half is the failure choice.
The second half is the everyday performance choice.
PACELC styleNormal operationPartition behaviorTypical shape
PA/ELFavor low latency reads/writesKeep serving if replicas splitDynamo-style, Cassandra-style workloads
PC/ECCoordinate for stronger consistencyBlock unsafe operationsZooKeeper/etcd-style coordination, strongly consistent databases
PC/ELLow latency when healthy, consistency during partitionMay use local fast paths but require quorum under failureSome tunable quorum systems by configuration

The PACELC lens is often more useful than CAP in day-to-day design. A globally replicated database can keep replicas consistent across continents, but every write may wait for cross-region coordination. That may be correct for a financial ledger and unacceptable for a social reaction counter.

Edge cases and gotchas

  • Partial partitions: not every node is split from every other node. Some links fail while others work, creating asymmetric behavior that is hard to test.
  • Timeouts are ambiguous: a timeout does not prove the other node failed. It may be slow, overloaded, or partitioned.
  • Conflict resolution is product logic: last-write-wins may be fine for a display name but dangerous for a bank balance.
  • Client retries can amplify conflicts: an AP write that times out may have succeeded locally. Retrying without idempotency can create duplicates.
  • CAP is not capacity planning: it describes a correctness trade-off under network failure, not CPU, memory, or disk throughput.
Key takeaways
  • CAP says that during a network partition, a replicated system must choose consistency or availability for affected operations.
  • Partition tolerance is not optional in real distributed systems; the practical choice is CP vs AP when coordination breaks.
  • CP systems such as ZooKeeper, etcd, and HBase reject or block unsafe operations to preserve correctness.
  • AP systems such as Cassandra and Dynamo-style stores keep serving and reconcile stale or conflicting data later.
  • PACELC adds the normal-case trade-off: even without partitions, stronger consistency often costs latency.
Because partitions are not optional once data is distributed across a network. The practical question is not whether to pick partition tolerance; it is whether to favor consistency or availability when a partition actually prevents coordination.
It refuses, blocks, or redirects operations that cannot be proven safe. The system sacrifices availability for those operations so it does not serve stale data or accept conflicting writes.
PACELC asks what the system does when there is no partition: does it favor low latency by avoiding coordination, or stronger consistency by waiting for replicas to agree?
Finished this lesson?

Mark it complete to track your progress through the workbook.