Database Replication
Copy data across machines for availability and read scale — and the lag it introduces.
Database replication means keeping copies of the same data on multiple machines. You replicate to survive machine failure, place data closer to readers, and spread read traffic across many servers. The hard part is not copying bytes once; it is keeping copies close enough to correct while the system is accepting new writes, losing networks, and promoting new leaders.
The problem replication solves: read scale and high availability
A single database primary is a fragile bottleneck. If it dies, the product goes dark. If all reads hit it, a dashboard or search page can steal CPU from checkout writes. Replicas create redundancy and move read work away from the write leader.
writes
app ─────────────────────▶ primary
│
│ replication stream
▼
┌────────┴────────┐
▼ ▼
replica A replica B
reads reads- Read scale: route expensive or high-volume reads to replicas so the primary can focus on writes and transactions.
- High availability: if the primary fails, promote a healthy replica and keep serving traffic.
- Geo-locality: place replicas near users or analytics jobs to reduce latency and isolate workloads.
Replication topologies: single-leader, multi-leader, and leaderless
The topology answers one question: where are writes allowed to go? That choice controls write latency, conflict handling, and failover complexity.
| Topology | Write path | Best for | Main danger |
|---|---|---|---|
| Single-leader | All writes go to one primary; replicas copy it | Most OLTP apps, Postgres/MySQL read replicas, simple correctness | Leader is the write bottleneck and failover target |
| Multi-leader | Multiple leaders accept writes and replicate to each other | Multi-region apps where local writes matter | Conflicting writes to the same row must be detected and resolved |
| Leaderless | Clients write/read from several replicas using quorums | Dynamo-style stores, high availability under partitions | Stale reads, read repair, and conflict resolution are part of the model |
Single-leader is the common default
PostgreSQL, MySQL, and many managed databases usually run with one writer and several read replicas. It keeps conflict handling simple because writes have one serialization point. The price is that write throughput and write availability depend on that leader.
Multi-leader moves writes closer to users
If Europe and India both need low-latency writes, each region can accept local writes and replicate to the other. But if the same account is edited in both places during a network split, the database or application must merge or reject one version.
Leaderless uses quorums
Dynamo-style systems write to several replicas and consider the write successful after enough acknowledgements. Reads also ask multiple replicas and reconcile versions. This improves availability, but clients must be designed for stale or conflicting versions.
Synchronous vs asynchronous replication
Replication can either be on the commit path or happen after commit. This is one of the most important latency-versus-safety trade-offs in database design.
| Mode | When commit returns | Benefit | Failure mode |
|---|---|---|---|
| Asynchronous | After the leader commits locally | Low write latency and high throughput | If the leader dies before shipping the change, acknowledged writes can be lost |
| Synchronous | After at least one replica confirms the write | A promoted replica should contain committed writes | Writes slow down or block when replicas are slow or unreachable |
| Semi-sync / quorum | After a configured subset confirms | Balances safety and latency | Operationally more complex; still must define exact guarantees |
async:
client -> primary: INSERT order
primary fsyncs locally
primary -> client: committed
primary -> replica: ships change later
sync:
client -> primary: INSERT order
primary fsyncs locally
primary -> replica: ship change
replica -> primary: acknowledged
primary -> client: committedReplication lag and read-your-writes anomalies
Replication lag is the delay between a write committing on the leader and appearing on a replica. Under normal load it might be a few milliseconds. During a backup, network issue, long transaction, or replica restart, it can become seconds or minutes.
1. PATCH /profile -> write goes to primary
2. HTTP 200 OK -> user expects new name to be visible
3. GET /profile -> load balancer sends read to a lagging replica
4. response contains old name- Read-your-writes violation: a user does not see their own update immediately after saving.
- Monotonic read violation: a user sees version 10, refreshes, then hits a different replica and sees version 8.
- Stale authorization: a permission revocation is written to the leader but a replica still says the user has access.
Failover and leader promotion
Failover is the process of detecting a dead leader, choosing the best replica, promoting it, and redirecting clients. It sounds simple, but it is where replication design meets real distributed systems pain.
1. Detect primary failure with timeouts and health checks.
2. Fence the old primary so it cannot accept split-brain writes.
3. Pick the most up-to-date replica by log position / LSN.
4. Promote that replica to leader.
5. Reconfigure other replicas to follow the new leader.
6. Update DNS, proxy, or connection strings.
7. Audit possible data loss if replication was asynchronous.- Split brain: two nodes both believe they are leader and accept divergent writes. Fencing and consensus systems prevent this.
- Promotion lag: the most available replica may not be the most up-to-date replica. Choose based on recovery objectives.
- Client retry storms: reconnecting every application at once can overload the newly promoted leader.
Replication mechanisms: statement, WAL/physical, and logical
Databases copy changes in different forms. The form matters because it determines compatibility, determinism, and what downstream systems can consume.
| Mechanism | What is copied | Pros | Cons |
|---|---|---|---|
| Statement-based | SQL statements such as UPDATE users SET visits = visits + 1 | Compact and easy to understand | Non-deterministic functions, triggers, and ordering can diverge |
| WAL / physical | Low-level page or write-ahead-log changes | Accurate byte-for-byte replicas; common for HA | Usually same database version and storage format required |
| Logical | Row-level changes such as table, primary key, before/after values | Useful for CDC, selective replication, and upgrades | More decoding overhead; schema changes must be handled carefully |
Physical replication is often the right choice for standby failover. Logical replication is often the right choice when other services need a stream of business changes, such as updating a search index or publishing events.
- Replication copies data for read scale, high availability, and locality, but every copy introduces consistency questions.
- Single-leader replication is simple and common; multi-leader and leaderless systems improve locality or availability at the cost of conflict handling.
- Asynchronous replication is fast but can lose recent acknowledged writes on failover; synchronous replication is safer but adds latency and can block writes.
- Replication lag causes read-your-writes, monotonic-read, and stale-authorization bugs unless reads are routed with consistency in mind.
- Failover requires leader detection, fencing, promotion, client redirection, and clear RTO/RPO expectations.
Mark it complete to track your progress through the workbook.