Outbox Pattern + CDC
Reliably publish events after a DB write — never lose a message between commit and queue.
How do you reliably publish an event after a database write when the database and the message broker cannot share one atomic commit? The Outbox pattern writes the event into the same database transaction as the business change, then a CDC relay such as Debezium publishes it to Kafka.
The problem: dual writes lose events
The naive flow looks harmless: update Postgres, then publish an event to Kafka. But that is a dual write: one logical operation is split across two independent systems. The database can commit while the broker publish fails, or the broker can accept the event while the database transaction rolls back. There is no general, practical way for your app server to make both systems commit atomically.
createOrder(request):
    tx = db.begin()
    db.orders.insert(order)
    tx.commit()                             # order is now durable
    kafka.publish("order.created", order)   # app may crash right here
    # Result: the order exists, but billing/search/email never hear about it.
- Lost event: the app dies after the DB commit but before the broker publish. Downstream services never learn about a real business fact.
- Phantom event: the app publishes first, then the DB transaction rolls back. Consumers react to something that never became true.
- Duplicate event: the publish succeeds, the app times out before seeing the acknowledgment, and retry publishes the same event again.
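The lost-event case can be reproduced in a few lines. The sketch below is illustrative only: SQLite stands in for Postgres, a plain Python list stands in for a Kafka topic, and `create_order_dual_write` and `CrashBeforePublish` are hypothetical names introduced here to simulate a process dying between the commit and the publish.

```python
import sqlite3


class CrashBeforePublish(Exception):
    """Stands in for the process dying between the DB commit and the publish."""


def create_order_dual_write(conn, broker, order_id, crash=False):
    # Step 1: commit the business change on its own.
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,)
        )
    # Step 2: publish in a separate step that is not covered by the commit.
    if crash:
        raise CrashBeforePublish(order_id)
    broker.append(("order.created", order_id))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
broker = []  # hypothetical in-memory stand-in for a Kafka topic

create_order_dual_write(conn, broker, "order_1")
try:
    create_order_dual_write(conn, broker, "order_2", crash=True)
except CrashBeforePublish:
    pass  # the "process" died; nothing retries the publish

orders_in_db = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
published = [order_id for _, order_id in broker]
```

After this runs, `orders_in_db` holds both orders while `published` holds only the first: the second order is durable, but no downstream service will ever hear about it.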
The fix: write an outbox row in the same transaction
The outbox moves the broker publish out of the user request path. When the application changes business state, it also inserts a row into an outbox_events table in the same database transaction. That gives you the one guarantee you really need: the event is durable if and only if the business change is durable.
BEGIN;
INSERT INTO orders (id, user_id, status, total_cents)
VALUES (:order_id, :user_id, 'PLACED', :total_cents);
INSERT INTO outbox_events (
id, aggregate_type, aggregate_id, event_type, payload, created_at
) VALUES (
:event_id,
'order',
:order_id,
'OrderPlaced',
json_build_object('orderId', :order_id, 'totalCents', :total_cents),
now()
);
COMMIT; -- both rows become visible, or neither row does
Notice what did not happen in that transaction: the app did not call Kafka. The request can return once the database commit succeeds. Publishing is delegated to a separate relay that scans or tails the outbox.
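From application code, the same transaction might look like the sketch below. It is a minimal illustration under stated assumptions: SQLite replaces Postgres so the example runs anywhere, the table layout mirrors the SQL above, and `place_order` with its `fail_after_order` switch is a hypothetical helper used only to show that a failure mid-transaction leaves neither row behind.

```python
import json
import sqlite3
import uuid


def place_order(conn, order_id, user_id, total_cents, fail_after_order=False):
    """Insert the order and its outbox event in one database transaction."""
    event_id = str(uuid.uuid4())  # stable event ID chosen by the writer
    payload = json.dumps({"orderId": order_id, "totalCents": total_cents})
    with conn:  # one transaction: commit on success, rollback on any exception
        conn.execute(
            "INSERT INTO orders (id, user_id, status, total_cents)"
            " VALUES (?, ?, 'PLACED', ?)",
            (order_id, user_id, total_cents),
        )
        if fail_after_order:
            raise RuntimeError("simulated failure before the outbox insert")
        conn.execute(
            "INSERT INTO outbox_events"
            " (id, aggregate_type, aggregate_id, event_type, payload)"
            " VALUES (?, 'order', ?, 'OrderPlaced', ?)",
            (event_id, order_id, payload),
        )
    return event_id  # note: no broker call anywhere in the request path


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders"
    " (id TEXT PRIMARY KEY, user_id TEXT, status TEXT, total_cents INTEGER)"
)
conn.execute(
    "CREATE TABLE outbox_events (id TEXT PRIMARY KEY, aggregate_type TEXT,"
    " aggregate_id TEXT, event_type TEXT, payload TEXT,"
    " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)

place_order(conn, "order_1", "user_9", 4200)
try:
    place_order(conn, "order_2", "user_9", 999, fail_after_order=True)
except RuntimeError:
    pass  # the whole transaction rolled back, including the orders insert

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
outbox_count = conn.execute("SELECT COUNT(*) FROM outbox_events").fetchone()[0]
```

Both counts end at 1: the failed attempt left no order and no event, so the two tables cannot disagree.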
| Approach | Crash after DB commit | Crash after publish | Operational feel |
|---|---|---|---|
| DB write then publish | Event can be lost | Usually safe, unless retry duplicates | Simple but unsafe |
| Publish then DB write | Safe: both steps already happened | Event is a phantom if the DB write never commits | Unsafe for business facts |
| Outbox in same transaction | Event row is still durable | Relay may retry the event | Reliable with idempotent consumers |
Relay and CDC: turning rows into broker messages
A relay publishes outbox rows to a broker. The simplest relay polls outbox_events for unpublished rows, publishes them, and marks them sent. At scale, teams often prefer Change Data Capture: a tool such as Debezium tails the database write-ahead log (WAL), sees committed inserts into the outbox table, and streams them to Kafka.
Application transaction
  -> INSERT orders row
  -> INSERT outbox_events row
  -> COMMIT
Postgres WAL
  -> contains the committed outbox insert in commit order
Debezium connector
  -> tails WAL position LSN 0/7F3A90
  -> transforms outbox row into message
  -> publishes to Kafka topic order.events
       key = aggregate_id   # keeps one order on one partition
       value = payload
  -> stores connector offset after Kafka acknowledges
Polling relay versus CDC relay
| Relay style | How it finds work | Strengths | Gotchas |
|---|---|---|---|
| Polling publisher | SELECT unsent rows with locks | Easy to build and debug | Adds DB polling load; careful locking needed |
| CDC with Debezium | Tails the WAL after commit | Low-latency, preserves commit order, avoids polling | Requires connector ops and schema discipline |
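A polling publisher can be sketched in a few lines. The version below is an assumption-laden illustration, not a production relay: SQLite stands in for Postgres, the `seq` and `sent_at` columns are hypothetical additions for ordering and bookkeeping, and `publish` is any callable standing in for a Kafka producer. With several relay workers on Postgres, the SELECT would typically also use `FOR UPDATE SKIP LOCKED` so workers do not pick up the same rows.

```python
import sqlite3


def relay_once(conn, publish, batch_size=100):
    """Publish unsent outbox rows in insertion order, then mark each one sent."""
    rows = conn.execute(
        "SELECT seq, id, aggregate_id, payload FROM outbox_events"
        " WHERE sent_at IS NULL ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for seq, event_id, aggregate_id, payload in rows:
        publish(key=aggregate_id, value=payload, event_id=event_id)  # may raise
        # Mark the row only after the publish call returns. If the relay fails
        # between these two steps, the row is published again on the next pass.
        with conn:
            conn.execute(
                "UPDATE outbox_events SET sent_at = CURRENT_TIMESTAMP WHERE seq = ?",
                (seq,),
            )
        sent += 1
    return sent


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox_events (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " id TEXT UNIQUE, aggregate_id TEXT, payload TEXT, sent_at TEXT)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox_events (id, aggregate_id, payload) VALUES (?, ?, ?)",
        [("evt_1", "order_1", "{}"), ("evt_2", "order_2", "{}"), ("evt_3", "order_1", "{}")],
    )

published = []
fail_once = {"evt_2"}


def flaky_publish(key, value, event_id):
    published.append(event_id)          # the broker accepted the message...
    if event_id in fail_once:
        fail_once.discard(event_id)
        raise TimeoutError("ack lost")  # ...but the relay never saw the ack


try:
    relay_once(conn, flaky_publish)     # stops at evt_2 with a timeout
except TimeoutError:
    pass
relay_once(conn, flaky_publish)         # retries evt_2, then continues to evt_3

unsent = conn.execute(
    "SELECT COUNT(*) FROM outbox_events WHERE sent_at IS NULL"
).fetchone()[0]
```

Every row ends up marked as sent, and `evt_2` appears twice in `published`. No event is lost, but the retry produces a duplicate that consumers have to tolerate.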
Delivery semantics: at-least-once, idempotency, and ordering
Outbox plus CDC gives durable publication, not magical exactly-once effects everywhere. If the relay publishes to Kafka and crashes before saving its offset, it can publish the same outbox row again after restart. That is normal at-least-once delivery: every event eventually arrives, but some events can arrive more than once.
- Make event IDs stable: generate outbox_events.id inside the original transaction. Consumers record processed IDs or use natural idempotency keys before applying side effects.
- Key by aggregate: publish Kafka messages with aggregate_id as the key so all events for one order, account, or conversation land on one partition and remain ordered.
- Keep transaction order: CDC reads committed WAL entries, so it can preserve database commit order within a table or partitioned stream. Cross-aggregate global order is rarely useful and often too expensive.
- Design consumers for replay: Kafka retention means a new consumer may reread old events. Replays should rebuild state, not resend duplicate emails or charge cards again.
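One common way to make a consumer tolerate duplicates is to record each event ID in the same transaction as the effect it causes. The sketch below assumes a hypothetical `processed_events` table and a toy `revenue` read model, again with SQLite standing in for the consumer's own database; `handle_event` is an illustrative name, not part of any library.

```python
import sqlite3


def handle_event(conn, event_id, total_cents):
    """Apply the event's effect once; a redelivered event_id is skipped."""
    try:
        with conn:  # dedupe record and effect commit together, or not at all
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            conn.execute(
                "UPDATE revenue SET total_cents = total_cents + ? WHERE id = 1",
                (total_cents,),
            )
    except sqlite3.IntegrityError:
        return False  # primary key clash: this event was already applied
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE revenue (id INTEGER PRIMARY KEY, total_cents INTEGER)")
with conn:
    conn.execute("INSERT INTO revenue (id, total_cents) VALUES (1, 0)")

# evt_1 is delivered twice, as at-least-once delivery allows.
deliveries = [("evt_1", 4200), ("evt_2", 1000), ("evt_1", 4200)]
results = [handle_event(conn, event_id, cents) for event_id, cents in deliveries]
total = conn.execute("SELECT total_cents FROM revenue WHERE id = 1").fetchone()[0]
```

The redelivered `evt_1` is rejected by the primary key, so the total reflects each event once. This only covers effects inside the consumer's database; external side effects such as email or card charges need their own idempotency key at the external system.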
Real-world examples and design choices
Outbox is most valuable when other systems must react to committed business facts: an order was placed, a payment settled, a user changed email, or a file finished scanning. These events often feed read models, search indexes, notifications, analytics, and stream processors.
- E-commerce: insert OrderPlaced with the order row; inventory reservation and email services consume the event.
- Payments: insert PaymentCaptured only after the ledger transaction commits; downstream reporting can trust that the event describes durable state.
- Collaboration apps: insert document-change events with the document version; consumers update search and activity feeds in version order.
Schema fields that pay for themselves
| Column | Why it matters | Example |
|---|---|---|
| id | Consumer dedupe and traceability | evt_01J... |
| aggregate_id | Kafka key and per-entity ordering | order_123 |
| event_type | Consumer routing and schema selection | OrderPlaced |
| payload | The event body | {orderId,totalCents} |
| created_at | Lag monitoring and replay windows | 2026-06-10T07:00Z |
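To show how these columns are used downstream, here is a rough sketch of turning an outbox row into a broker message. `OutboxEvent`, `to_message`, and `partition_for` are hypothetical names, and the header layout is an assumption. The partitioner is deliberately simplified: it uses SHA-256 to illustrate that equal keys map to equal partitions, whereas Kafka's Java client hashes keys with murmur2.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboxEvent:
    id: str            # consumer dedupe and traceability
    aggregate_id: str  # becomes the message key
    event_type: str    # consumer routing and schema selection
    payload: dict      # the event body
    created_at: str    # lag monitoring and replay windows


def to_message(event):
    """Map an outbox row to a key/value/headers message (illustrative layout)."""
    return {
        "key": event.aggregate_id.encode(),
        "value": json.dumps(event.payload, sort_keys=True).encode(),
        "headers": {
            "event_id": event.id,
            "event_type": event.event_type,
            "created_at": event.created_at,
        },
    }


def partition_for(key, num_partitions):
    """Simplified partitioner: the same key always yields the same partition."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


placed = OutboxEvent(
    "evt_1", "order_123", "OrderPlaced",
    {"orderId": "order_123", "totalCents": 4200}, "2026-06-10T07:00Z",
)
paid = OutboxEvent(
    "evt_2", "order_123", "OrderPaid",
    {"orderId": "order_123"}, "2026-06-10T07:05Z",
)

placed_msg = to_message(placed)
paid_msg = to_message(paid)
same_partition = partition_for(placed_msg["key"], 12) == partition_for(paid_msg["key"], 12)
```

Because both events carry the same `aggregate_id`, they hash to the same partition, which is what keeps events for one order in sequence for a consumer.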
Edge cases and gotchas
- Outbox table growth: CDC-based designs may keep rows for audit, while polling designs often mark or delete sent rows. Either way, partition or archive so the table does not become a hidden hot spot.
- Schema evolution: consumers outlive producers. Version payloads or use a schema registry so a new field does not break old consumers.
- Poison events: a malformed event can block one consumer. Use dead-letter topics and alarms rather than silently skipping it.
- Side effects in consumers: writing a read model is easy to make idempotent; sending email or charging money needs stronger dedupe around the external side effect.
Key takeaways
- The outbox pattern fixes the dual-write problem by making the event row part of the same database transaction as the business change.
- A relay or CDC tool such as Debezium tails committed outbox rows and publishes them to a broker after the user transaction commits.
- Delivery is at-least-once, so every consumer must dedupe by stable event ID or apply naturally idempotent updates.
- Ordering is usually per aggregate: key Kafka messages by aggregate_id so one entity maps to one partition.
- Operational details matter: table cleanup, schema evolution, poison events, relay lag, and replay-safe consumers are part of the design.