DrawLint.ai

Outbox Pattern + CDC

Reliably publish events after a DB write — never lose a message between commit and queue.

How do you reliably publish an event after a database write when the database and the message broker cannot share one atomic commit? The Outbox pattern writes the event into the same database transaction as the business change, then a CDC relay such as Debezium publishes it to Kafka.

🔭Think of it like…
Imagine a restaurant kitchen that must both cook an order and tell the delivery counter about it. If the chef shouts across the room after plating, a fire alarm at the wrong second can leave a finished meal with no delivery ticket. The outbox is a carbon-copy ticket written while the meal is created; even if everyone evacuates, the delivery counter can later read the ticket stack and continue.

The problem: dual writes lose events

The naive flow looks harmless: update Postgres, then publish an event to Kafka. But that is a dual write: one logical operation is split across two independent systems. The database can commit while the broker publish fails, or the broker can accept the event while the database transaction rolls back. There is no general, practical way for your app server to make both systems commit atomically.

the dangerous gap between two systems
createOrder(request):
  tx = db.begin()
  db.orders.insert(order)
  tx.commit()                 # order is now durable

  kafka.publish("order.created", order)  # app may crash right here

# Result: the order exists, but billing/search/email never hear about it.
  • Lost event: the app dies after the DB commit but before the broker publish. Downstream services never learn about a real business fact.
  • Phantom event: the app publishes first, then the DB transaction rolls back. Consumers react to something that never became true.
  • Duplicate event: the publish succeeds, the app times out before seeing the acknowledgment, and a retry publishes the same event again.
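A minimal simulation makes the lost-event case concrete. It assumes an in-memory SQLite database standing in for Postgres and a plain Python list standing in for the Kafka topic. `create_order_dual_write` and `SimulatedCrash` are hypothetical names, and the crash is raised on purpose between the two writes.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Stands in for the process dying between two steps."""


def create_order_dual_write(conn, broker, order_id, crash_after_commit=False):
    # Write 1: the order commits on its own (the context manager commits on success).
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,))
    # Write 2: a separate publish that is not part of the database transaction.
    if crash_after_commit:
        raise SimulatedCrash("died after COMMIT, before publish")
    broker.append(("order.created", order_id))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
broker = []  # stand-in for the Kafka topic

create_order_dual_write(conn, broker, "order_1")
try:
    create_order_dual_write(conn, broker, "order_2", crash_after_commit=True)
except SimulatedCrash:
    pass  # the process "restarts"; nothing remembers the missing event

orders = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
published = [order_id for _, order_id in broker]
print(orders)     # ['order_1', 'order_2']
print(published)  # ['order_1']
```

The second order is durable but was never announced, and nothing in the system records that an event is owed.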
Do not hide the gap with retries
Retrying the publish makes lost events less likely, but it does not make the two writes atomic. Retries also create duplicates, so downstream consumers still need idempotency.

The fix: write an outbox row in the same transaction

The outbox moves the broker publish out of the user request path. When the application changes business state, it also inserts a row into an outbox_events table in the same database transaction. That gives you the one guarantee you really need: the event is durable if and only if the business change is durable.

business row plus outbox row commit together
BEGIN;

INSERT INTO orders (id, user_id, status, total_cents)
VALUES (:order_id, :user_id, 'PLACED', :total_cents);

INSERT INTO outbox_events (
  id, aggregate_type, aggregate_id, event_type, payload, created_at
) VALUES (
  :event_id,
  'order',
  :order_id,
  'OrderPlaced',
  json_build_object('orderId', :order_id, 'totalCents', :total_cents),
  now()
);

COMMIT;  -- both rows become visible, or neither row does

Notice what did not happen in that transaction: the app did not call Kafka. The request can return once the database commit succeeds. Publishing is delegated to a separate relay that scans or tails the outbox.
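The same transaction can be sketched in Python to show the rollback side of the guarantee. This assumes SQLite in place of Postgres, `json.dumps` in place of `json_build_object`, and a `uuid4`-based event ID; `place_order` and its `fail_before_commit` flag are hypothetical. Forcing a failure before the commit leaves neither row behind.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE orders (id TEXT PRIMARY KEY, user_id TEXT, status TEXT, total_cents INTEGER);
CREATE TABLE outbox_events (
    id TEXT PRIMARY KEY, aggregate_type TEXT, aggregate_id TEXT,
    event_type TEXT, payload TEXT, created_at TEXT
);
"""


def place_order(conn, order_id, user_id, total_cents, fail_before_commit=False):
    event_id = f"evt_{uuid.uuid4().hex}"  # stable ID stored with the event row
    payload = json.dumps({"orderId": order_id, "totalCents": total_cents})
    with conn:  # one transaction: commit both inserts, or roll both back
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, 'PLACED', ?)",
            (order_id, user_id, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox_events VALUES (?, 'order', ?, 'OrderPlaced', ?, ?)",
            (event_id, order_id, payload, datetime.now(timezone.utc).isoformat()),
        )
        if fail_before_commit:
            raise RuntimeError("simulated failure before COMMIT")
    return event_id


def count_rows(conn, table):
    # Table names here are fixed constants, so f-string interpolation is safe.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

place_order(conn, "order_1", "user_9", 4200)
try:
    place_order(conn, "order_2", "user_9", 1500, fail_before_commit=True)
except RuntimeError:
    pass

print(count_rows(conn, "orders"), count_rows(conn, "outbox_events"))  # 1 1
```

The request handler's only job ends at the commit; whatever publishes later can trust that every outbox row describes a committed order.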

| Approach | Crash after DB commit | Crash after publish | Operational feel |
| --- | --- | --- | --- |
| DB write then publish | Event can be lost | Usually safe, unless retry duplicates | Simple but unsafe |
| Publish then DB write | Consumer can see a false event | Event can be phantom | Unsafe for business facts |
| Outbox in same transaction | Event row is still durable | Relay may retry the event | Reliable with idempotent consumers |

Relay and CDC: turning rows into broker messages

A relay publishes outbox rows to a broker. The simplest relay polls outbox_events for unpublished rows, publishes them, and marks them sent. At scale, teams often prefer Change Data Capture (CDC): a tool such as Debezium tails the database write-ahead log (WAL), sees committed inserts into the outbox table, and streams them to Kafka.

CDC relay from WAL to Kafka
Application transaction
  -> INSERT orders row
  -> INSERT outbox_events row
  -> COMMIT

Postgres WAL
  -> contains the committed outbox insert in commit order

Debezium connector
  -> tails WAL position LSN 0/7F3A90
  -> transforms outbox row into message
  -> publishes to Kafka topic order.events
     key   = aggregate_id      # keeps one order on one partition
     value = payload
  -> stores connector offset after Kafka acknowledges
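The connector's loop can be imitated with a toy model to show why the offset moves only after an acknowledgment. This is not Debezium's code: the `wal` list stands in for committed WAL entries (plain integers stand in for LSNs), `Broker` stands in for Kafka, and hashing keys with `zlib.crc32` is an assumption. Kafka's default partitioner uses murmur2 instead, but the property that matters, same key to same partition, holds either way.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class WalEntry:
    lsn: int    # stand-in for a Postgres LSN such as 0/7F3A90
    table: str
    row: dict


@dataclass
class Broker:
    num_partitions: int = 3
    partitions: dict = field(default_factory=dict)

    def publish(self, topic, key, value):
        # Same key -> same partition; each partition keeps append order.
        partition = zlib.crc32(key.encode()) % self.num_partitions
        self.partitions.setdefault((topic, partition), []).append((key, value))
        return True  # acknowledgment


def run_connector(wal, broker, offset):
    """Publish committed outbox inserts after `offset`; return the new saved offset."""
    for entry in wal:
        if entry.lsn <= offset:
            continue  # already handled before the last saved offset
        if entry.table == "outbox_events":
            acked = broker.publish(
                "order.events", entry.row["aggregate_id"], entry.row["payload"]
            )
            if not acked:
                break  # keep the old offset so this entry is retried
        offset = entry.lsn  # advance past the entry (after the ack, if it was published)
    return offset


wal = [
    WalEntry(10, "orders", {"id": "order_1"}),
    WalEntry(11, "outbox_events", {"aggregate_id": "order_1", "payload": "OrderPlaced"}),
    WalEntry(20, "outbox_events", {"aggregate_id": "order_1", "payload": "OrderPaid"}),
    WalEntry(30, "outbox_events", {"aggregate_id": "order_2", "payload": "OrderPlaced"}),
]
broker = Broker()
offset = run_connector(wal, broker, offset=0)
print(offset)  # 30
```

Running the connector again from the saved offset publishes nothing new, and both `order_1` events (the second one, `OrderPaid`, is a made-up event type) share one partition in commit order.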

Polling relay versus CDC relay

| Relay style | How it finds work | Strengths | Gotchas |
| --- | --- | --- | --- |
| Polling publisher | SELECT unsent rows with locks | Easy to build and debug | Adds DB polling load; careful locking needed |
| CDC with Debezium | Tails the WAL after commit | Low latency, preserves commit order, avoids polling | Requires connector operations and schema discipline |
The relay is allowed to be boring
The relay should not contain business logic. It translates a durable row into a broker message and advances its cursor only after the broker has acknowledged the publish.
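A single polling relay that follows this rule might look like the sketch below. It assumes SQLite and adds a hypothetical `published_at` column that the lesson's schema does not include; `poll_once` and `flaky_publish` are illustrative names, and SQLite's `rowid` stands in for insertion order. With Postgres and several relay instances, rows would typically be claimed with `SELECT ... FOR UPDATE SKIP LOCKED`, which SQLite does not support, so this sketch assumes one relay.

```python
import sqlite3
from datetime import datetime, timezone


def poll_once(conn, publish, batch_size=100):
    """Publish unsent rows in insertion order; mark each sent only after publish returns."""
    rows = conn.execute(
        "SELECT rowid, id, aggregate_id, payload FROM outbox_events "
        "WHERE published_at IS NULL ORDER BY rowid LIMIT ?",
        (batch_size,),
    ).fetchall()
    for rowid, event_id, aggregate_id, payload in rows:
        # If publish raises, the row stays unsent and the next poll retries it.
        publish(key=aggregate_id, value=payload, event_id=event_id)
        with conn:  # advance the "cursor" only after the broker acknowledged
            conn.execute(
                "UPDATE outbox_events SET published_at = ? WHERE rowid = ?",
                (datetime.now(timezone.utc).isoformat(), rowid),
            )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox_events "
    "(id TEXT PRIMARY KEY, aggregate_id TEXT, payload TEXT, published_at TEXT)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox_events (id, aggregate_id, payload) VALUES (?, ?, ?)",
        [("evt_1", "order_1", "A"), ("evt_2", "order_1", "B"), ("evt_3", "order_2", "C")],
    )

topic = []            # stand-in for the Kafka topic
failures = {"evt_2"}  # make evt_2 fail exactly once


def flaky_publish(key, value, event_id):
    if event_id in failures:
        failures.discard(event_id)
        raise ConnectionError("broker unavailable")
    topic.append(event_id)


try:
    poll_once(conn, flaky_publish)
except ConnectionError:
    pass  # evt_2 and evt_3 remain unsent
poll_once(conn, flaky_publish)
print(topic)  # ['evt_1', 'evt_2', 'evt_3']
```

Stopping at the first failed publish, instead of skipping past it, keeps later events for the same aggregate behind the one that failed. A crash between `publish` and the `UPDATE` would still resend that row, so consumers must tolerate duplicates.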

Delivery semantics: at-least-once, idempotency, and ordering

Outbox plus CDC gives durable publication, not magical exactly-once effects everywhere. If the relay publishes to Kafka and crashes before saving its offset, it can publish the same outbox row again after restart. That is normal at-least-once delivery: every event eventually arrives, but some events can arrive more than once.
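A toy relay makes that duplicate window visible. The list-based outbox, the `offset_store` dict, and the `crash_at` parameter are hypothetical stand-ins for the outbox table, the connector's saved offset, and a process crash.

```python
class RelayCrash(Exception):
    """Stands in for the relay process dying."""


def relay(outbox, topic, offset_store, crash_at=None):
    """Publish outbox entries from the saved offset onward."""
    while offset_store["offset"] < len(outbox):
        position = offset_store["offset"]
        topic.append(outbox[position])  # publish; assume the broker acknowledged
        if position == crash_at:
            raise RelayCrash("died after publish, before saving the offset")
        offset_store["offset"] = position + 1  # durable offset write


outbox = ["evt_1", "evt_2", "evt_3"]
topic = []
offset_store = {"offset": 0}

try:
    relay(outbox, topic, offset_store, crash_at=1)
except RelayCrash:
    pass
relay(outbox, topic, offset_store)  # restart from the last saved offset
print(topic)  # ['evt_1', 'evt_2', 'evt_2', 'evt_3']
```

`evt_2` reaches the topic twice, yet no event is skipped; the duplicate is the price of never advancing the offset before the publish is acknowledged.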

  • Make event IDs stable: generate outbox_events.id inside the original transaction. Consumers record processed IDs or use natural idempotency keys before applying side effects.
  • Key by aggregate: publish Kafka messages with aggregate_id as the key so all events for one order, account, or conversation land on one partition and remain ordered.
  • Keep transaction order: CDC reads committed WAL entries, so it can preserve database commit order within a table or partitioned stream. Cross-aggregate global order is rarely useful and often too expensive.
  • Design consumers for replay: Kafka retention means a new consumer may reread old events. Replays should rebuild state, not resend duplicate emails or charge cards again.
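One way to apply the first and last bullets is to record each processed event ID in the same local transaction as the read-model write, so a redelivered event becomes a no-op. This sketch assumes SQLite as the consumer's own database; `handle`, `processed_events`, and `order_totals` are hypothetical names.

```python
import json
import sqlite3


def make_consumer_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
        CREATE TABLE order_totals (order_id TEXT PRIMARY KEY, total_cents INTEGER);
        """
    )
    return conn


def handle(conn, event_id, payload):
    """Apply an OrderPlaced event at most once; return True if state changed."""
    data = json.loads(payload)
    with conn:  # dedupe marker and read-model write commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # seen before: a redelivery, so do nothing
        conn.execute(
            "INSERT INTO order_totals (order_id, total_cents) VALUES (?, ?)",
            (data["orderId"], data["totalCents"]),
        )
    return True


conn = make_consumer_db()
payload = json.dumps({"orderId": "order_1", "totalCents": 4200})
first = handle(conn, "evt_1", payload)
replay = handle(conn, "evt_1", payload)  # same event redelivered after a relay restart
print(first, replay)  # True False
```

Because the dedupe row and the read-model row commit together, a crash cannot mark an event as processed without also applying it. External side effects such as email still need their own idempotency key, since they cannot take part in this database transaction.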
The real guarantee
The outbox guarantees that a committed business fact has a durable event waiting to be published. It does not remove duplicates; it makes duplicates manageable and ensures that no committed change loses its event.

Real-world examples and design choices

Outbox is most valuable when other systems must react to committed business facts: an order was placed, a payment settled, a user changed email, or a file finished scanning. These events often feed read models, search indexes, notifications, analytics, and stream processors.

  • E-commerce: insert OrderPlaced with the order row; inventory reservation and email services consume the event.
  • Payments: insert PaymentCaptured only after the ledger transaction commits; downstream reporting can trust that the event describes durable state.
  • Collaboration apps: insert document-change events with the document version; consumers update search and activity feeds in version order.

Schema fields that pay for themselves

| Column | Why it matters | Example |
| --- | --- | --- |
| id | Consumer dedupe and traceability | evt_01J... |
| aggregate_id | Kafka key and per-entity ordering | order_123 |
| event_type | Consumer routing and schema selection | OrderPlaced |
| payload | The event body | {orderId,totalCents} |
| created_at | Lag monitoring and replay windows | 2026-06-10T07:00Z |
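To show where each column ends up, here is a sketch of mapping an outbox row to a Kafka-style record with a key, a value, and string headers. `OutboxEvent`, `to_message`, and `lag_seconds` are hypothetical names, the example values are made up, and carrying `id` and `event_type` as headers is one reasonable choice rather than a requirement.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OutboxEvent:
    id: str               # consumer dedupe and traceability
    aggregate_id: str     # message key -> per-entity ordering
    event_type: str       # consumer routing and schema selection
    payload: dict         # the event body
    created_at: datetime  # lag monitoring and replay windows

    def to_message(self):
        """Return (key, value, headers) for a Kafka-style producer."""
        headers = {"event_id": self.id, "event_type": self.event_type}
        return self.aggregate_id, json.dumps(self.payload).encode(), headers

    def lag_seconds(self, now):
        """Seconds between the outbox insert and `now` (e.g. the publish time)."""
        return (now - self.created_at).total_seconds()


event = OutboxEvent(
    id="evt_123",
    aggregate_id="order_123",
    event_type="OrderPlaced",
    payload={"orderId": "order_123", "totalCents": 4200},
    created_at=datetime(2026, 6, 10, 7, 0, tzinfo=timezone.utc),
)
key, value, headers = event.to_message()
print(key, headers["event_type"])  # order_123 OrderPlaced
print(event.lag_seconds(datetime(2026, 6, 10, 7, 0, 5, tzinfo=timezone.utc)))  # 5.0
```

Alerting on `lag_seconds` for the oldest unpublished row is a simple way to notice a stalled relay.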

Edge cases and gotchas

  • Outbox table growth: CDC-based designs may keep rows for audit, while polling designs often mark or delete sent rows. Either way, partition or archive so the table does not become a hidden hot spot.
  • Schema evolution: consumers outlive producers. Version payloads or use a schema registry so a new field does not break old consumers.
  • Poison events: a malformed event can block one consumer. Use dead-letter topics and alarms rather than silently skipping it.
  • Side effects in consumers: writing a read model is easy to make idempotent; sending email or charging money needs stronger dedupe around the external side effect.
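The schema-evolution and poison-event bullets can be combined in one consumer sketch. It reads a hypothetical `schemaVersion` field, ignores unknown extra fields, and sends anything it cannot parse to a dead-letter list instead of blocking the partition or dropping the event silently. The field name, the version-2 layout, and the list standing in for a dead-letter topic are all assumptions.

```python
import json


def parse_order_placed(raw):
    """Return a normalized OrderPlaced event, or raise if the payload is unusable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    version = data.get("schemaVersion", 1)  # hypothetical field; absent means v1
    if version == 1:
        return {"order_id": data["orderId"], "total_cents": data["totalCents"]}
    if version == 2:  # hypothetical v2 that nested the amount
        return {"order_id": data["orderId"], "total_cents": data["amount"]["cents"]}
    raise ValueError(f"unsupported schemaVersion {version}")


def consume(messages, apply, dead_letters):
    for event_id, raw in messages:
        try:
            event = parse_order_placed(raw)
        except (ValueError, KeyError, TypeError, AttributeError) as exc:
            # Park the poison event and keep going; alert on dead-letter growth.
            dead_letters.append((event_id, raw, repr(exc)))
            continue
        apply(event)  # unknown extra fields were simply ignored during parsing


applied, dead_letters = [], []
consume(
    [
        ("evt_1", '{"orderId": "o1", "totalCents": 100}'),
        ("evt_2", '{"schemaVersion": 2, "orderId": "o2", "amount": {"cents": 250}, "coupon": "X"}'),
        ("evt_3", "not json"),
        ("evt_4", '{"schemaVersion": 9, "orderId": "o4"}'),
    ],
    applied.append,
    dead_letters,
)
print([e["order_id"] for e in applied])  # ['o1', 'o2']
print([d[0] for d in dead_letters])      # ['evt_3', 'evt_4']
```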
Pair it with the right downstream primitive
Outbox commonly feeds Kafka, and Kafka consumers should use idempotency keys for safe retries and replay.
Key takeaways
  • The outbox pattern fixes the dual-write problem by making the event row part of the same database transaction as the business change.
  • A relay, either a polling publisher or a CDC tool such as Debezium, reads committed outbox rows and publishes them to a broker after the user transaction commits.
  • Delivery is at-least-once, so every consumer must dedupe by stable event ID or apply naturally idempotent updates.
  • Ordering is usually per aggregate: key Kafka messages by aggregate_id so one entity maps to one partition.
  • Operational details matter: table cleanup, schema evolution, poison events, relay lag, and replay-safe consumers are part of the design.
Why does the outbox pattern stop events from being lost?
Because the database commits the business row and the outbox row together. If the transaction rolls back, neither exists. If it commits, the event is durably stored even if the app crashes before publishing, so a relay can publish it later.
Why can the same event be published twice?
The relay can publish a message and crash before recording its new offset. On restart it may publish the same outbox row again. Reliable publication therefore means at-least-once delivery, and consumers must treat repeated event IDs as safe no-ops.
How do you keep events for one entity in order?
Use the aggregate ID as the Kafka key. Kafka sends the same key to the same partition, and Kafka preserves order within a partition, so one order or chat is consumed in sequence while other aggregates run in parallel.