DrawLintDrawLint.ai
🧩Core Building Blocks·6 min read

Message Queues & Streams

Decouple producers from consumers, absorb spikes, and process work asynchronously.

Message queues and streams let one part of a system hand work to another asynchronously. Instead of making a user wait while every downstream service finishes, the producer records a message and returns. Consumers process messages at their own pace, with retries, ordering rules, and backpressure controls.

🔭Think of it like…
A restaurant does not make waiters stand beside the stove. A waiter writes an order ticket and clips it to a rail. Cooks pull tickets when they have capacity, and a manager can see when the rail is backing up. A queue is that rail: it decouples fast order taking from slower cooking.

The problem: slow work and fragile coupling

Synchronous calls are simple until the downstream service is slow, down, or overwhelmed. If checkout directly calls email, analytics, warehouse, billing webhooks, and recommendation updates before returning, one weak dependency can ruin the user-facing request.

async handoff
HTTP request
  -> write order in database
  -> publish OrderCreated message
  -> return 202/200 to user

workers later:
  -> send email
  -> reserve warehouse stock
  -> update search index
  -> notify analytics
  • Decoupling: producers and consumers do not need to be online at the same time or scale at the same rate.
  • Spike absorption: bursts become queue depth instead of immediate downstream overload.
  • Background work: slow tasks leave the request path.
  • Failure containment: retries and dead-letter queues isolate poison messages from healthy traffic.
Reliability boundary
Publishing a message is often part of a larger workflow. The outbox and CDC patternsolves the classic bug where the database commit succeeds but the message publish fails.

Traditional queues vs streaming logs

People often say "queue" for every async system, but a work queue and a log have different semantics. A queue distributes tasks; a log records an ordered history that many consumers can replay independently.

ModelMessage lifecycleConsumer behaviorExamplesBest for
QueueMessage is removed or hidden after a consumer succeedsMany workers compete; one worker handles each taskSQS, RabbitMQ work queues, Redis listsBackground jobs, email sending, image processing
Streaming logMessage is appended and retained for time or sizeEach consumer group tracks its own offsetKafka, Pulsar, Kinesis, Redis StreamsEvent history, analytics pipelines, CDC, replayable integrations

Queue mental model

Use a queue when the message represents work to do once: send this email, charge this webhook, resize this image. Adding workers increases throughput because workers compete for tasks.

Log mental model

Use a log when the message is an event fact: order created, payment captured, user upgraded. Multiple teams can consume the same history at different speeds, and a new consumer can replay from the beginning.

Delivery semantics: at-most-once, at-least-once, exactly-once

Delivery semantics describe what can happen when producers, brokers, or consumers crash. They are not marketing words; they determine whether your consumer code must tolerate lost messages or duplicates.

SemanticMeaningTypical implementationRisk
At-most-onceMessage is delivered zero or one timeMark done before processingFailures can lose work
At-least-onceMessage is delivered one or more timesProcess, then ackDuplicates are normal
Exactly-onceEffect appears once despite retriesBroker transactions plus idempotent sinksLimited scope; end-to-end design still matters
at-least-once consumer
msg = queue.receive()
try:
    # use message_id as an idempotency key
    if not db.already_processed(msg.id):
        perform_side_effect(msg)
        db.mark_processed(msg.id)
    queue.ack(msg)
except Exception:
    # no ack: message becomes visible again or is redelivered
    raise
Idempotency is the practical answer
Most production queues are at-least-once. Design consumers so processing the same message twice has the same external effect as processing it once: use unique constraints, idempotency keys, natural event ids, and upserts.

Ack, visibility timeout, retries, and dead-letter queues

A broker needs to know whether work succeeded. In queue systems, the consumer receives a message, processes it, and sends an ack. If the ack never arrives, the broker assumes the worker died and makes the message available again.

  • Acknowledgement: deletes or commits the message only after successful processing.
  • Visibility timeout: hides a message from other workers for a period while one worker processes it.
  • Retry with backoff: transient failures should wait longer between attempts to avoid hammering a broken dependency.
  • Dead-letter queue: after too many failures, move the message aside for investigation instead of blocking the main queue.
visibility timeout failure
t=00 worker A receives message M; M hidden for 30s
t=10 worker A calls payment API; API is slow
t=30 visibility timeout expires before A acked
t=31 worker B receives M and also processes it
t=35 worker A finally succeeds

Without idempotency, the user may be charged twice.
Set visibility timeout from real work time
Too short creates duplicate processing while the first worker is still alive. Too long delays retries after a real crash. Many systems extend the timeout as work progresses or split long jobs into smaller messages.

Ordering, partitions, and consumer groups

Ordering is expensive because it limits parallelism. Systems usually offer order within a queue, partition, shard, or message group, not across the entire world. The design trick is choosing the key whose events must stay ordered.

ConceptHow it worksDesign implication
PartitionMessages with the same key go to the same ordered laneUse order_id, account_id, or conversation_id when per-entity order matters
Consumer groupMany consumers share work from partitionsParallelism is limited by partition count and hot keys
OffsetA consumer group records its position in a logReplaying means moving offsets back or starting a new group
FIFO groupQueue preserves order per message groupOne hot group can serialize too much work
Kafka-style partitioning
topic: order-events
key = order_id

order 101 -> partition 3: Created, Paid, Shipped
order 202 -> partition 7: Created, Cancelled

Consumers in the same group divide partitions, but events for one order stay ordered.
Order only what needs order
Global ordering is rarely worth the throughput cost. Preserve order per account, order, conversation, or aggregate root, and let unrelated entities process in parallel.

Backpressure and capacity planning

A queue can hide a downstream outage, but it cannot erase work. If producers enqueue 10,000 jobs per second and consumers process 2,000 per second, the backlog grows forever. Backpressure is the system pushing that reality back to producers or users before storage, latency, or costs explode.

  • Track queue depth, oldest message age, retry rate, DLQ count, consumer lag, and processing latency.
  • Autoscale consumers when backlog rises, but cap concurrency to protect downstream dependencies.
  • Shed low-priority work, pause producers, or return 429/503 when lag exceeds user promises.
  • Separate priority queues so bulk analytics does not starve password reset emails.
backlog math
arrival_rate = 10000 messages/sec
processing_rate = 8000 messages/sec
backlog_growth = 2000 messages/sec

After 10 minutes:
  2000 * 60 * 10 = 1,200,000 messages waiting

Queues buy time; they do not remove capacity limits.

SQS vs RabbitMQ vs Kafka

The right broker depends on whether you want managed task queues, flexible broker routing, or a durable replayable log. These tools overlap, but their operational centers of gravity are different.

SystemModelStrengthsTrade-offs
SQSManaged cloud queueVery low operations, visibility timeout, DLQ, elastic scaleLimited routing semantics; standard queues are best-effort ordering
RabbitMQBroker with exchanges and queuesFlexible routing, acknowledgements, priorities, RPC-like patternsYou operate broker clusters and capacity; replay is not the main model
KafkaPartitioned append-only logHigh-throughput streams, retention, replay, consumer groupsOperationally heavier; ordering is per partition; consumers manage offsets
Related pattern
For event-stream architectures, see Kafka. For reliable publishing from a database transaction, pair queues with outbox and CDC.
Key takeaways
  • Queues decouple producers from consumers, move slow work off the request path, and absorb spikes as backlog.
  • A queue distributes tasks to workers; a streaming log retains an ordered history that multiple consumer groups can replay.
  • At-least-once delivery is the common default, so consumers must be idempotent and safe under duplicates.
  • Ack, visibility timeout, retries, and DLQs are the core failure-handling mechanics for task queues.
  • Ordering is usually per partition or message group, and backpressure is required because queues buy time but not infinite capacity.
It is useful because a worker crash does not lose the message; the broker can redeliver it. It is dangerous because the first worker may have completed the side effect before crashing or missing the ack, so a second worker can repeat the work. Idempotency prevents duplicate external effects.
Choose a log when events are a durable history that many consumers need to read independently, replay, or process from different offsets. Use a task queue when each message represents work that one worker should perform and remove.
It prevents a poison message from being retried forever and blocking healthy work. After a configured number of failures, the message moves to a separate queue where operators can inspect, fix, replay, or discard it.
Finished this lesson?

Mark it complete to track your progress through the workbook.