Message Queues & Streams
Decouple producers from consumers, absorb spikes, and process work asynchronously.
Message queues and streams let one part of a system hand work to another asynchronously. Instead of making a user wait while every downstream service finishes, the producer records a message and returns. Consumers process messages at their own pace, with retries, ordering rules, and backpressure controls.
The problem: slow work and fragile coupling
Synchronous calls are simple until the downstream service is slow, down, or overwhelmed. If checkout directly calls email, analytics, warehouse, billing webhooks, and recommendation updates before returning, one weak dependency can ruin the user-facing request.
HTTP request
-> write order in database
-> publish OrderCreated message
-> return 202/200 to user
workers later:
-> send email
-> reserve warehouse stock
-> update search index
-> notify analytics- Decoupling: producers and consumers do not need to be online at the same time or scale at the same rate.
- Spike absorption: bursts become queue depth instead of immediate downstream overload.
- Background work: slow tasks leave the request path.
- Failure containment: retries and dead-letter queues isolate poison messages from healthy traffic.
Traditional queues vs streaming logs
People often say "queue" for every async system, but a work queue and a log have different semantics. A queue distributes tasks; a log records an ordered history that many consumers can replay independently.
| Model | Message lifecycle | Consumer behavior | Examples | Best for |
|---|---|---|---|---|
| Queue | Message is removed or hidden after a consumer succeeds | Many workers compete; one worker handles each task | SQS, RabbitMQ work queues, Redis lists | Background jobs, email sending, image processing |
| Streaming log | Message is appended and retained for time or size | Each consumer group tracks its own offset | Kafka, Pulsar, Kinesis, Redis Streams | Event history, analytics pipelines, CDC, replayable integrations |
Queue mental model
Use a queue when the message represents work to do once: send this email, charge this webhook, resize this image. Adding workers increases throughput because workers compete for tasks.
Log mental model
Use a log when the message is an event fact: order created, payment captured, user upgraded. Multiple teams can consume the same history at different speeds, and a new consumer can replay from the beginning.
Delivery semantics: at-most-once, at-least-once, exactly-once
Delivery semantics describe what can happen when producers, brokers, or consumers crash. They are not marketing words; they determine whether your consumer code must tolerate lost messages or duplicates.
| Semantic | Meaning | Typical implementation | Risk |
|---|---|---|---|
| At-most-once | Message is delivered zero or one time | Mark done before processing | Failures can lose work |
| At-least-once | Message is delivered one or more times | Process, then ack | Duplicates are normal |
| Exactly-once | Effect appears once despite retries | Broker transactions plus idempotent sinks | Limited scope; end-to-end design still matters |
msg = queue.receive()
try:
# use message_id as an idempotency key
if not db.already_processed(msg.id):
perform_side_effect(msg)
db.mark_processed(msg.id)
queue.ack(msg)
except Exception:
# no ack: message becomes visible again or is redelivered
raiseAck, visibility timeout, retries, and dead-letter queues
A broker needs to know whether work succeeded. In queue systems, the consumer receives a message, processes it, and sends an ack. If the ack never arrives, the broker assumes the worker died and makes the message available again.
- Acknowledgement: deletes or commits the message only after successful processing.
- Visibility timeout: hides a message from other workers for a period while one worker processes it.
- Retry with backoff: transient failures should wait longer between attempts to avoid hammering a broken dependency.
- Dead-letter queue: after too many failures, move the message aside for investigation instead of blocking the main queue.
t=00 worker A receives message M; M hidden for 30s
t=10 worker A calls payment API; API is slow
t=30 visibility timeout expires before A acked
t=31 worker B receives M and also processes it
t=35 worker A finally succeeds
Without idempotency, the user may be charged twice.Ordering, partitions, and consumer groups
Ordering is expensive because it limits parallelism. Systems usually offer order within a queue, partition, shard, or message group, not across the entire world. The design trick is choosing the key whose events must stay ordered.
| Concept | How it works | Design implication |
|---|---|---|
| Partition | Messages with the same key go to the same ordered lane | Use order_id, account_id, or conversation_id when per-entity order matters |
| Consumer group | Many consumers share work from partitions | Parallelism is limited by partition count and hot keys |
| Offset | A consumer group records its position in a log | Replaying means moving offsets back or starting a new group |
| FIFO group | Queue preserves order per message group | One hot group can serialize too much work |
topic: order-events
key = order_id
order 101 -> partition 3: Created, Paid, Shipped
order 202 -> partition 7: Created, Cancelled
Consumers in the same group divide partitions, but events for one order stay ordered.Backpressure and capacity planning
A queue can hide a downstream outage, but it cannot erase work. If producers enqueue 10,000 jobs per second and consumers process 2,000 per second, the backlog grows forever. Backpressure is the system pushing that reality back to producers or users before storage, latency, or costs explode.
- Track queue depth, oldest message age, retry rate, DLQ count, consumer lag, and processing latency.
- Autoscale consumers when backlog rises, but cap concurrency to protect downstream dependencies.
- Shed low-priority work, pause producers, or return 429/503 when lag exceeds user promises.
- Separate priority queues so bulk analytics does not starve password reset emails.
arrival_rate = 10000 messages/sec
processing_rate = 8000 messages/sec
backlog_growth = 2000 messages/sec
After 10 minutes:
2000 * 60 * 10 = 1,200,000 messages waiting
Queues buy time; they do not remove capacity limits.SQS vs RabbitMQ vs Kafka
The right broker depends on whether you want managed task queues, flexible broker routing, or a durable replayable log. These tools overlap, but their operational centers of gravity are different.
| System | Model | Strengths | Trade-offs |
|---|---|---|---|
| SQS | Managed cloud queue | Very low operations, visibility timeout, DLQ, elastic scale | Limited routing semantics; standard queues are best-effort ordering |
| RabbitMQ | Broker with exchanges and queues | Flexible routing, acknowledgements, priorities, RPC-like patterns | You operate broker clusters and capacity; replay is not the main model |
| Kafka | Partitioned append-only log | High-throughput streams, retention, replay, consumer groups | Operationally heavier; ordering is per partition; consumers manage offsets |
- Queues decouple producers from consumers, move slow work off the request path, and absorb spikes as backlog.
- A queue distributes tasks to workers; a streaming log retains an ordered history that multiple consumer groups can replay.
- At-least-once delivery is the common default, so consumers must be idempotent and safe under duplicates.
- Ack, visibility timeout, retries, and DLQs are the core failure-handling mechanics for task queues.
- Ordering is usually per partition or message group, and backpressure is required because queues buy time but not infinite capacity.
Mark it complete to track your progress through the workbook.