Kafka: Partitioned Log
A durable, replayable, ordered-per-key log for high-throughput event streaming.
Kafka is a distributed, durable, replayable, partitioned log. Producers append records to topics; consumers read those records at their own pace by offset. It is the backbone for event streaming because it combines high throughput, per-key ordering, retention, and replay in one primitive.
The primitive: an append-only partitioned log
A Kafka topic is split into partitions. Each partition is an append-only sequence of records. Kafka assigns every record an offset, which is just its position in that partition. Consumers do not ask Kafka to delete messages after reading; they remember the latest offset they processed.
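The offset model can be sketched with a toy in-memory partition. This is a minimal illustration, not Kafka's storage format or client API; `PartitionLog`, `append`, and `read_from` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy stand-in for one partition (hypothetical, not a Kafka class)."""
    records: list = field(default_factory=list)

    def append(self, value):
        # The offset is simply the record's position in the partition.
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset, max_records=100):
        # Reading never deletes anything; it returns (offset, value) pairs.
        chunk = self.records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


log = PartitionLog()
for event in ["pmt_7 ok", "pmt_9 ok", "pmt_7 refund"]:
    log.append(event)

# The consumer, not the log, remembers where to resume.
next_offset = 0
for offset, value in log.read_from(next_offset):
    next_offset = offset + 1  # checkpoint: the next offset to read
```

After the loop the log still holds every record. Only the consumer's checkpoint has moved.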
topic: payments.events
partition 0: offset 0 offset 1 offset 2 offset 3
pmt_7 ok pmt_9 ok pmt_7 refund ...
partition 1: offset 0 offset 1 offset 2
pmt_8 ok pmt_12 ok pmt_8 dispute ...
partition 2: offset 0 offset 1
pmt_10 ok pmt_11 ok ...
consumer checkpoint for group analytics:
partition 0 -> next offset 3
partition 1 -> next offset 2
partition 2 -> next offset 1
Ordering: guaranteed only within one partition
Kafka gives a strong but narrow ordering guarantee: records in one partition are read in the same order they were appended. There is no global order across partitions. To get meaningful ordering, choose a message key that represents the entity whose order matters, such as conversation_id, order_id, or account_id.
partition = hash(record.key) % topic.partition_count
key = conversation_42
-> all messages for conversation_42 land on partition 5
-> consumers see them in send order
key = random_uuid
-> great distribution
-> no per-conversation ordering guarantee
- Good key: the aggregate whose sequence matters. Chat messages key by conversation, ledger entries key by account, and order lifecycle events key by order.
- Bad key: a value that changes for each event if you need ordering, or one hot value that sends all traffic to one partition.
- Partition count is part of the contract: adding partitions can change hash placement for new records, so designs that require strict long-term key affinity need a partitioning plan.
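A rough sketch of key-based placement, and of why partition count matters, might look like this. It assumes CRC32 as a stand-in hash (real clients use their own hash function), and `partition_for` is a hypothetical helper rather than a Kafka API.

```python
import zlib


def partition_for(key: str, partition_count: int) -> int:
    # Assumption: CRC32 stands in for the client's real hash function.
    return zlib.crc32(key.encode("utf-8")) % partition_count


keys = ["conversation_42", "conversation_42", "conversation_7"]
placements = [partition_for(k, 12) for k in keys]

# How many of 1000 hypothetical keys would land elsewhere if the topic
# grew from 12 to 16 partitions?
moved = sum(
    partition_for(f"conversation_{i}", 12) != partition_for(f"conversation_{i}", 16)
    for i in range(1000)
)
```

Under these assumptions, both `conversation_42` records map to the same partition. A nonzero `moved` shows that new records for some keys would go to a different partition after the resize.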
Consumer groups, parallelism, and rebalancing
A consumer group is Kafka's way to parallelize one logical subscription. Within a group, each partition is assigned to at most one consumer at a time. If a topic has 12 partitions and a group has 4 healthy consumers, each consumer may process about 3 partitions. If one consumer dies, Kafka rebalances and assigns its partitions to the survivors.
| Concept | What it means | Design implication |
|---|---|---|
| One group | One logical application subscription | Scale it by adding consumers up to the partition count |
| Partition assignment | Only one consumer in the group owns a partition | Preserves per-partition order |
| Offset commit | The group records how far it processed | Restart resumes from the committed offset |
| Rebalance | Partitions move after membership changes | Consumers must handle pause, revoke, and retry cleanly |
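The 12-partition, 4-consumer example can be sketched with a simple round-robin assignment. Kafka's actual assignors use different strategies, so this only illustrates the single-owner rule and what a rebalance changes. `assign` and the consumer names are hypothetical.

```python
def assign(partitions, consumers):
    """Round-robin sketch: every partition gets exactly one owner in the group."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {c: [] for c in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = list(range(12))
before = assign(partitions, ["c1", "c2", "c3", "c4"])

# c4 dies: the group rebalances and the survivors pick up its partitions.
after = assign(partitions, ["c1", "c2", "c3"])
```

In this sketch a fifth through twelfth consumer would each take partitions away from the others. A thirteenth would own nothing, which is why the partition count caps parallelism.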
Durability: replication, leaders, and ISR
Each partition has a leader broker and follower replicas. Producers send writes to the leader; followers copy the log. The in-sync replica set (ISR) contains replicas that are caught up enough to be considered safe. For important data, production clusters commonly use replication factor 3, acks=all, and min.insync.replicas=2.
partition payments.events-0
broker A: leader offset 1042 offset 1043 offset 1044
broker B: follower offset 1042 offset 1043 offset 1044 <- in ISR
broker C: follower offset 1042 offset 1043 <- lagging
producer config:
acks = all
broker or topic config:
min.insync.replicas = 2
write offset 1045 is acknowledged only after the in-sync replicas (leader included) store it, and is rejected if fewer than 2 replicas are in sync.
- Leader failure: Kafka elects an in-sync follower as the new leader so committed records survive broker loss.
- Producer acknowledgments: acks=1 is faster but can lose acknowledged writes if the leader dies before followers copy them. acks=all waits until every replica currently in the ISR has the record, and min.insync.replicas sets the smallest ISR size at which such writes are still accepted.
- Replication is not backup: it handles machine failure, not accidental deletion, bad producers, or infinite retention needs.
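The interaction between acks and min.insync.replicas can be modeled as a small decision function. This is a simplified sketch of the rules described above, not broker code. `write_outcome` is a hypothetical helper, and `isr_size` counts the leader.

```python
def write_outcome(acks: str, isr_size: int, min_insync_replicas: int) -> str:
    """Toy model of whether a produce request is acknowledged."""
    if isr_size < 1:
        raise ValueError("this sketch assumes a live leader, so isr_size >= 1")
    if acks == "all":
        # min.insync.replicas is only enforced for acks=all writes.
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return f"acknowledged after {isr_size} in-sync replicas have the record"
    if acks == "1":
        return "acknowledged after the leader alone has the record"
    raise ValueError(f"unsupported acks value in this sketch: {acks!r}")


# Leader plus one caught-up follower, as in the diagram above.
healthy = write_outcome("all", isr_size=2, min_insync_replicas=2)

# The second replica falls behind: acks=all writes are refused rather than
# accepted onto a single copy, while acks=1 would still be acknowledged.
degraded_all = write_outcome("all", isr_size=1, min_insync_replicas=2)
degraded_one = write_outcome("1", isr_size=1, min_insync_replicas=2)
```

The design trade-off is visible in the degraded case: acks=all with min.insync.replicas=2 gives up availability for writes in exchange for never acknowledging a record that exists on only one broker.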
Retention and replay: the superpower
Kafka keeps records according to a retention policy: for example 7 days, 30 days, or a size cap. During that window, any consumer group can start at an older offset and replay history. This is how teams rebuild search indexes, backfill a new analytics pipeline, or recover from a bad deployment.
| Queue | Kafka log |
|---|---|
| Message is usually removed after one successful receive | Record stays until retention expires |
| Competing consumers share work | Many consumer groups independently read the same records |
| Replay often needs a dead-letter archive or source DB | Replay is built in by resetting offsets |
| Best for discrete tasks | Best for event history and stream processing |
This difference is why Kafka pairs naturally with message queues rather than replacing every queue. Use queues for commands that should be done once; use Kafka for facts that many systems may need to observe or replay.
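The replay behavior described in this section can be sketched with per-group checkpoints over a shared log. The group names, the `poll` helper, and the sample events are hypothetical. In a real cluster the rewind would go through the client's seek call or the consumer group tooling, which is not shown here.

```python
log = ["order_created", "order_paid", "order_shipped", "order_delivered"]

# Each consumer group keeps its own next-offset checkpoint for this partition.
group_offsets = {"analytics": 0, "search-indexer": 0}


def poll(group, max_records=10):
    # Return the next records for this group and advance only its checkpoint.
    start = group_offsets[group]
    batch = log[start:start + max_records]
    group_offsets[group] = start + len(batch)
    return batch


first_pass = poll("search-indexer")   # reads everything currently in the log
poll("analytics", max_records=2)      # analytics progresses independently

# Replay: rewind only the search-indexer group, e.g. to rebuild an index.
group_offsets["search-indexer"] = 0
replayed = poll("search-indexer")
```

Reading never removed anything from `log`, so the rewind returns the same records again and leaves the other group's position untouched.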
Why Kafka is fast: sequential I/O and zero-copy
Kafka is optimized around the fact that appending to a file and reading it sequentially is extremely efficient. Brokers write partition logs to disk in append order, batch records together, and serve consumers by streaming contiguous bytes. On many platforms, Kafka can use zero-copy transfer so bytes move from the filesystem cache to the network socket without being copied repeatedly through user-space buffers.
- Batching: producers group many records into one request, which amortizes network and disk overhead.
- Sequential access: append and scan are friendly to disks, SSDs, page cache, and prefetching.
- Consumer pull: consumers fetch at their own pace, which naturally supports backpressure and independent replay.
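To make the batching point concrete, here is a small sketch that groups records into size-bounded batches and counts the resulting requests. `batch_records` is a hypothetical helper, and the 16 KiB limit and 100-byte records are illustrative values, not a recommendation.

```python
def batch_records(records, max_batch_bytes):
    """Group records into batches whose payload fits within max_batch_bytes.

    A single record larger than the limit still gets a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for record in records:
        size = len(record)
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(record)
        current_size += size
    if current:
        batches.append(current)
    return batches


records = [b"x" * 100 for _ in range(1000)]   # 1000 records of 100 bytes each
batches = batch_records(records, max_batch_bytes=16_384)
requests_without_batching = len(records)
requests_with_batching = len(batches)
```

In this toy setup, 1000 individual sends collapse into a handful of requests. This is the amortization of per-request network and disk overhead that the batching bullet describes.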
Edge cases and gotchas
- At-least-once duplicates: a consumer can process a record and crash before committing its offset. On restart it rereads the record, so side effects need idempotency.
- Hot partitions: one celebrity user or giant tenant can dominate a partition if the key is too coarse. Shard hot keys only if you can tolerate weaker ordering or add a sequencer.
- Poison records: one bad record can block a partition. Use retries with limits, dead-letter topics, and alerting.
- Lag is a symptom, not a metric to ignore: rising consumer lag means producers are outpacing processing, consumers are stuck, or rebalances are too frequent.
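The at-least-once case can be sketched as a consumer loop that crashes between processing and committing. Everything here is a simplified assumption: the in-memory `applied` set stands in for a durable deduplication store, which a real system would typically key by a business ID or by topic, partition, and offset.

```python
records = [(0, "charge pmt_7"), (1, "charge pmt_9"), (2, "refund pmt_7")]

committed_offset = 0          # next offset the group will resume from
applied = set()               # offsets whose side effects already happened
side_effects = []             # stand-in for an external system


def handle(offset, value):
    # Idempotency guard: skip work that was already applied.
    if offset in applied:
        return
    side_effects.append(value)
    applied.add(offset)


def run(crash_before_commit_at=None):
    global committed_offset
    for offset, value in records[committed_offset:]:
        handle(offset, value)
        if offset == crash_before_commit_at:
            return  # simulated crash: processed, but the offset is never committed
        committed_offset = offset + 1


run(crash_before_commit_at=1)   # offset 1 is processed but not committed
run()                           # the restart rereads offset 1; the guard absorbs it
```

Without the guard in `handle`, the second run would charge `pmt_9` twice. With it, the redelivery is harmless and the commit simply catches up.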
- Kafka is an append-only, partitioned log: records have offsets and remain available until retention removes them.
- Ordering is guaranteed only within one partition, so choose keys around the entity whose order matters.
- Consumer groups scale one subscription by assigning partitions to consumers; rebalancing moves ownership after failures or deploys.
- Durability comes from replication, ISR, and producer settings such as acks=all with min.insync.replicas.
- Kafka is fast because it batches, writes sequentially, uses the page cache, and can stream bytes with zero-copy; replay is the major product feature.
Use conversation_id as the Kafka key. All records for one conversation land on the same partition and keep order, while different conversations spread across partitions for parallel processing.