Idempotency Keys
Make retries safe so a duplicated request never creates a second order or payment.
Networks fail in the most inconvenient place: after the server has done the work, but before the client receives the response. A user taps Pay, the payment succeeds, the response times out, and the app retries because retrying is usually the right reliability move. Without a guardrail, that retry can create a second order, send a second email, or charge the card twice. Idempotency keys make a mutation safe to retry by turning "do this again" into "return the result for the same logical operation".
Think of a coat-check ticket, say #42. If you come back twice with the same ticket, the attendant gives you the same coat record back; they do not invent a second coat. The ticket names one real-world action, and the system remembers what already happened for that ticket.
The problem: safe retries can duplicate side effects
Retrying reads is usually harmless. Retrying writes is dangerous because writes have side effects: money moves, inventory changes, seats become reserved, and downstream services receive messages. The nasty failure mode is not simply "the first request failed". It is "the first request may have succeeded, but the client cannot tell".
1. Client → POST /payments {orderId: "o_123", amount: 50}
2. API charges card successfully at the payment gateway
3. API response is lost because the network times out
4. Client retries the same POST /payments
5. API charges card again because it treats the retry as a new request
This is common in real systems because clients, load balancers, message queues, and job runners all retry. Mobile apps retry after spotty Wi-Fi. Browsers retry after a tab resumes. Queue consumers retry after a worker crashes. If the operation is expensive or irreversible, the server must be able to recognize that two HTTP requests are really one logical operation.
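The five steps above can be reproduced in a few lines. This is a minimal sketch, not a real payment API: `NaivePaymentApi` and `lossy_call` are hypothetical names, and the "gateway" is just a list that records side effects.

```python
import itertools


class NaivePaymentApi:
    """Hypothetical handler with no idempotency check: every POST is new work."""

    def __init__(self):
        self.charges = []                  # side effects recorded at the "gateway"
        self._ids = itertools.count(1)

    def post_payment(self, order_id, amount):
        charge_id = f"ch_{next(self._ids)}"
        self.charges.append((charge_id, order_id, amount))
        return {"chargeId": charge_id}


def lossy_call(api, order_id, amount, drop_response):
    """Run the request on the server, then optionally lose the response."""
    response = api.post_payment(order_id, amount)   # the server work happens first
    if drop_response:
        raise TimeoutError("response lost after the charge succeeded")
    return response


api = NaivePaymentApi()
try:
    lossy_call(api, "o_123", 50, drop_response=True)    # steps 1-3
except TimeoutError:
    lossy_call(api, "o_123", 50, drop_response=False)   # steps 4-5: the retry

print(len(api.charges))  # 2: one order, two charges
```

The client did nothing wrong here. It could not know whether the first attempt succeeded, so the fix has to live on the server.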
How it works: record first execution, replay retries
The client creates a high-entropy key, usually a UUID, before sending a mutation. The key travels in a header such as Idempotency-Key: 7b4c.... The server scopes the key to a caller and operation, atomically claims it, executes the work once, stores the response, and replays that response for later attempts.
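On the client side, the important detail is that the key is generated once per logical operation and reused on every retry. The sketch below assumes a hypothetical `FlakyTransport` that loses the first response; the `create_order` helper and its `max_attempts` parameter are illustrative, not part of any real SDK.

```python
import uuid


class FlakyTransport:
    """Hypothetical transport: loses the first response, then behaves."""

    def __init__(self):
        self.seen_keys = []

    def post(self, path, headers, body):
        self.seen_keys.append(headers["Idempotency-Key"])
        if len(self.seen_keys) == 1:
            raise TimeoutError("response lost")
        return 201, {"orderId": "o_123"}


def create_order(transport, body, max_attempts=3):
    # Generate the key once, outside the retry loop.
    key = str(uuid.uuid4())
    for _ in range(max_attempts):
        try:
            return transport.post("/orders", {"Idempotency-Key": key}, body)
        except TimeoutError:
            continue  # retry with the SAME key
    raise RuntimeError("gave up after retries")


transport = FlakyTransport()
status, payload = create_order(
    transport, {"cartId": "cart_9", "paymentToken": "tok_abc"}
)
print(status, transport.seen_keys[0] == transport.seen_keys[1])
```

Generating the key inside the loop would defeat the mechanism, because each attempt would look like a new operation. A production client would also back off between attempts.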
Client chooses key K = uuid()
POST /orders
Idempotency-Key: K
Body: {cartId: "cart_9", paymentToken: "tok_abc"}

Server:
  1. Look up (tenant_id, endpoint, K)
  2. If completed:
       return stored status_code + response_body
  3. If in_progress:
       wait, poll, or return 409/202 so the client retries later
  4. If missing:
       atomically insert {K, request_hash, status: "in_progress"}
       create order and charge payment exactly once
       update {K, status: "completed", response_body, status_code}
       return the response
Store the result, not only the fact that the key existed
A good idempotency table stores the final HTTP status and response body. If the first attempt returns 201 Created with {orderId: "o_123"}, the retry should get that same response. That makes the client state machine simple: retry until it receives the result for the operation it already asked for.
| Stored field | Why it matters | Example |
|---|---|---|
| scope | Prevents unrelated callers from colliding | tenant + user + endpoint |
| key | Names one logical operation | UUID generated by the client |
| request_hash | Detects accidental key reuse with a different body | SHA-256 of canonical body |
| status | Distinguishes in-progress from completed attempts | in_progress / completed / failed |
| response | Lets retries receive the same result | HTTP 201 + order payload |
| expires_at | Bounds storage growth | 24h, 7d, or business-specific |
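To make the stored fields concrete, here is a minimal single-process sketch in which a dict stands in for the idempotency table. `IdempotencyRecord` and `OrderApi` are hypothetical names, the order id is fabricated locally, and there is no real atomicity: the point is only to show the record shape and the replay of the stored status and body.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdempotencyRecord:
    scope: str
    key: str
    request_hash: str
    status: str                          # "in_progress" or "completed"
    expires_at: float                    # epoch seconds
    status_code: Optional[int] = None
    response_body: Optional[dict] = None


class OrderApi:
    """Hypothetical handler; self.records plays the role of the table."""

    def __init__(self, ttl_seconds=24 * 3600):
        self.records = {}
        self.orders_created = 0
        self.ttl_seconds = ttl_seconds

    def post_order(self, scope, key, body):
        request_hash = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        record = self.records.get((scope, key))
        if record is not None and record.expires_at <= time.time():
            record = None                                # expired: treat as missing
        if record is not None:
            if record.request_hash != request_hash:
                return 409, {"error": "key reused with a different body"}
            if record.status == "completed":
                return record.status_code, record.response_body   # replay
            return 202, {"status": "in_progress"}
        # Missing: claim the key, do the work once, store the final response.
        record = IdempotencyRecord(scope, key, request_hash, "in_progress",
                                   time.time() + self.ttl_seconds)
        self.records[(scope, key)] = record
        self.orders_created += 1
        record.status_code = 201
        record.response_body = {"orderId": f"o_{self.orders_created}"}
        record.status = "completed"
        return record.status_code, record.response_body


api = OrderApi()
body = {"cartId": "cart_9", "paymentToken": "tok_abc"}
first = api.post_order("tenant_1:POST /orders", "K", body)
retry = api.post_order("tenant_1:POST /orders", "K", body)
print(first == retry, api.orders_created)
```

Because the retry receives the same status and body as the first attempt, the client does not need a separate code path for "already done".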
Storage: Redis, databases, and unique constraints
Idempotency is only as strong as the atomic claim on the key. Two requests with the same key can arrive at the same millisecond, hit two API servers, and race. The storage layer must make exactly one of them the first executor.
-- One row per logical request.
CREATE TABLE idempotency_keys (
  scope text NOT NULL,
  key text NOT NULL,
  request_hash text NOT NULL,
  status text NOT NULL,
  response_json jsonb,
  status_code int,
  expires_at timestamptz NOT NULL,
  PRIMARY KEY (scope, key)
);
-- Atomic claim. Only one concurrent request can insert this row.
-- One affected row means this request won the claim; zero means the key already exists.
INSERT INTO idempotency_keys(scope, key, request_hash, status, expires_at)
VALUES ($scope, $key, $hash, 'in_progress', now() + interval '24 hours')
ON CONFLICT DO NOTHING;
- Relational database: use a unique constraint on (scope, key). This is the most straightforward choice when the mutation also writes to the same database, because claiming the key and creating the order can live in one transaction.
- Redis: use SET key value NX PX ttl or a Lua script to atomically claim and update state. Redis is fast and useful for high-volume APIs, but think carefully about persistence and failover if the side effect is financial.
- Payment gateway keys: many providers accept their own idempotency key. Still store your local key too, so your order service and your gateway call share one business operation.
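The unique-constraint claim can be tried locally with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for the relational database, the table is trimmed to four columns, and `try_claim` is a hypothetical helper. The `ON CONFLICT ... DO NOTHING` clause needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_keys (
        scope        TEXT NOT NULL,
        key          TEXT NOT NULL,
        request_hash TEXT NOT NULL,
        status       TEXT NOT NULL,
        PRIMARY KEY (scope, key)
    )
""")


def try_claim(conn, scope, key, request_hash):
    """Return True only for the caller whose INSERT actually created the row."""
    cur = conn.execute(
        "INSERT INTO idempotency_keys (scope, key, request_hash, status) "
        "VALUES (?, ?, ?, 'in_progress') "
        "ON CONFLICT (scope, key) DO NOTHING",
        (scope, key, request_hash),
    )
    conn.commit()
    return cur.rowcount == 1   # 1 row written: we own the key; 0: someone else does


first = try_claim(conn, "tenant_1:/orders", "k1", "abc")
second = try_claim(conn, "tenant_1:/orders", "k1", "abc")
other_scope = try_claim(conn, "tenant_2:/orders", "k1", "abc")
print(first, second, other_scope)
```

The caller never reads before writing. The insert itself is the test, which is what closes the race between two servers that look up the key at the same moment.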
Concurrent duplicates: lock or wait on the key
The hardest duplicate is not a retry minutes later; it is two identical attempts in flight at once. Maybe the user double-clicked, or an API gateway retried while the first request was still running. If both execute before either stores a result, idempotency has failed.
result = try_insert_key(scope, key, request_hash)
if result == "inserted":
    try:
        response = execute_mutation_once()
        mark_completed(scope, key, response)
        return response
    except Exception as error:
        mark_failed_or_delete_claim(scope, key, error)
        raise
existing = load_key(scope, key)
if existing.request_hash != request_hash:
    return 409 Conflict  # same key, different operation
if existing.status == "completed":
    return existing.stored_response
if existing.status == "in_progress":
    return 202 Accepted  # or wait briefly, then ask client to retry
# status == "failed": a policy decision, e.g. replay the stored error or allow a fresh attempt
Some systems block the second request until the first completes; others return 202 Accepted or 409 Conflict with a retry-after hint. The right choice depends on latency. For a sub-second order creation, waiting is friendly. For a multi-minute asynchronous workflow, returning a status endpoint is cleaner.
| Strategy | How it handles in-flight duplicates | Trade-off |
|---|---|---|
| Wait on the key | Second request waits for the first result | Best UX, but ties up a connection |
| Return 202 | Client polls or retries after a delay | Scales well for long work |
| Return 409 in_progress | Client learns the key is busy | Simple, but clients need custom handling |
| Per-key lock | Only the lock holder executes | Correct, but lock TTLs must be chosen carefully |
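The "wait on the key" strategy from the table can be sketched inside one process with threads. `KeyedExecutor` is a hypothetical class, and a `threading.Lock` plays the role that a unique constraint or Redis command would play across servers; the timeout value is an arbitrary assumption.

```python
import threading
import time


class KeyedExecutor:
    """Hypothetical in-process coordinator: one executor per key, others wait."""

    def __init__(self):
        self._guard = threading.Lock()
        self._entries = {}   # key -> {"done": Event, "response": tuple or None}

    def run(self, key, mutation, wait_timeout=5.0):
        with self._guard:                          # atomic claim on the key
            entry = self._entries.get(key)
            is_executor = entry is None
            if is_executor:
                entry = {"done": threading.Event(), "response": None}
                self._entries[key] = entry
        if is_executor:
            try:
                entry["response"] = mutation()
            except Exception:
                with self._guard:
                    del self._entries[key]         # release the claim so a retry can run
                raise
            finally:
                entry["done"].set()                # wake any waiting duplicates
            return entry["response"]
        entry["done"].wait(wait_timeout)
        if entry["response"] is not None:
            return entry["response"]               # replay the first result
        return 202, {"status": "in_progress"}      # still running or failed: retry later


calls = []


def create_order():
    time.sleep(0.05)              # slow enough that the duplicates overlap
    calls.append("order")
    return 201, {"orderId": "o_123"}


executor = KeyedExecutor()
results = []
threads = [
    threading.Thread(target=lambda: results.append(executor.run("K", create_order)))
    for _ in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), len(results))
```

Five requests arrive for the same key, one runs the mutation, and the other four receive its result. The cost is visible too: each waiter holds a thread for the duration, which is the "ties up a connection" trade-off.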
Scope, TTL, and payload matching
Keys are not globally magical. They need a scope, an expiry, and a rule for mismatched payloads. Without those details, a key from one user could collide with another user, storage would grow forever, or a buggy client could accidentally reuse a key for a different operation.
- Scope: store keys under something like tenant_id + user_id + endpoint + key. A key used for POST /orders should not affect POST /refunds.
- TTL: keep the record longer than all expected retries, queue redeliveries, and client reconnect windows. Payments often use at least 24 hours; business-critical workflows may keep keys for days.
- Payload hash: canonicalize the request body and store a hash. If the same key arrives with a different hash, return 409 Conflict instead of guessing which operation the client intended.
- Failure semantics: decide whether to cache failures. Validation errors can be replayed safely. Transient 500 errors may be better marked failed and retried through a recovery path.
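Scoping and payload matching come down to two small functions. The sketch below assumes JSON request bodies and uses sorted keys with compact separators as the canonical form; `request_hash` and `scoped_key` are hypothetical helper names, and other canonicalization rules are equally valid as long as client-visible formatting differences do not change the hash.

```python
import hashlib
import json


def request_hash(body: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def scoped_key(tenant_id: str, user_id: str, endpoint: str, key: str) -> tuple:
    """A tuple avoids delimiter collisions that string concatenation could cause."""
    return (tenant_id, user_id, endpoint, key)


same_a = request_hash({"orderId": "o_123", "amount": 50})
same_b = request_hash({"amount": 50, "orderId": "o_123"})   # reordered fields
changed = request_hash({"orderId": "o_123", "amount": 70})  # different operation

orders = scoped_key("t_1", "u_1", "POST /orders", "K")
refunds = scoped_key("t_1", "u_1", "POST /refunds", "K")
print(same_a == same_b, same_a == changed, orders == refunds)
```

A reordered body hashes the same, a changed amount does not, and the same key under a different endpoint is a different record.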
Real-world examples and related patterns
Idempotency keys appear anywhere at-least-once delivery meets side effects. Stripe popularized the HTTP header for payments. Ecommerce systems use keys to protect order creation. Email providers use message IDs to avoid sending duplicates. Queue consumers use event IDs so a replayed Kafka or outbox event updates the database once.
| Use case | Key scope | Stored result |
|---|---|---|
| Create order | customer + cart + idempotency key | order id and status |
| Charge payment | merchant + payment key | gateway charge id |
| Reserve booking | user + itinerary + key | reservation id or sold-out response |
| Consume event | consumer name + event id | processed marker and side effects |
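The "consume event" row follows the same pattern without HTTP: the event id is the key and a processed marker is the stored result. This is a minimal in-memory sketch with a hypothetical `BalanceConsumer`; in a real consumer the marker and the side effect would normally be committed in the same database transaction.

```python
class BalanceConsumer:
    """Hypothetical consumer that applies each event id at most once."""

    def __init__(self, name):
        self.name = name
        self.processed = set()    # markers of (consumer name, event id)
        self.balance = 0

    def handle(self, event):
        marker = (self.name, event["id"])
        if marker in self.processed:
            return False                  # redelivery: skip the side effect
        self.balance += event["amount"]   # the side effect
        self.processed.add(marker)
        return True


consumer = BalanceConsumer("ledger")
deliveries = [
    {"id": "evt_1", "amount": 50},
    {"id": "evt_2", "amount": 30},
    {"id": "evt_1", "amount": 50},   # replayed after a worker crash
]
applied = [consumer.handle(event) for event in deliveries]
print(applied, consumer.balance)
```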
- Retries are necessary for reliability, but write retries can duplicate orders, payments, bookings, and messages unless the server recognizes the same logical operation.
- The client supplies a unique idempotency key; the server atomically records the key, request hash, status, and final response for the first execution.
- Retries with the same key and same payload replay the stored status and response instead of executing the side effect again.
- Use Redis atomic commands or a database unique constraint to claim the key; handle concurrent in-flight duplicates with waiting, 202/409 responses, or a per-key lock.
- Scope keys by caller and operation, keep them past the retry window with a TTL, and reject the same key with a different payload.
- If a client uses key K for a $50 payment and later sends the same key for a $70 payment, the server can detect the different payload and return 409 Conflict instead of replaying the wrong result or executing a different operation.
- When two requests with the same key arrive at once, the atomic claim decides the race, whether it is a unique-constraint insert or Redis SET NX. Only one request becomes the executor. The other observes an in-progress or completed key and waits, retries later, or replays the stored result.