Reviewed by 6 specialized AI reviewers. Explore the diagram and the full per-section feedback below.
Loading diagram…
There is real architectural substance here and the candidate demonstrates familiarity with distributed cache patterns, but the design falls short of senior-level rigor in the areas that matter most: correctness under failover, capacity reasoning, and precise system contracts. The gaps are not cosmetic; they affect whether the system can safely meet its stated SLA and scale assumptions.
Core quality attributes are explicitly identified
The section clearly calls out availability, latency, consistency, and scalability. That is the right set of non-functional dimensions for a cache-heavy system and shows the candidate is thinking beyond just functional behavior.
Consistency model is chosen with an explicit trade-off
Stating eventual consistency with a leader-replica cache and noting that the cache is not the source of truth is a sensible trade-off for a low-latency system. This shows awareness that some durability and freshness can be relaxed when the cache can be rebuilt from the real source of truth.
Latency target is scoped correctly
The p99 < 1ms target is explicitly defined from the cache perspective and excludes network time. That boundary makes the target more defensible than quoting an end-to-end number without clarifying what is being measured.
Availability target is not tied to a failure scenario
You state >99.99% client-side SLA, but what happens when the primary cache node fails or a replica lags during failover? Without explaining what level of degraded behavior is still acceptable to the client, the availability target is just a number and it is hard to tell whether stale reads, misses, or temporary fallback latency still satisfy the SLA.
Scalability number is floating without connection to assumptions
You mention scalability for 100k RPS, but have you considered how that maps to the expected traffic shape: read/write ratio, hot-key skew, peak vs steady load, and replication overhead? Without tying 100k RPS to those assumptions, it is unclear whether the consistency and latency targets are realistic under the actual workload.
Consistency choice is named but not justified against correctness impact
You chose eventual consistency, but what happens when a client reads immediately after a write and gets stale data from a replica? Since this is an explicit trade-off, the design should say what kinds of stale reads are acceptable and what user-visible behavior would break if the data is not fresh.
Durability trade-off could be made more concrete
You could improve this by stating exactly what 'some durability' means in practice. For example: is data loss on cache node restart acceptable, is asynchronous replication acceptable, and is the expected recovery path a cache refill from the source of truth? That would make the NFRs drive clearer design decisions.
Core nouns are only partially identified
You named Key, Value, and TTL, which are relevant, but this reads more like field names than a clear domain model. For a senior-level discussion, I would ask: is the stored record itself an entity, or are Key and Value being treated as separate entities? Without a clear primary object such as a key-value entry/item, the model is hard to reason about.
Happy-path entity model is underspecified
Have you considered what the actual stored object is along the core flow of write, read, and expiry? If the system stores '1 key: 1 value, TTL optional,' then the main entity is likely an entry that owns a key, value, and expiration metadata. Without explicitly modeling that, it is unclear how creation, lookup, overwrite, and expiration operate on the same domain object.
Relationships are not clearly defined
You mention '1 key: 1 value, TTL optional,' which hints at a relationship, but I would push on how these concepts connect. Is TTL attached to the key, the value, or the entry as a whole? Is there exactly one active value per key at any time, and does overwrite replace the prior entry? Making those relationships explicit would strengthen the design and avoid ambiguity in the core data model.
No logical chain from traffic to storage
You list server RAM, total data, node count, and per-node RPS, but there is no clear derivation tying them together. How did you get from the assumed users or request rate to 1TB of data, and from 1TB to 20 nodes if each node has 50 GB usable RAM rather than disk capacity? Without a DAU/QPS → data volume → node sizing chain, it is hard to trust whether the infrastructure actually matches the load.
Storage sizing methodology is unclear
Have you considered what resource is actually limiting each node? The calculation says '1TB of data = ~20 nodes' based on 50 GB usable per 64 GB RAM server, which implies memory-bound storage, but that only makes sense for an in-memory system. If this is disk-backed storage, node count should be driven by disk, IOPS, and replication overhead; if it is memory-resident, you should explain why the full dataset must fit in RAM. As written, the sizing basis is ambiguous.
Traffic estimate is not grounded
You mention 100k RPS and divide it across 20 primary nodes, but there is no back-of-envelope showing where 100k RPS comes from. Have you considered deriving it from user behavior, peak traffic, and read/write mix? Without that, 5k RPS per node is just arithmetic on an unsupported assumption rather than a capacity plan.
Replica impact on serving capacity is not discussed
What happens if replicas also serve reads, or if failover shifts traffic onto fewer nodes? The calculation doubles node count for replication, but it does not explain whether the 100k RPS is handled only by primaries or shared across replicas, nor what headroom exists during node loss or rebalancing. Without that, the architecture fit to the stated load is uncertain.
Add bandwidth and growth assumptions
You could improve this by extending the chain beyond node count: estimate request/response size, network throughput, storage growth rate, and retention horizon. That would show whether the chosen node count is constrained by CPU, memory, disk, or bandwidth, and make the capacity story much more complete.
Write semantics are ambiguous on POST /keys/:id
What happens when a client POSTs to /keys/:id for a key that already exists? Returning either 201 or 200 suggests create-or-overwrite behavior, but the API does not say whether this is idempotent, whether the old value is replaced unconditionally, or how concurrent writers are handled. Without clear semantics, clients cannot safely retry after timeouts because a retry may accidentally overwrite data. You could improve this by separating create vs update semantics or explicitly documenting overwrite behavior and retry/idempotency expectations.
GET response shape is too underspecified to be safely consumable
What does the client actually receive from GET /keys/:id when the key exists: a raw string, a JSON object, or metadata including expiry? Returning just 'Value' is not enough to build against, especially since expiry is part of the write contract. Clients may need to know whether the key is near expiration or whether the value was stored with a TTL. You could improve this by defining a concrete response schema and whether expired keys return 404 or some other state.
Error handling is incomplete for common failure cases
What does the client see when the request body is invalid, the expiry is malformed or in the past, the value is too large, or the service is temporarily unavailable? Only 200/201/404 are listed, so clients have no guidance on how to distinguish bad input from retryable server failures. Without clear status codes and error payloads, client retry behavior will be inconsistent and may amplify failures. You could improve this by defining 400-class validation errors, 5xx retryable errors, and a consistent error body shape.
DELETE semantics on missing or expired keys should be explicit
Have you considered what happens if a client deletes a key that has already expired or was already deleted? Returning 404 is one option, but many key-value APIs make delete idempotent and return success even when the key is absent so retries are safe. Being explicit here would make client behavior much clearer.
Protocol choice is implied but not explicitly defined
The routes look like REST over HTTP, which is reasonable for this use case, but the contract would be stronger if you explicitly stated content type, request/response format, and whether this is synchronous HTTP JSON. That removes ambiguity around whether GET returns plain text or JSON and helps clients integrate correctly.
Primary entity operations are minimal but not fully CRUD
You can create/read/delete a key, but update behavior is folded into POST without being clearly modeled. For a senior-level API review, I would push on whether the API should expose explicit update semantics or at least clearly define POST as upsert. That would make the resource model cleaner and easier to reason about.
End-to-end sharded request routing is thought through
The design does complete the core flow: clients fetch slot metadata, hash the key client-side, route directly to the owning primary, and handle redirection when topology changes. That is a solid choice for a distributed key-value cache because it avoids a central routing hop on the hot path.
Failure handling is considered at multiple levels
The candidate explicitly covers primary failure, replica catch-up, and full node loss with slot redistribution. That shows awareness that availability is not just about replication, but also about how clients and the cluster recover when ownership changes.
TTL cleanup is handled both lazily and proactively
Combining periodic cleanup with read-time expiry checks is a practical design trade-off for an in-memory cache. It keeps the write/read path simple while preventing obviously expired data from being returned if background cleanup lags.
Write acknowledgment semantics are unsafe for the stated availability goal
What happens when a primary accepts a write, returns success, and crashes before the replica has applied the replication stream entry? The system would lose an acknowledged write during failover. For a leader-replica design, you should be explicit about whether writes wait for replica ack, whether some data loss is acceptable, and how failover avoids promoting a stale replica.
Single-threaded nodes are the first likely bottleneck at 100k RPS
Have you considered what happens when one shard becomes hot or when a node must handle reads, writes, TTL checks, LRU updates, replication, and cleanup on a single thread? Even if average load per node looks fine, skewed keys or failover traffic can overload one primary quickly. You could improve this by discussing hot-key mitigation, more shards than physical nodes, or separating background work from the request thread.
Cluster membership and failover control plane is underspecified
What happens when gossip temporarily disagrees across nodes during a partition or during rapid failures? Different clients could route to different primaries for the same slot, causing split-brain behavior or inconsistent redirects. Gossip is useful for health dissemination, but you should explain who has authority to promote replicas and reassign slots so ownership changes are coordinated.
Replica and cleanup workers appear as per-node dependencies without redundancy details
Have you considered what happens if the cleanup worker or replication path for a node stalls? Expired keys would accumulate, memory pressure would rise, and replicas could drift behind without the design showing how lag is detected or how backpressure is handled. You could improve this by making these background functions part of the node process or by describing health checks and lag-based alerts.
Applications service is effectively outside the main data path
The core cache flow bypasses the Applications service, which is fine, but as drawn it is close to an orphaned component. You could improve the HLD by clarifying whether it is just a sample client, an SDK bootstrap endpoint, or an actual API layer, so the request path is unambiguous.
Rebalancing impact on live traffic is not addressed
What happens when a node fails and hash slots move while clients are still sending traffic using stale slot maps? The design mentions redirects, but not the migration behavior for in-flight writes or how much extra latency/error rate is expected during rebalance. You could strengthen this by describing temporary redirects, slot migration states, or dual-serving during ownership transfer.
Draw your architecture for Distributed Cache and get an instant hire/no-hire signal from 6 specialized AI reviewers — free to start.