Reviewed by 6 specialized AI reviewers. The full per-section feedback follows.
The candidate demonstrates good architectural instincts and chose sensible core building blocks for the problem, but the design lacks enough depth on failure modes, consistency boundaries, and capacity reasoning to support a confident senior-level hire. The result is above average and workable, but not yet convincing on scalability and operational completeness at the expected level.
Core quality attributes are explicitly identified
The section clearly calls out availability, latency, scalability, and the fail-open/fail-close behavior. For a rate limiter, these are the right non-functional dimensions to surface because they directly affect whether the limiter protects downstream systems without becoming a bottleneck itself.
Availability versus consistency trade-off is stated
Saying availability matters more than consistency is a sensible starting point for a distributed rate limiter. It shows awareness that occasional limit inaccuracies may be preferable to turning the limiter into a single point of failure.
Targets are tied to the stated scale assumptions
The NFRs reference the given workload of 50K RPS and 100K unique clients rather than using abstract numbers in isolation. That is the right way to frame non-functional goals in an interview.
Consistency model is not defined beyond a preference for availability
Have you considered what happens when rate-limit state is replicated or partitioned across nodes? Saying 'availability >> consistency' is not enough by itself: is the limiter allowed to over-admit briefly, under-admit briefly, or must decisions for a single user be linearizable? Without stating the acceptable inconsistency window, the runtime behavior under node failure or cross-node races is ambiguous.
Latency target is not broken down by decision path
Have you considered what happens when the request needs both a limit check and a runtime config lookup for an admin-updated algorithm? A p99 < 100ms target is reasonable, but it is too coarse unless you define whether it covers the end-to-end latency added by the limiter or only the internal decision time, and whether the target still holds during config propagation or backend degradation.
Availability target is not connected to failure scenarios
What happens when the backing store for counters or configuration is unavailable? You mention 99.99% availability and configurable fail-open/fail-close, but the NFRs do not spell out which mode applies in which scenario or what error budget is acceptable for each. For example, fail-open protects availability but can violate enforcement; fail-close preserves enforcement but can reject healthy traffic.
Scalability target would be stronger with per-key skew assumptions
You could improve this by stating whether the 50K RPS is evenly spread across 100K clients or whether hot keys are expected. For a rate limiter, skew matters more than just total RPS because a few abusive clients can create concentrated write contention and change the consistency and latency requirements.
Runtime configurability needs an NFR around propagation delay
You could improve this by defining how quickly admin changes must take effect system-wide. Since changing the algorithm and logic at runtime is a functional requirement, the non-functional side should say whether updates must be visible immediately, within seconds, or eventually, because that choice directly drives the acceptable consistency model for configuration.
Core nouns for rate limiting are identified
The design names the main domain concepts the system revolves around: Client, Requests, Rules, and Limits. For this problem, that covers the basic rate-limiting flow of identifying a caller, matching applicable policy, and tracking usage against a limit.
Client abstraction supports multiple identity types
Modeling Client with a type such as API key, IP, userId, or session token is a solid choice because it keeps the domain flexible as different rate-limit dimensions are introduced without redefining the core entity.
Relationships between rules, clients, and counters are underspecified
Have you considered how a request resolves from Client to the applicable Rule and then to the specific Limit being consumed? Right now it is unclear whether Limits are per client, per client+API, or per client+rule. Without that relationship, the system can apply the wrong quota or merge unrelated traffic into the same bucket.
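One way to make the Client to Rule to Limit relationship explicit is to derive the counter key from both the client identity and the matched rule. The following is a minimal sketch under that assumption; `counter_key`, the `rl:` prefix, and the key layout are hypothetical and not taken from the candidate's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    client_type: str  # e.g. "api_key", "ip", "user_id" (illustrative values)
    client_id: str


def counter_key(client: Client, rule_id: str) -> str:
    """Build a per client+rule bucket key so unrelated traffic never shares a counter."""
    return f"rl:{rule_id}:{client.client_type}:{client.client_id}"


alice = Client("api_key", "abc123")
print(counter_key(alice, "search-api"))    # rl:search-api:api_key:abc123
print(counter_key(alice, "checkout-api"))  # rl:checkout-api:api_key:abc123
```

With this layout the same client consumes a separate bucket per rule, which is one possible answer to the "per client, per client+API, or per client+rule" question.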
Requests is too vague as a core entity for the happy path
What happens when the same client calls multiple APIs with different policies? A generic Requests entity does not show the domain key used for enforcement, such as an API/resource identifier tied to the request. Without explicitly connecting request context to Rules, per-API rate limiting from the requirements is not fully represented.
Separate policy definition from runtime usage state
You could improve this by making explicit the distinction between Rule as admin-configured policy and Limit or Counter as runtime state. That makes the update path clearer when admins change algorithms or thresholds at runtime, and avoids conflating configuration with the mutable usage bucket.
Capacity math stops at a single Redis memory estimate
Have you considered the full chain from the stated assumptions to infrastructure sizing? You estimated Redis memory for counters, but there is no back-of-envelope path from 50K RPS and 100K unique clients to expected reads/writes per request, network throughput, peak concurrency, or how many application instances are needed. At senior level, I would expect at least a rough DAU/client -> request rate -> Redis ops/sec -> memory/bandwidth chain.
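The kind of chain being asked for can be sketched in a few lines. Only the 50K RPS and 100K clients come from the design; the peak factor, Redis operations per request, and bytes per operation below are placeholder assumptions that the candidate would need to justify.

```python
# Back-of-envelope chain from the stated workload to Redis load.
AVG_RPS = 50_000           # from the design
UNIQUE_CLIENTS = 100_000   # from the design
PEAK_FACTOR = 2.0          # assumption: peak traffic is 2x the average
REDIS_OPS_PER_REQUEST = 1  # assumption: one script call per decision
BYTES_PER_OP = 200         # assumption: request plus reply size on the wire


def redis_load(avg_rps, peak_factor, ops_per_request, bytes_per_op):
    """Return (peak RPS, Redis ops/sec, network Mbit/s) for the given assumptions."""
    peak_rps = avg_rps * peak_factor
    ops_per_sec = peak_rps * ops_per_request
    mbit_per_sec = ops_per_sec * bytes_per_op * 8 / 1_000_000
    return peak_rps, ops_per_sec, mbit_per_sec


peak_rps, ops, mbps = redis_load(AVG_RPS, PEAK_FACTOR, REDIS_OPS_PER_REQUEST, BYTES_PER_OP)
print(f"average per client: {AVG_RPS / UNIQUE_CLIENTS} RPS")
print(f"{peak_rps:,.0f} peak RPS, {ops:,.0f} Redis ops/s, {mbps:.0f} Mbit/s")
```

Changing `REDIS_OPS_PER_REQUEST` to 2 or 3 shows immediately how the algorithm choice feeds into the node count.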
Redis sizing is not justified against peak load
What happens when traffic spikes above the average 50K RPS or when rate limiting requires multiple Redis operations per request? Saying '2 Redis servers at 25K RPS each' is not enough to show the system is comfortable at this load, because there is no reasoning about per-node throughput, headroom, replication overhead, or whether the chosen algorithm needs 1, 2, or more commands per request. Without that, the node count feels arbitrary.
No capacity impact for runtime config changes
Have you considered what happens when admins change rate-limit logic or algorithm at runtime? That requirement can introduce config fanout, cache invalidation, and potentially a surge of misses or recomputation across the fleet. The capacity section does not estimate how often config is read, whether it is cached, or what load hits the backing store when rules change.
Memory estimate needs connection to algorithm choice
You could improve this by tying the 10 counters per user assumption to the actual rate-limiting algorithm and API shape. Different algorithms have very different storage footprints: fixed window may need one counter, sliding window log may need many timestamps, token bucket may need token state plus refill metadata. Right now the memory estimate is plausible, but it is not justified by the chosen approach.
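A rough comparison shows how much the footprint depends on the algorithm. The client count and the 10 counters per user come from the design; the per-key overhead and per-algorithm state sizes are assumed round numbers, not measured Redis figures.

```python
UNIQUE_CLIENTS = 100_000   # from the design
KEYS_PER_CLIENT = 10       # from the design ("10 counters per user")
KEY_OVERHEAD_BYTES = 100   # assumption: key name plus per-key store overhead


def state_bytes(algorithm: str, limit_per_window: int) -> int:
    """Rough payload size of one bucket; every figure here is an assumption."""
    if algorithm == "fixed_window":
        return 8                      # a single integer counter
    if algorithm == "token_bucket":
        return 16                     # token count plus last-refill timestamp
    if algorithm == "sliding_window_log":
        return 8 * limit_per_window   # one timestamp per admitted request
    raise ValueError(f"unknown algorithm: {algorithm}")


def total_megabytes(algorithm: str, limit_per_window: int = 100) -> float:
    per_key = KEY_OVERHEAD_BYTES + state_bytes(algorithm, limit_per_window)
    return UNIQUE_CLIENTS * KEYS_PER_CLIENT * per_key / 1_000_000


for name in ("fixed_window", "token_bucket", "sliding_window_log"):
    print(name, total_megabytes(name), "MB")
```

Under these assumptions a sliding window log with a limit of 100 needs several times the memory of a fixed window, which is why the estimate should name the algorithm it applies to.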
Add headroom and failure-scenario sizing
You could strengthen this by asking: what happens if one Redis node is unavailable or traffic becomes uneven? With only '2 Redis servers at 25K RPS each,' there is no explanation of failover capacity. A stronger answer would show that the remaining capacity can absorb a node loss or that there is enough buffer to handle bursts without immediately saturating Redis.
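The node-loss question can be checked with simple arithmetic. In the sketch below the per-node throughput and target utilization are hypothetical placeholders, not Redis benchmarks; the point is the shape of the reasoning.

```python
import math


def nodes_needed(peak_ops, per_node_ops, target_utilization=0.5, spare_nodes=1):
    """Nodes required to serve peak load at the target utilization, plus spares."""
    usable_per_node = per_node_ops * target_utilization
    return math.ceil(peak_ops / usable_per_node) + spare_nodes


def survives_node_loss(node_count, peak_ops, per_node_ops):
    """True if the remaining nodes can still absorb peak load after losing one."""
    return (node_count - 1) * per_node_ops >= peak_ops


PEAK_OPS = 100_000      # assumption: 50K RPS average with a 2x peak, one op per request
PER_NODE_OPS = 80_000   # assumption: placeholder per-node capacity

print(nodes_needed(PEAK_OPS, PER_NODE_OPS))           # 4
print(survives_node_loss(2, PEAK_OPS, PER_NODE_OPS))  # False
print(survives_node_loss(4, PEAK_OPS, PER_NODE_OPS))  # True
```

With these placeholder numbers, two nodes cannot absorb the loss of one at peak, which is the gap the review is pointing at.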
Admin rule management covers runtime updates
The admin APIs include create, read, update, and delete for rate-limit rules, which directly supports the requirement that admins can change rate-limit logic at runtime.
Standard rate-limit headers are explicitly surfaced
The rateLimit response calls out X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After, which makes the client contract around throttling behavior clear and aligns with the functional requirement.
gRPC response mixes HTTP semantics without a clear contract
Have you considered what the client actually receives on the gRPC rateLimit call? The design says the RPC returns '200 ok or 429' plus headers, but in gRPC those are not modeled the same way as REST responses. Without defining whether throttling is represented as a normal response payload, gRPC status, or response metadata/trailers, different clients may handle denials inconsistently. You could improve this by defining an explicit protobuf response shape such as {allowed, limit, remaining, resetAt, retryAfterSec} and, if needed, mapping that to HTTP headers only at an API gateway boundary.
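The suggested response shape and gateway mapping could look roughly like this. The field names follow the {allowed, limit, remaining, resetAt, retryAfterSec} shape above; `RateLimitDecision` and `to_http` are hypothetical names used only for illustration, and a real service would define the message in protobuf rather than as a Python dataclass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitDecision:
    allowed: bool
    limit: int
    remaining: int
    reset_at: int         # Unix seconds when the window or bucket resets
    retry_after_sec: int  # 0 when the request is allowed


def to_http(decision: RateLimitDecision):
    """Translate the transport-neutral decision into HTTP status and headers at the gateway."""
    status = 200 if decision.allowed else 429
    headers = {
        "X-RateLimit-Limit": str(decision.limit),
        "X-RateLimit-Remaining": str(decision.remaining),
        "X-RateLimit-Reset": str(decision.reset_at),
    }
    if not decision.allowed:
        headers["Retry-After"] = str(decision.retry_after_sec)
    return status, headers


denied = RateLimitDecision(False, 100, 0, 1_700_000_060, 30)
print(to_http(denied))
```

Keeping the decision as a plain payload means gRPC clients read fields, while only the gateway deals with 429 and headers.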
Core decision API is underspecified for rule selection
Have you considered how the rateLimit API determines which rule applies when multiple dimensions matter? The request only says clientId and requestInfo(url, endpoint etc), but the rule object includes method and path. Without a precise request schema for method, normalized path or route key, and possibly API identifier, the service contract is ambiguous and clients may send inconsistent values that lead to incorrect enforcement. You could improve this by defining exact request fields and matching semantics.
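To illustrate what precise matching semantics might mean, the sketch below normalizes the path into a route key before looking up a rule by (method, route). The rule table, the `{id}` placeholder convention, and the numeric-segment heuristic are all assumptions made for the example.

```python
import re

# Hypothetical rule table keyed by (HTTP method, normalized route).
RULES = {
    ("GET", "/search"): "search-api",
    ("GET", "/users/{id}"): "user-read",
}


def normalize_path(path: str) -> str:
    """Drop the query string and trailing slash, then replace numeric segments with {id}."""
    path = path.split("?", 1)[0].rstrip("/") or "/"
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)


def match_rule(method: str, path: str):
    """Return the rule id for the request, or None when no rule applies."""
    return RULES.get((method.upper(), normalize_path(path)))


print(match_rule("get", "/users/42/?verbose=1"))  # user-read
print(match_rule("GET", "/search?q=redis"))       # search-api
print(match_rule("POST", "/search"))              # None
```

If normalization lives in the service rather than in each client, callers cannot send inconsistent route values for the same endpoint.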
No clear error contract for admin APIs
What happens when an admin submits an invalid rule, references a missing rule-id, or tries to switch to an unsupported algorithm? The routes are listed, but there is no status-code or error-body contract. Without this, clients cannot reliably distinguish validation failures from transient server errors or know whether a retry is safe. You could improve this by specifying standard responses like 400 for invalid rule definitions, 404 for unknown rule IDs, 409 for conflicting updates, and a consistent error payload.
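A possible error contract for the update route is sketched below. The status codes follow the ones suggested above; the error codes, the version-based conflict check, and the set of supported algorithms are assumptions added for illustration.

```python
SUPPORTED_ALGORITHMS = {"fixed_window", "sliding_window", "token_bucket"}  # assumed set


def error(status: int, code: str, message: str):
    """Uniform error payload so clients can branch on `code`, not on free text."""
    return status, {"error": {"code": code, "message": message}}


def update_rule(store: dict, rule_id: str, body: dict, expected_version: int):
    """Validate and apply a rule update against an in-memory store (stand-in for the real one)."""
    if rule_id not in store:
        return error(404, "RULE_NOT_FOUND", f"no rule with id {rule_id!r}")
    if body.get("algorithm") not in SUPPORTED_ALGORITHMS:
        return error(400, "UNSUPPORTED_ALGORITHM", "algorithm is not supported")
    limit = body.get("limit")
    if not isinstance(limit, int) or limit <= 0:
        return error(400, "INVALID_LIMIT", "limit must be a positive integer")
    if store[rule_id]["version"] != expected_version:
        return error(409, "VERSION_CONFLICT", "rule was modified concurrently")
    store[rule_id] = {**body, "version": expected_version + 1}
    return 200, store[rule_id]


rules = {"search-api": {"algorithm": "fixed_window", "limit": 100, "version": 1}}
print(update_rule(rules, "search-api", {"algorithm": "token_bucket", "limit": 50}, 1))
```

Because 4xx responses here always mean the request itself is wrong, a client can treat them as non-retryable and reserve retries for 5xx.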
List/read APIs are incomplete for operational use
How does an admin discover existing rules or audit current configuration at runtime? You have GET /rules/{rule-id}, but no GET /rules collection endpoint. Since runtime rule management is a functional requirement, operators will likely need to list rules, filter by path or algorithm, and page through results if the rule set grows. You could improve this by adding GET /rules with pagination and filtering.
Resource naming is slightly inconsistent
Have you considered separating the resource identifier from the business key? The route uses /rules/{rule-id}, while the example object also contains ruleId='search-api' plus method/path fields. If ruleId is really the primary identifier, that is fine, but if rules are naturally keyed by method+path, the API should make that explicit. Tightening this contract would reduce ambiguity around updates and deletes.
Delete route appears malformed
The admin API lists 'DELET /rules/{rule-id}', which looks like a typo. If this is just notation, no issue, but in an API review I would push for precise verb definitions because clients and generated SDKs depend on them. You could improve this by explicitly defining DELETE semantics and expected responses such as 204 on success.
Hot path separates config from counters
Using etcd for rule distribution and Redis for request-time counter evaluation is a strong design choice. It keeps the hot path off the config store, matches the availability and latency goals, and shows good understanding that configuration reads and quota mutations have very different access patterns.
Atomic quota evaluation in Redis
Running the rate-limit algorithm through Redis Lua scripts is a good way to keep counter updates and allow/deny decisions atomic. That avoids race conditions at 50K RPS where multiple requests for the same client could otherwise overshoot the limit.
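The property the Lua script provides is that refill, check, and decrement happen as one indivisible step. The following is an in-process stand-in for that idea, not the candidate's script: a token bucket guarded by a lock, with the clock passed in explicitly so the behavior is deterministic. The class and parameter names are illustrative.

```python
import threading


class TokenBucket:
    """Token bucket whose refill, check, and decrement run as one atomic step."""

    def __init__(self, capacity: int, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last_refill = now
        self._lock = threading.Lock()

    def allow(self, now: float) -> bool:
        with self._lock:  # plays the role of Redis executing a script atomically
            elapsed = max(0.0, now - self.last_refill)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
            self.last_refill = max(self.last_refill, now)
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False


bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
print([bucket.allow(0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(1.0))                      # True, one token refilled after 1 second
```

Without the lock (or the script), two concurrent callers could both read one remaining token and both be admitted.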
Runtime rule updates propagated by watch
The admin flow through Admin service -> etcd -> watcher updates in the rate limiter is a good fit for the requirement that admins can change logic at runtime. It avoids polling and keeps rule changes reasonably fresh across instances.
Redis appears to be the primary bottleneck and possible SPOF
What happens when one Redis node fails or gets overloaded? The design says two Redis servers serving ~25K RPS each, but it does not explain sharding, replication, or failover behavior. Without a clear partitioning and redundancy model, one hot shard or one node loss could either drop capacity sharply or make rate-limit decisions unavailable for part of the traffic.
Rule changes may become inconsistent across rate limiter instances
Have you considered what happens if one rate limiter instance misses an etcd watch event, restarts, or lags behind others during a rule update? Some instances could enforce old limits while others enforce new ones. For a runtime-configurable limiter, you would want a clear resync/versioning strategy so instances can detect stale local state and reload rules safely.
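One shape a resync strategy could take is a local cache that tracks the last applied revision and falls back to a full snapshot reload when it detects a gap. The sketch assumes a hypothetical config feed that numbers rule changes contiguously; that is a simplification for the example and not a guarantee of etcd watches, so a real implementation would need to adapt the gap check.

```python
class RuleCache:
    """Local rule cache that detects missed updates via revision gaps."""

    def __init__(self, load_snapshot):
        # load_snapshot is a callable returning (revision, rules dict); hypothetical interface.
        self._load_snapshot = load_snapshot
        self.revision, self.rules = load_snapshot()

    def on_event(self, revision: int, rule_id: str, rule):
        """Apply one change event; `rule` of None means the rule was deleted."""
        if revision <= self.revision:
            return "ignored"  # duplicate or stale event
        if revision != self.revision + 1:
            # At least one event was missed: reload everything from the source of truth.
            self.revision, self.rules = self._load_snapshot()
            return "resynced"
        if rule is None:
            self.rules.pop(rule_id, None)
        else:
            self.rules[rule_id] = rule
        self.revision = revision
        return "applied"


source = {"rev": 3, "rules": {"search-api": {"limit": 100}}}
cache = RuleCache(lambda: (source["rev"], dict(source["rules"])))
print(cache.on_event(4, "checkout-api", {"limit": 10}))  # applied
print(cache.on_event(9, "search-api", {"limit": 50}))    # resynced
```

The same snapshot path covers restarts, since a fresh instance starts from a full load and then follows events.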
Client metadata lookup path is under-specified for the hot path
What happens when the rate limiter needs client properties like free/premium and the Client Metadata Cache misses? The diagram implies a read-through/write-through path to Postgres, but if request-time decisions fall back to Postgres, latency and availability could degrade quickly under load. This path needs a clear strategy for cache warmup, TTLs, and behavior on metadata-store failures.
In-memory cache is not tied to a concrete request flow
You could improve this by being explicit about what the local in-memory cache stores and how it is invalidated. Right now it seems intended for rules, but the flow does not show whether it is authoritative for rule lookup, whether it caches client metadata too, or how stale entries are handled after admin updates.
Fail-open and fail-close behavior is mentioned but not fully designed
Have you considered what happens when Redis is partially degraded, timing out, or returning intermittent errors? The design says the gateway or rate limiter can make the default decision, but without clear timeout budgets and fallback ownership, requests may hang or different instances may make inconsistent decisions. At 99.99% availability, the failure path needs to be as explicit as the success path.
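An explicit failure path could be as small as a wrapper that enforces a timeout budget and applies the configured default. This is a sketch under assumptions: `decide`, the thread-pool approach, and the reason strings are hypothetical, and `check` stands in for whatever call reaches the counter store.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def decide(check, timeout_sec: float, fail_open: bool):
    """Run the store check within a time budget; fall back to the configured default."""
    future = _pool.submit(check)
    try:
        return bool(future.result(timeout=timeout_sec)), "store"
    except FutureTimeout:
        future.cancel()  # best effort; an already running call is not interrupted
        return fail_open, "fallback:timeout"
    except Exception:
        return fail_open, "fallback:error"


def slow_store():
    time.sleep(0.3)  # simulates a degraded backend
    return True


def broken_store():
    raise ConnectionError("store unreachable")


print(decide(lambda: False, 1.0, fail_open=True))   # (False, 'store')
print(decide(slow_store, 0.05, fail_open=True))     # (True, 'fallback:timeout')
print(decide(broken_store, 1.0, fail_open=False))   # (False, 'fallback:error')
```

Returning the reason alongside the decision makes it possible to count fallbacks separately, which is what an error budget per failure mode would need.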
Admin write path lacks persistence story for recovery
You could improve this by clarifying the source of truth between etcd and Postgres for rules. The current flow writes rules to etcd, while Postgres is present but not clearly used for rule persistence. If etcd state is lost or rebuilt, the system needs a deterministic way to restore rule definitions and algorithm settings.