SQL vs NoSQL Databases
Relational vs non-relational stores — what each is good at and how to choose.
Almost every backend eventually faces the same question: should this data live in a SQL relational database or in a NoSQLstore? The answer is not about which technology is newer. It is about the shape of the data, the queries you must support, the correctness guarantees the product needs, and the failure modes you can tolerate.
The problem: one word hides many trade-offs
"Database" sounds like a single product category, but system design interviews and real architectures care about behavior under pressure. A shopping cart, a bank ledger, an activity feed, a feature flag lookup, and a social graph all store data, but they fail in different ways.
- If a payment debit commits but the matching credit does not, users lose money. That is a correctness failure.
- If a feed write path cannot absorb millions of likes per minute, the product falls behind. That is a throughput failure.
- If a profile document changes shape every sprint and every migration is painful, the team slows down. That is a schema evolutionfailure.
- If a query needs to jump across friends-of-friends and you model it as thousands of joins or API calls, latency explodes. That is anaccess pattern failure.
SQL: the relational model, normalization, joins, and ACID
A relational database stores facts in tables. Each row has a stable shape, columns have types, and constraints enforce rules before bad data lands. Relationships are represented with keys: a customer row has an id, an order row points at that id, and order items point at the order.
customers(id, email, created_at)
orders(id, customer_id, status, total_cents)
order_items(id, order_id, sku, quantity, unit_price_cents)
-- The database can enforce:
-- orders.customer_id must reference customers.id
-- order_items.order_id must reference orders.id
-- quantity must be positiveNormalization reduces duplication
Normalization means storing each fact once and referring to it by key. Instead of copying a customer email onto every order item, you keep it in customers. This prevents update anomalies: changing an email in one place cannot leave half the old values behind.
Joins reconstruct the view you need
Because normalized data is split across tables, SQL gives youjoins to combine rows at query time. The database optimizer chooses indexes and join algorithms so application code does not manually stitch many requests together.
SELECT o.id, c.email, SUM(oi.quantity * oi.unit_price_cents) AS subtotal
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.id = 123
GROUP BY o.id, c.email;ACID protects invariants
- Atomicity: all changes in a transaction commit, or none do. A failed checkout does not leave inventory reserved without an order.
- Consistency: constraints and transaction logic move the database from one valid state to another.
- Isolation: concurrent transactions do not accidentally see half-finished work in ways that violate your chosen isolation level.
- Durability: once committed, the write survives crashes via logs and storage replication.
NoSQL is a family, not one database
NoSQLmeans "not only SQL". The useful mental model is not SQL versus one alternative; it is SQL versus several specialized data models. Each family optimizes a different access path and usually gives up some combination of joins, global transactions, or immediate consistency.
| Family | Data model | Examples | Great for | Watch out for |
|---|---|---|---|---|
| Key-value | Opaque value at a key | Redis, DynamoDB, Riak | Sessions, feature flags, carts, counters, cache entries | Queries by anything except the key are awkward or impossible |
| Document | JSON-like document with nested fields | MongoDB, Couchbase, Firestore | Profiles, catalogs, CMS content, product metadata | Duplicated embedded data can drift; joins are limited |
| Wide-column | Rows partitioned by key with sparse columns | Cassandra, HBase, Bigtable, ScyllaDB | Time-series, event writes, IoT, large append-heavy workloads | You must design tables around queries up front |
| Graph | Nodes and edges with properties | Neo4j, JanusGraph, Amazon Neptune | Social graph, fraud rings, recommendations, dependency graphs | Not a general replacement for OLTP rows or cheap bulk scans |
Document stores trade joins for locality
A document store often embeds related data together so one read returns the whole aggregate. For a product catalog, keeping variants and marketing copy inside one product document can be convenient. For payments, copying account balances into many documents would be dangerous.
{
"_id": "sku_123",
"name": "Trail Shoe",
"variants": [
{ "color": "red", "sizes": [8, 9, 10] },
{ "color": "black", "sizes": [7, 8, 9, 10] }
],
"searchTags": ["running", "outdoor"]
}Wide-column stores start with the query
Cassandra-style modeling asks, "What exact query must be fast?" You may create one table for messages by conversation and another for messages by sender. This duplication is intentional: the write path updates the query-shaped tables so reads avoid expensive joins.
BASE, schema flexibility, and consistency trade-offs
Many NoSQL systems are described as BASE: basically available, soft state, eventually consistent. That does not mean "inconsistent and broken". It means the system often accepts writes and serves reads during partitions or failures, then converges through replication, conflict resolution, or last-write-wins rules.
| Idea | SQL tendency | NoSQL tendency |
|---|---|---|
| Schema | Declared and enforced before writes | Flexible, often enforced by application code |
| Consistency | Strong within a primary transaction | Often tunable or eventually consistent |
| Transactions | Multi-row and multi-table are common | Often per item, partition, or document |
| Data modeling | Normalize first, join later | Denormalize for known reads |
| Failure posture | Prefer correctness and reject unsafe writes | Prefer availability for selected access patterns |
How each option scales
SQL databases traditionally scale up first: bigger CPU, more RAM, faster disks, better indexes, connection pooling, and read replicas. They can also shard, but cross-shard joins and transactions become hard, so teams often delay sharding until a single primary is truly the bottleneck.
NoSQL systems are often built around horizontal partitioning from day one. A key-value store hashes keys across nodes; Cassandra partitions by a partition key; Kafka-like logs partition by topic key. This makes write throughput and storage scale linearly when the access pattern fits, but it makes arbitrary ad hoc queries harder.
Relational default:
users, orders, payments on Postgres
add indexes, read replicas, then shard only if needed
NoSQL scale path:
session:{token} in Redis/DynamoDB
events_by_user:{user_id, timestamp} in Cassandra
product:{sku} in MongoDB
Each store is chosen for a specific query, not because one is universally best.Choosing and combining stores
Mature systems often use polyglot persistence: more than one storage technology in the same architecture. The source of truth might be Postgres, while Redis caches hot reads, Elasticsearch powers search, Cassandra stores high-volume events, and Neo4j answers graph questions.
- Pick SQL when relationships, constraints, ad hoc querying, and multi-row correctness are central to the product.
- Pick key-value when the operation is "given this key, fetch or update this value" at very low latency or massive scale.
- Pick document when the aggregate naturally fits in one JSON-like record and the schema changes frequently.
- Pick wide-column when the workload is huge, write-heavy, and queryable by a known partition key plus sort key.
- Pick graph when the question is about traversing relationships many hops deep.
- SQL databases model related facts in tables, normalize duplicated data, use joins to reconstruct views, and protect invariants with ACID transactions.
- NoSQL is a family: key-value, document, wide-column, and graph stores each optimize a different data shape and access pattern.
- BASE and eventual consistency trade immediate global agreement for availability and horizontal scale; that trade must match the product failure mode.
- Flexible schema helps teams evolve documents quickly, but validation, versioning, and backfills still matter.
- Use polyglot persistence deliberately: keep a clear source of truth and add specialized stores for proven access patterns, not fashion.
Mark it complete to track your progress through the workbook.