🧩Core Building Blocks·7 min read

SQL vs NoSQL Databases

Relational vs non-relational stores — what each is good at and how to choose.

Almost every backend eventually faces the same question: should this data live in a SQL relational database or in a NoSQLstore? The answer is not about which technology is newer. It is about the shape of the data, the queries you must support, the correctness guarantees the product needs, and the failure modes you can tolerate.

🔭Think of it like…

A SQL database is a city hall records office: every form has required fields, clerks check references, and changing ownership of a house updates several ledgers as one official transaction. NoSQL stores are specialized workshops: a key cabinet for instant lookup, a JSON folder room for flexible documents, a wide time-series shelf for huge append-only writes, or a map of relationships for graph traversal.

The problem: one word hides many trade-offs

"Database" sounds like a single product category, but system design interviews and real architectures care about behavior under pressure. A shopping cart, a bank ledger, an activity feed, a feature flag lookup, and a social graph all store data, but they fail in different ways.

If a payment debit commits but the matching credit does not, users lose money. That is a correctness failure.
If a feed write path cannot absorb millions of likes per minute, the product falls behind. That is a throughput failure.
If a profile document changes shape every sprint and every migration is painful, the team slows down. That is a schema evolutionfailure.
If a query needs to jump across friends-of-friends and you model it as thousands of joins or API calls, latency explodes. That is anaccess pattern failure.

Start from the access pattern

Do not choose SQL or NoSQL from a brand name. Write down the reads, writes, invariants, expected data size, and growth path. The right store is the one whose native model makes your most important operations simple and safe.

SQL: the relational model, normalization, joins, and ACID

A relational database stores facts in tables. Each row has a stable shape, columns have types, and constraints enforce rules before bad data lands. Relationships are represented with keys: a customer row has an id, an order row points at that id, and order items point at the order.

normalized relational model

customers(id, email, created_at)
orders(id, customer_id, status, total_cents)
order_items(id, order_id, sku, quantity, unit_price_cents)

-- The database can enforce:
-- orders.customer_id must reference customers.id
-- order_items.order_id must reference orders.id
-- quantity must be positive

Normalization reduces duplication

Normalization means storing each fact once and referring to it by key. Instead of copying a customer email onto every order item, you keep it in customers. This prevents update anomalies: changing an email in one place cannot leave half the old values behind.

Joins reconstruct the view you need

Because normalized data is split across tables, SQL gives youjoins to combine rows at query time. The database optimizer chooses indexes and join algorithms so application code does not manually stitch many requests together.

join normalized tables into an order summary

SELECT o.id, c.email, SUM(oi.quantity * oi.unit_price_cents) AS subtotal
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.id = 123
GROUP BY o.id, c.email;

ACID protects invariants

Atomicity: all changes in a transaction commit, or none do. A failed checkout does not leave inventory reserved without an order.
Consistency: constraints and transaction logic move the database from one valid state to another.
Isolation: concurrent transactions do not accidentally see half-finished work in ways that violate your chosen isolation level.
Durability: once committed, the write survives crashes via logs and storage replication.

Common SQL examples

PostgreSQL, MySQL, SQL Server, Oracle, and SQLite are relational systems. They are a strong default for accounts, orders, payments, inventory, permissions, subscriptions, and other data with important relationships.

NoSQL is a family, not one database

NoSQLmeans "not only SQL". The useful mental model is not SQL versus one alternative; it is SQL versus several specialized data models. Each family optimizes a different access path and usually gives up some combination of joins, global transactions, or immediate consistency.

Family	Data model	Examples	Great for	Watch out for
Key-value	Opaque value at a key	Redis, DynamoDB, Riak	Sessions, feature flags, carts, counters, cache entries	Queries by anything except the key are awkward or impossible
Document	JSON-like document with nested fields	MongoDB, Couchbase, Firestore	Profiles, catalogs, CMS content, product metadata	Duplicated embedded data can drift; joins are limited
Wide-column	Rows partitioned by key with sparse columns	Cassandra, HBase, Bigtable, ScyllaDB	Time-series, event writes, IoT, large append-heavy workloads	You must design tables around queries up front
Graph	Nodes and edges with properties	Neo4j, JanusGraph, Amazon Neptune	Social graph, fraud rings, recommendations, dependency graphs	Not a general replacement for OLTP rows or cheap bulk scans

Document stores trade joins for locality

A document store often embeds related data together so one read returns the whole aggregate. For a product catalog, keeping variants and marketing copy inside one product document can be convenient. For payments, copying account balances into many documents would be dangerous.

document-shaped product

{
  "_id": "sku_123",
  "name": "Trail Shoe",
  "variants": [
    { "color": "red", "sizes": [8, 9, 10] },
    { "color": "black", "sizes": [7, 8, 9, 10] }
  ],
  "searchTags": ["running", "outdoor"]
}

Wide-column stores start with the query

Cassandra-style modeling asks, "What exact query must be fast?" You may create one table for messages by conversation and another for messages by sender. This duplication is intentional: the write path updates the query-shaped tables so reads avoid expensive joins.

BASE, schema flexibility, and consistency trade-offs

Many NoSQL systems are described as BASE: basically available, soft state, eventually consistent. That does not mean "inconsistent and broken". It means the system often accepts writes and serves reads during partitions or failures, then converges through replication, conflict resolution, or last-write-wins rules.

Idea	SQL tendency	NoSQL tendency
Schema	Declared and enforced before writes	Flexible, often enforced by application code
Consistency	Strong within a primary transaction	Often tunable or eventually consistent
Transactions	Multi-row and multi-table are common	Often per item, partition, or document
Data modeling	Normalize first, join later	Denormalize for known reads
Failure posture	Prefer correctness and reject unsafe writes	Prefer availability for selected access patterns

Schemaless does not mean designless

Flexible schema moves responsibility from the database to your code. You still need versioned documents, validation, backfills, and compatibility logic so old and new application versions can read the same collection.

How each option scales

SQL databases traditionally scale up first: bigger CPU, more RAM, faster disks, better indexes, connection pooling, and read replicas. They can also shard, but cross-shard joins and transactions become hard, so teams often delay sharding until a single primary is truly the bottleneck.

NoSQL systems are often built around horizontal partitioning from day one. A key-value store hashes keys across nodes; Cassandra partitions by a partition key; Kafka-like logs partition by topic key. This makes write throughput and storage scale linearly when the access pattern fits, but it makes arbitrary ad hoc queries harder.

same product, different scaling pressure

Relational default:
  users, orders, payments on Postgres
  add indexes, read replicas, then shard only if needed

NoSQL scale path:
  session:{token} in Redis/DynamoDB
  events_by_user:{user_id, timestamp} in Cassandra
  product:{sku} in MongoDB

Each store is chosen for a specific query, not because one is universally best.

Default that works surprisingly often

Start with a relational database such as Postgres when requirements are still changing. Add a specialized store when a concrete access pattern has outgrown the relational model or needs latency/scale that a relational primary cannot comfortably deliver.

Choosing and combining stores

Mature systems often use polyglot persistence: more than one storage technology in the same architecture. The source of truth might be Postgres, while Redis caches hot reads, Elasticsearch powers search, Cassandra stores high-volume events, and Neo4j answers graph questions.

Pick SQL when relationships, constraints, ad hoc querying, and multi-row correctness are central to the product.
Pick key-value when the operation is "given this key, fetch or update this value" at very low latency or massive scale.
Pick document when the aggregate naturally fits in one JSON-like record and the schema changes frequently.
Pick wide-column when the workload is huge, write-heavy, and queryable by a known partition key plus sort key.
Pick graph when the question is about traversing relationships many hops deep.

Related lessons

The choice interacts with replication, sharding, and consistency models. Storage is never isolated from scaling and correctness.

Key takeaways

SQL databases model related facts in tables, normalize duplicated data, use joins to reconstruct views, and protect invariants with ACID transactions.
NoSQL is a family: key-value, document, wide-column, and graph stores each optimize a different data shape and access pattern.
BASE and eventual consistency trade immediate global agreement for availability and horizontal scale; that trade must match the product failure mode.
Flexible schema helps teams evolve documents quickly, but validation, versioning, and backfills still matter.
Use polyglot persistence deliberately: keep a clear source of truth and add specialized stores for proven access patterns, not fashion.

Early products change requirements constantly. A relational database gives strong constraints, transactions, indexes, and flexible ad hoc queries while you are still learning the domain. You can add specialized stores later when one workload has a measurable scaling or latency problem.

It lets you keep a whole aggregate, such as a product profile, in one flexible JSON-like record, so reads are local and schema changes are easy. The cost is weaker relational enforcement: duplicated embedded data can drift, joins are limited, and application code must handle document versions.

It is using multiple storage systems for different jobs in one architecture: for example Postgres as the source of truth, Redis for hot cache entries, Cassandra for append-heavy events, and a graph store for relationship traversal. The key is keeping ownership and synchronization boundaries clear.

Finished this lesson?

Mark it complete to track your progress through the workbook.