Object / Blob Storage
Where big files live — images, videos, backups — and why not in your database.
Object storage (also called blob storage) is the standard home for large unstructured files: images, videos, PDFs, backups, logs, exports, and machine-learning datasets. You store a whole object by key, attach metadata, and let a storage service such as S3, Azure Blob, or Google Cloud Storage handle durability, scale, and cheap capacity.
The problem: databases and disks are the wrong home for blobs
The failure mode object storage prevents is letting large binary files pollute systems optimized for different work. Put a 500 MB video in a database row and suddenly backups, replication, query memory, and cache eviction all pay for bytes they rarely need. Put user uploads on one VM disk and a deploy, disk failure, or autoscaling event can lose or strand data.
- Database bloat: large blobs make snapshots, replication, vacuuming, and restores slower and more expensive.
- Server coupling: local disks tie files to specific machines, which conflicts with stateless app servers and autoscaling.
- Poor data path: app servers become bandwidth relays when clients upload or download gigabytes through your API.
The object model: buckets, keys, metadata, and flat namespaces
Object stores expose a simple model: a bucket/container holds objects, and each object is addressed by a key. Keys often look like paths, such as users/42/avatar.png, but most object stores are fundamentally a flat namespace. The slash is just a character in the key that tools use to display pseudo-folders.
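To make the flat-namespace point concrete, here is a minimal sketch that models a bucket as a plain dictionary and imitates a prefix/delimiter listing. The `list_keys` helper and the sample keys are hypothetical and only illustrate the idea; real services expose listing through their own APIs.

```python
# Toy model: a "bucket" is just a mapping from full key to bytes (no directories).
bucket = {
    "users/42/avatar.png": b"...",
    "users/42/docs/resume.pdf": b"...",
    "users/7/avatar.png": b"...",
    "readme.txt": b"...",
}


def list_keys(bucket, prefix="", delimiter=None):
    """Return (keys, common_prefixes) grouped the way a prefix/delimiter listing would."""
    keys, common_prefixes = [], set()
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is rolled up into one pseudo-folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            keys.append(key)
    return keys, sorted(common_prefixes)


print(list_keys(bucket, delimiter="/"))
# (['readme.txt'], ['users/'])
print(list_keys(bucket, prefix="users/42/", delimiter="/"))
# (['users/42/avatar.png'], ['users/42/docs/'])
```

The "folders" exist only in the listing result: nothing named users/ or users/42/docs/ is ever stored.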
bucket: drawlint-prod-uploads
key: users/42/avatar/original.png
bytes: <binary stream>
metadata:
content-type: image/png
content-length: 184203
etag: "9b2cf535f27731c974343645a3985328"
x-amz-meta-owner-id: "42"
x-amz-meta-upload-id: "upl_123"
Why the key design matters
Keys are part naming scheme, part operational interface. A good key includes tenancy, ownership, object purpose, and sometimes a random or content-hash component. Avoid using only user-provided filenames; they collide, leak information, and contain awkward characters.
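One possible way to apply this advice is sketched below. The key layout, the `build_object_key` name, and the validation rules are assumptions chosen for illustration, not a prescribed scheme: the function combines tenancy, ownership, and purpose with a random component, and keeps only a sanitized extension from the user-provided filename.

```python
import re
import uuid

_SAFE_SEGMENT = re.compile(r"[a-z0-9_-]+")


def build_object_key(tenant_id: str, user_id: int, purpose: str, filename: str) -> str:
    """Build a key from tenancy, ownership, purpose, and a random component."""
    for segment in (tenant_id, purpose):
        if not _SAFE_SEGMENT.fullmatch(segment):
            raise ValueError(f"unsafe key segment: {segment!r}")
    # Keep only a short alphanumeric extension; the rest of the user-supplied
    # name is discarded so it cannot collide, leak information, or inject
    # awkward characters into the key.
    ext = ""
    if "." in filename:
        candidate = filename.rsplit(".", 1)[1].lower()
        if re.fullmatch(r"[a-z0-9]{1,8}", candidate):
            ext = "." + candidate
    object_id = uuid.uuid4().hex  # random component, unique per upload
    return f"tenants/{tenant_id}/users/{user_id}/{purpose}/{object_id}/original{ext}"


print(build_object_key("acme", 42, "uploads", "../../My Resume (final).PDF"))
```

The original filename can still be kept as a column in the database row or as object metadata if users need to see it again.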
files
- id: file_01HZ...
- owner_user_id: 42
- bucket: drawlint-prod-uploads
- object_key: users/42/uploads/file_01HZ/original
- status: pending | ready | deleted
- content_type: image/png
- size_bytes: 184203
- checksum_sha256: ...
- created_at: ...
Object vs. block vs. file storage
Storage systems differ by the interface they expose. Object storage is not a mounted disk and not a POSIX filesystem. You fetch or replace objects through APIs, usually over HTTP. That trade-off is what makes it cheap, durable, and effectively bottomless.
| Type | Interface | Best for | Not ideal for |
|---|---|---|---|
| Object storage | GET/PUT/DELETE whole objects by key | Blobs, backups, static assets, logs, data lakes | Low-latency random writes inside one file |
| Block storage | Raw disk blocks attached to a VM | Databases, boot volumes, filesystems | Global sharing or direct browser access |
| File storage | Hierarchical filesystem with directories and POSIX-like operations | Shared app files, lift-and-shift workloads | Massive public object distribution at CDN scale |
If your application wants to read block 1842, use block storage. If it wants to open /reports/q2.csv, use file storage. If it wants to GET reports/q2.csv and serve it to users or pipelines, use object storage.
Durability: replication, erasure coding, and eleven nines
Object storage is famous for durability claims like eleven nines (99.999999999%). That does not mean every request always succeeds; it means the service is designed so stored objects are extraordinarily unlikely to be permanently lost. Providers achieve this by storing redundant pieces across disks, racks, and often availability zones.
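As a rough back-of-the-envelope check, the figure can be turned into expected losses. This sketch assumes the eleven nines are read as an annual, per-object durability and that losses are independent, which is a simplification; the object count is an arbitrary example.

```python
from fractions import Fraction

# Eleven nines, read here as the yearly chance that one object survives (assumption).
ANNUAL_DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
annual_loss_probability = 1 - ANNUAL_DURABILITY  # 1 in 100 billion

stored_objects = 10_000_000  # example fleet size
expected_losses_per_year = stored_objects * annual_loss_probability
years_per_expected_loss = 1 / expected_losses_per_year

print(float(expected_losses_per_year))  # 0.0001
print(int(years_per_expected_loss))     # 10000
```

Under those assumptions, storing ten million objects corresponds to losing about one object every ten thousand years on average.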
- Replication: store multiple full copies of an object in different failure domains. Simple and fast, but uses more raw storage.
- Erasure coding: split data into fragments plus parity so the system can reconstruct the object even if some fragments are lost. It is more space-efficient for large objects.
- Checksums and scrubbing: services continuously verify stored bytes and repair corrupt or missing fragments from redundancy.
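The erasure-coding idea can be illustrated with the simplest possible scheme: k data fragments plus a single XOR parity fragment. This is a teaching sketch only. Production systems typically use stronger codes that survive several simultaneous losses, and the function names here are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def reconstruct(fragments: list, original_len: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is missing."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    if missing:
        survivors = [f for f in fragments if f is not None]
        fragments = list(fragments)
        # XOR of all surviving fragments equals the missing one.
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(fragments[:-1])[:original_len]


payload = b"object storage keeps bytes durable"
stored = encode(payload, k=4)
stored[2] = None  # simulate a failed disk holding one fragment
print(reconstruct(stored, len(payload)) == payload)  # True
```

With four data fragments and one parity fragment the raw storage cost is 1.25 times the object size, compared with 3 times for three full replicas, which is the space-efficiency argument made in the list above.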
A durable object store can still return:
- 503 SlowDown during a traffic spike
- 404 immediately after you asked for the wrong key
- 403 when a policy blocks access
Durability asks: after a successful PUT, will the service still have the bytes years later?
Storage classes, versioning, and lifecycle rules
Object stores let you trade retrieval speed for cost with storage classes. Hot objects stay in standard storage. Rarely accessed objects move to infrequent-access classes. Archival data moves to glacier-like tiers where retrieval may take minutes or hours.
| Capability | What it does | Example use |
|---|---|---|
| Storage classes | Price/performance tiers for hot, cool, and archive data | Move old exports to archive after 90 days |
| Lifecycle rules | Automatic transitions and deletions by age, prefix, or tag | Delete temporary uploads after 24 hours |
| Versioning | Keep prior versions when a key is overwritten or deleted | Recover a user file after accidental overwrite |
| Object lock / retention | Prevent deletion until a retention period expires | Compliance archives and audit logs |
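The versioning row can be pictured with a small in-memory model. This is a sketch under simplifying assumptions: the `VersionedBucket` class and its version ids are hypothetical, and it only mimics the general behavior where overwrites add a version and deletes add a marker rather than erasing history.

```python
import itertools


class VersionedBucket:
    """Toy versioned store: every write or delete appends to a key's history."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body); body None = delete marker
        self._ids = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        # A delete adds a marker instead of erasing earlier versions.
        return self.put(key, None)

    def get(self, key, version_id=None):
        history = self._versions.get(key, [])
        if version_id is not None:
            for vid, body in history:
                if vid == version_id and body is not None:
                    return body
            raise KeyError(f"{key}@{version_id}")
        if not history or history[-1][1] is None:
            raise KeyError(key)
        return history[-1][1]


store = VersionedBucket()
first = store.put("reports/q2.csv", b"good data")
store.put("reports/q2.csv", b"accidental overwrite")
store.delete("reports/q2.csv")
print(store.get("reports/q2.csv", version_id=first))  # b'good data'
```

A plain `get` after the delete raises `KeyError`, yet the earlier version is still recoverable by id, which is the recovery path the table describes.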
For keys under tmp/uploads/:
- abort incomplete multipart uploads after 1 day
- delete pending objects after 7 days
For keys under exports/:
- move to infrequent access after 30 days
- move to archive after 180 days
- delete after 7 years
Lifecycle rules are also a cost-control safety net. Incomplete multipart uploads, abandoned temporary files, and old generated thumbnails can silently accumulate unless the bucket has automatic cleanup.
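The prefix-and-age rules listed above can be sketched as a small evaluator. Everything here is illustrative: the `Rule` type, the action strings, and the `action_for` helper are hypothetical, seven years is approximated as 365 * 7 days, and the multipart-abort rule is left out because it applies to unfinished uploads rather than stored objects. Real services evaluate lifecycle configuration on their own schedule.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Rule:
    prefix: str
    min_age: timedelta
    action: str  # e.g. "delete" or "transition:archive"


# Hypothetical encoding of the tmp/uploads/ and exports/ rules.
RULES = [
    Rule("tmp/uploads/", timedelta(days=7), "delete"),
    Rule("exports/", timedelta(days=30), "transition:infrequent-access"),
    Rule("exports/", timedelta(days=180), "transition:archive"),
    Rule("exports/", timedelta(days=365 * 7), "delete"),
]


def action_for(key: str, age: timedelta):
    """Return the action of the matching rule with the largest satisfied age threshold."""
    matching = [r for r in RULES if key.startswith(r.prefix) and age >= r.min_age]
    if not matching:
        return None
    return max(matching, key=lambda r: r.min_age).action


print(action_for("exports/2024/q2.csv", timedelta(days=45)))   # transition:infrequent-access
print(action_for("exports/2024/q2.csv", timedelta(days=200)))  # transition:archive
print(action_for("tmp/uploads/abc", timedelta(days=8)))        # delete
```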
Consistency, access patterns, and direct transfer
Modern major object stores generally provide strong read-after-write consistency for new writes, overwrites, and deletes in a region: after a successful PUT, a later GET or list should see it. Still, applications must design for distributed-system realities: retries, duplicate events, multipart completion, CDN staleness, and cross-region replication lag.
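Designing for retries usually means wrapping calls in backoff logic. The sketch below shows exponential backoff with full jitter around a retryable failure. The `SlowDown` exception and `with_retries` helper are hypothetical stand-ins, not any SDK's API, and many SDKs already ship their own retry behavior.

```python
import random
import time


class SlowDown(Exception):
    """Stand-in for a retryable throttling response such as a 503."""


def with_retries(operation, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying SlowDown with exponential backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except SlowDown:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # jitter spreads retries from many clients


attempts = {"count": 0}


def flaky_put():
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise SlowDown()
    return "stored"


print(with_retries(flaky_put, base_delay=0.01))  # stored
```

Retrying is safe here because writing the same bytes to the same key again leaves the same result; operations with side effects beyond the object itself need their own idempotency handling.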
- Whole-object writes: object stores are not built for editing the middle of a file in place. Write a new object or new version.
- Large transfers: use multipart upload for parallelism, retries, and resume.
- Direct client access: use short-lived presigned URLs so browsers upload/download bytes directly instead of through app servers.
- CDN reads: serve popular public or signed private content through a CDN to keep object-store reads close to users.
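The presigned-URL idea from the list above can be shown with a deliberately simplified signing scheme. This is not the signing algorithm of any real provider: the host, the secret, the query parameter names, and the `presign` and `verify` helpers are all assumptions for illustration. In practice the provider's SDK generates the URL and the storage service verifies it.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-secret"  # hypothetical; real services derive keys from credentials


def _signature(method: str, path: str, expires: int) -> str:
    # The signature binds the HTTP method, the exact object path, and the expiry.
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, bucket: str, key: str, ttl_seconds: int, now=None) -> str:
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    path = f"/{bucket}/{key}"
    query = urlencode({"expires": expires, "signature": _signature(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now=None) -> bool:
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False  # link has expired
    expected = _signature(method, parsed.path, expires)
    return hmac.compare_digest(signature.encode(), expected.encode())


url = presign("PUT", "drawlint-prod-uploads", "users/42/uploads/demo/original", 300, now=1000)
print(verify("PUT", url, now=1100))  # True
print(verify("GET", url, now=1100))  # False: signed for a different method
print(verify("PUT", url, now=1301))  # False: expired
```

Because the signature covers one method, one key, and one expiry, the browser can transfer bytes directly without ever holding long-lived credentials.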
1. POST /uploads -> app creates DB row: status = "pending"
2. app returns presigned PUT for one object key
3. browser PUTs bytes directly to object storage
4. app verifies object size/checksum and marks row "ready"
- Object storage stores whole blobs by key with metadata; your database should store references and business metadata, not the bytes themselves.
- Object, block, and file storage expose different interfaces: HTTP object APIs, raw disk blocks, and filesystem paths.
- High durability comes from replication, erasure coding, checksums, repair, and multiple failure domains; it is different from availability or backup.
- Storage classes, lifecycle rules, versioning, and retention policies control cost, recovery, and compliance over an object's lifetime.
- Use multipart uploads, presigned URLs, and CDNs so large files move efficiently without turning app servers into bandwidth relays.
users/42/avatar.png may be displayed as folders by consoles or SDKs, but the slashes are part of the key rather than true directories with filesystem semantics.