Object, Block, and File Storage: When Each Is Right

What This Concept Is

Three storage shapes, three access patterns, three cost structures:

  • Object storage (S3, GCS, Azure Blob) - data is addressed by a key inside a bucket (s3://bucket/path/object.json). Accessed over an HTTPS API, not mounted as a filesystem. Effectively unlimited capacity, designed for 11 nines of durability, with versioning and lifecycle policies built in. Typical cost ~$0.023 per GB-month (Standard tier).
  • Block storage (EBS, GCP Persistent Disk, Azure Managed Disks) - a virtual disk attached to a VM. Acts like a raw block device; the OS puts a filesystem on top. Sized in GB, with provisioned IOPS and throughput. Typical cost ~$0.08 per GB-month for gp3 and ~$0.125 for io2. AZ-scoped and typically attached to one instance at a time.
  • File storage (EFS, FSx, GCP Filestore, Azure Files) - a POSIX/NFS filesystem you can mount from many instances at once. Elastic size, regional or zonal. Costlier per GB (~$0.30 per GB-month for EFS Standard) but solves the "many readers, shared state" problem.

Choosing between them drives cost, performance, and architectural simplicity more than almost any other decision in the storage layer.
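
To get a feel for how far apart the three price points are, here is a minimal sketch that multiplies a size in GB by the per-GB prices quoted in the list above. The function name monthly_cost is hypothetical, and the rates are list prices that vary by region and change over time, so treat the output as order-of-magnitude only.

```shell
# Rough monthly storage cost in USD for a given size in GB.
# Rates are the per-GB-month figures quoted in this section (assumptions,
# not a pricing API); storage only, no requests, IOPS, or data transfer.
monthly_cost() {
  kind="$1"; gb="$2"
  case "$kind" in
    object) rate=0.023 ;;  # S3 Standard
    block)  rate=0.08  ;;  # EBS gp3
    file)   rate=0.30  ;;  # EFS Standard
    *) echo "unknown storage kind: $kind" >&2; return 1 ;;
  esac
  awk -v gb="$gb" -v rate="$rate" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# Compare the same 500 GB on each tier.
for kind in object block file; do
  printf '%-6s %s\n' "$kind" "$(monthly_cost "$kind" 500)"
done
```

For 500 GB this prints 11.50, 40.00, and 150.00: roughly a 13x spread between object and file storage before any access charges.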

At the OS level the distinction is visible with mount and lsblk: a block device shows as /dev/nvme0n1, a file store as an NFS mount under /mnt/..., and object storage is not mounted at all - it is an API call. Treating object storage as a filesystem (with tools like s3fs) is almost always a mistake because the semantics differ in ways your application will eventually notice.
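
One way to see this distinction on a running host is to ask findmnt which filesystem type backs a path. The sketch below is an illustration, not an exhaustive mapping: storage_shape is a hypothetical helper, and the fuse.s3fs type string is an assumption about how an s3fs mount is reported.

```shell
# Map a filesystem type (as reported by findmnt) to the storage shape behind it.
storage_shape() {
  case "$1" in
    nfs|nfs4)       echo "file" ;;                   # EFS, Filestore, FSx over NFS
    ext4|xfs|btrfs) echo "block" ;;                  # filesystem on an attached disk
    fuse.s3fs)      echo "object (via FUSE shim)" ;; # assumed type string for s3fs
    *)              echo "unknown" ;;
  esac
}

# Classify whatever backs a given path, if findmnt is available on this host.
path="/"
if command -v findmnt >/dev/null 2>&1; then
  fstype="$(findmnt -n -o FSTYPE --target "$path")"
  echo "$path is on $fstype -> $(storage_shape "$fstype")"
fi
```

Inside a container the root path usually reports an overlay filesystem, which this sketch deliberately leaves as "unknown".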

Why It Matters Here

Picking wrong gets expensive:

  • storing millions of small files on EBS instead of S3 wastes money and ties capacity to volume and instance limits
  • storing a relational database data directory on S3 doesn't work at all
  • sharing a "data directory" between 20 EC2 instances via rsync instead of EFS creates brittle consistency bugs
  • writing logs directly to EBS instead of shipping them out means log volume shapes your disk sizing
  • cross-AZ EFS mounts are slower and billed per GB of inter-AZ transfer - a "quick" fix can quietly become one of the largest line items on the month's bill

Later modules lean on this: backups (S3 usually), container volumes (EBS + EFS + PVCs), data lakes (S3), and databases (block or managed).

Concrete Example

Application: a SaaS product with user uploads, a PostgreSQL database, and a shared scratch directory used by a batch fleet.

  • User uploads -> S3. Objects keyed by tenant and upload ID. Presigned URLs let clients upload directly, bypassing the app. Lifecycle rule moves objects >90 days old from S3 Standard to S3 Standard-IA; >1 year to Glacier Deep Archive.
  • PostgreSQL data files -> EBS gp3 volume backing the RDS instance (or attached to an EC2 instance if self-managed). IOPS and throughput provisioned separately from size. Snapshot backups stored in S3 under the hood.
  • Batch fleet shared scratch -> EFS mounted at /mnt/scratch from all workers. Elastic size, pay-per-use.
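
The lifecycle rule for user uploads could be expressed roughly as follows. This is a sketch under assumptions: the rule ID and the lifecycle.json file name are made up, and the tenants/ prefix is inferred from the example key used in this lesson. Applying it needs credentials that are allowed to change the bucket's lifecycle configuration.

```shell
# Write the lifecycle rule to a local file (file name is arbitrary).
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "uploads-tiering",
      "Status": "Enabled",
      "Filter": { "Prefix": "tenants/" },
      "Transitions": [
        { "Days": 90,  "StorageClass": "STANDARD_IA" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
EOF

# Apply it to the uploads bucket. This replaces any existing lifecycle
# configuration on the bucket, so merge rules first if some already exist.
aws s3api put-bucket-lifecycle-configuration \
  --bucket acme-uploads-prod \
  --lifecycle-configuration file://lifecycle.json
```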

Concrete shell interactions:

# object
aws s3 cp ./report.csv s3://acme-uploads-prod/tenants/42/2026-04/report.csv
aws s3api put-object-tagging --bucket acme-uploads-prod --key ... --tagging 'TagSet=[{Key=data-classification,Value=confidential}]'

# block
lsblk # /dev/nvme1n1 = the gp3 volume
mkfs.xfs /dev/nvme1n1 # filesystem on top
mount /dev/nvme1n1 /var/lib/postgresql/data

# file
mount -t nfs4 fs-0abc.efs.us-east-1.amazonaws.com:/ /mnt/scratch
aws s3 sync /mnt/scratch/ s3://acme-scratch-archive/ # batch archive (rsync cannot write to s3:// URLs)

Costs to notice:

  • the 1 TB of uploads over 90 days old costs ~$23.50/month on Standard (1,024 GB x $0.023); on Standard-IA it costs ~$12.80 for storage, but retrieval is billed per request and per GB. Lifecycle policies only save money if retrieval is rare.
  • a 200 GB gp3 EBS volume costs ~$16/month; snapshots (incremental) typically a fraction of that.
  • 100 GB of EFS Standard costs ~$30/month; if that fleet only needs read-mostly shared state, EFS Infrequent Access can cut this substantially.
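
"Only if retrieval is rare" can be made concrete with a break-even estimate. The sketch below assumes list prices of $0.023 and $0.0125 per GB-month for Standard and Standard-IA storage and $0.01 per GB for Standard-IA retrieval; check current pricing for your region. It ignores per-request charges and Standard-IA's minimum object size and minimum storage duration, all of which push the break-even lower. The function name ia_break_even_gb is hypothetical.

```shell
# GB retrieved per month at which Standard-IA stops being cheaper than Standard.
# Assumed prices (USD): Standard 0.023/GB-month, Standard-IA 0.0125/GB-month,
# Standard-IA retrieval 0.01/GB. Request fees and minimums are ignored.
ia_break_even_gb() {
  awk -v gb="$1" 'BEGIN {
    saved = gb * (0.023 - 0.0125)   # monthly storage saving from moving to IA
    printf "%.1f\n", saved / 0.01   # retrieval volume that eats the whole saving
  }'
}

ia_break_even_gb 1024
```

For 1,024 GB stored this prints 1075.2: under these assumptions, reading back slightly more than the full data set each month wipes out the storage saving.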

Common Confusion / Misconception

"Object storage is just a weird filesystem." It is not a filesystem. There are no directories (only key prefixes), no partial updates (only whole-object PUT), and no rename (you copy then delete). Treat objects as immutable blobs with metadata.

"EBS is the fastest." gp3 gives 3000-16000 IOPS and 125-1000 MB/s. io2 Block Express can do much more but is expensive. Meanwhile, well-tuned S3 multipart uploads to many keys can saturate a 10 Gbps link. Pick on pattern, not a one-line benchmark.

"EFS is a drop-in NFS." It is NFS, but performance is shaped by throughput mode (Bursting vs Provisioned vs Elastic) and by your workload pattern. Many small synchronous writes from 40 workers can be surprisingly slow; test with your real workload.

"S3 has no concept of folders." Correct - but the console shows them, and many SDKs let you iterate ListObjectsV2 with a Prefix + Delimiter="/". The "folder" is a fiction the console paints on top of prefixed keys. Understanding that is a prerequisite for designing clean key schemas.
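
To see why the folder is a fiction, the sketch below emulates locally what a Prefix + Delimiter="/" listing does to a flat list of keys. It is not an S3 call, and list_like_s3 is a hypothetical helper: keys directly under the prefix come back as keys, and everything deeper is rolled up into a common prefix. Real listings return the two groups as separate sorted lists; this emulation simply preserves input order.

```shell
# Emulate the grouping of a Prefix + Delimiter="/" listing.
# Reads one key per line on stdin; prints "PREFIX <p>" once per rolled-up
# common prefix and "KEY <k>" for each key directly under the prefix.
list_like_s3() {
  prefix="$1"
  awk -v p="$prefix" '
    index($0, p) == 1 {                      # key starts with the prefix
      rest = substr($0, length(p) + 1)       # part of the key after the prefix
      i = index(rest, "/")                   # first delimiter after the prefix
      if (i > 0) {
        cp = p substr(rest, 1, i)
        if (!(cp in seen)) { seen[cp] = 1; print "PREFIX " cp }
      } else {
        print "KEY " $0
      }
    }'
}

printf '%s\n' \
  tenants/42/2026-04/report.csv \
  tenants/42/2026-05/summary.csv \
  tenants/42/README.txt \
  tenants/7/2026-04/report.csv | list_like_s3 tenants/42/
```

The output has two PREFIX lines (tenants/42/2026-04/ and tenants/42/2026-05/) and one KEY line (tenants/42/README.txt). Nothing resembling a directory was ever created; the grouping falls out of string matching on keys.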

Gotchas:

  • S3 strong read-after-write consistency covers new objects, overwrites, deletes, and list operations, but only within a single region. Cross-region replication is eventual. Do not build write-then-read pipelines that read from a different region assuming immediate visibility.
  • EBS volumes attach to only one EC2 instance at a time (unless you use Multi-Attach on io1/io2 volumes, which your filesystem almost certainly does not support safely). Mounting an ordinary filesystem such as ext4 or XFS from two instances at once, or force-detaching a volume that is still mounted, leads to data corruption.
  • EFS mount targets are per-AZ. If your instance is in us-east-1c and your EFS has no mount target in 1c, mounts will go cross-AZ and inflate both latency and egress.
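
The mount-target gotcha is cheap to check before launching instances. Below is a minimal sketch with a hypothetical helper, has_mount_target_in_az, that tests whether an AZ appears in a list of AZ names. The demo uses a made-up list; the commented command shows one way the list might be fetched for the placeholder file system ID used earlier.

```shell
# Succeeds if the AZ in $1 appears in the newline-separated AZ list on stdin.
has_mount_target_in_az() {
  grep -Fxq -- "$1"
}

# Offline demo with a made-up AZ list.
if printf 'us-east-1a\nus-east-1b\n' | has_mount_target_in_az us-east-1c; then
  echo "mount target present in us-east-1c"
else
  echo "no mount target in us-east-1c: mounts from there would have to cross AZs"
fi

# Against a real file system (fs-0abc is a placeholder ID):
#   aws efs describe-mount-targets --file-system-id fs-0abc \
#     --query 'MountTargets[].AvailabilityZoneName' --output text \
#     | tr '\t' '\n' | has_mount_target_in_az us-east-1c
```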

How To Use It

For each data element:

  1. Ask: is this many readers/writers, random-access, immutable blob, or structured row?
  2. Blob, maybe huge, accessed by key -> object storage.
  3. Filesystem semantics needed by one host -> block storage.
  4. Filesystem semantics needed by many hosts at once -> file storage.
  5. Price it at realistic scale: storage class + access pattern + retrieval + egress.
  6. Plan lifecycle (S3) or snapshot policy (EBS) from day one; do not let untagged "temp" data grow unbounded.
  7. Encrypt at rest on every tier (SSE-KMS for S3, EBS encryption, EFS encryption). Use a customer-managed KMS key for data you may need to make permanently unreadable: revoking or deleting the key renders everything encrypted under it undecryptable.
  8. Block public access on object buckets at the account level. Turning it off should require an ADR (architecture decision record).
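
Steps 1-4 reduce to a small lookup, sketched here as a shell function. The name pick_storage and its two arguments are hypothetical, and real decisions also weigh the cost and lifecycle points in steps 5-8.

```shell
# Steps 1-4 as a lookup. Arguments (hypothetical names for this sketch):
#   $1 access: "key" (whole objects fetched by key) or "fs" (filesystem semantics)
#   $2 hosts:  number of hosts that need the data mounted at once (default 1)
pick_storage() {
  access="$1"; hosts="${2:-1}"
  if [ "$access" = "key" ]; then
    echo "object"
  elif [ "$access" = "fs" ] && [ "$hosts" -gt 1 ]; then
    echo "file"
  elif [ "$access" = "fs" ]; then
    echo "block"
  else
    echo "unknown access pattern: $access" >&2
    return 1
  fi
}

pick_storage key     # user uploads
pick_storage fs 1    # PostgreSQL data directory
pick_storage fs 40   # shared scratch for a batch fleet
```

The three calls print object, block, and file, matching the choices in the SaaS example earlier in this lesson.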

Check Yourself

  1. Why is S3 a bad place to put a PostgreSQL data directory?
  2. A log-heavy workload keeps maxing out a gp3 volume. What cheaper option might work, and what does it give up?
  3. Which of the three storage types is AZ-scoped, and what does that imply for failure design?
  4. What does it mean that "there is no rename in S3," and how does it change a batch job that rotates daily output files?
  5. Your EFS workload has high per-file latency but low throughput. Which knobs can you turn - and which require a new file system, not a mount option change?

Mini Drill or Application

For your own past project (or a hypothetical SaaS), list every data element (user files, logs, database, scratch, backups) and pick object/block/file for each in fifteen minutes. Estimate order-of-magnitude cost for each and note one lifecycle or snapshot rule per item.

Extension: design an S3 key schema for multi-tenant uploads that you can efficiently list per tenant and expire per tenant. If your schema makes "list all of tenant 42's objects" slow, start over - that one query shape drives every billing, compliance, and deletion story later.

Read This Only If Stuck