Object, Block, and File Storage: When Each Is Right
What This Concept Is
Three storage shapes, three access patterns, three cost structures:
- Object storage (S3, GCS, Azure Blob) - data is addressed by a key inside a bucket (s3://bucket/path/object.json). Accessed over an HTTPS API, not mounted as a filesystem. Effectively infinite, 11 nines of durability, versioning and lifecycle policies built in. Typical cost ~$0.023 per GB-month (Standard tier).
- Block storage (EBS, GCP Persistent Disk, Azure Managed Disks) - a virtual disk attached to a VM. Acts like a raw block device; the OS puts a filesystem on top. Sized in GB, with provisioned IOPS and throughput. Typical cost ~$0.08 per GB-month for gp3 (provisioned-IOPS io2 is about $0.125). AZ-scoped and typically attached to one instance at a time.
- File storage (EFS, FSx, GCP Filestore, Azure Files) - a POSIX/NFS filesystem you can mount from many instances at once. Elastic size, regional or zonal. Costlier per GB (~$0.30 per GB-month for EFS Standard) but solves the "many readers, shared state" problem.
Choosing between them drives cost, performance, and architectural simplicity more than almost any other decision in the storage layer.
At the OS level the distinction is visible with mount and lsblk: a block device shows as /dev/nvme0n1, a file store as an NFS mount under /mnt/..., and object storage is not mounted at all - it is an API call. Treating object storage as a filesystem (with tools like s3fs) is almost always a mistake because the semantics differ in ways your application will eventually notice.
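Two of those semantic differences are easy to see in a toy model. The sketch below is a minimal in-memory stand-in, assuming a bucket is nothing more than a flat map from key to bytes; `ToyBucket`, `rename`, and `append` are hypothetical names for illustration and do not correspond to any SDK. It shows why "rename" is really copy-then-delete and why "append" means rewriting the whole object.

```python
from typing import Dict, List


class ToyBucket:
    """In-memory stand-in for a bucket: a flat map from key to bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        # Whole-object write: any previous body under this key is replaced.
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def copy(self, src: str, dst: str) -> None:
        self._objects[dst] = self._objects[src]

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)

    def keys(self) -> List[str]:
        return sorted(self._objects)


def rename(bucket: ToyBucket, src: str, dst: str) -> None:
    """'Rename' as an object store sees it: copy, then delete. Not atomic."""
    bucket.copy(src, dst)
    # A reader listing the bucket at this point would see both keys.
    bucket.delete(src)


def append(bucket: ToyBucket, key: str, extra: bytes) -> None:
    """'Append' means reading the whole object and writing it back."""
    bucket.put(key, bucket.get(key) + extra)
```

A filesystem does both operations in place and cheaply; here each one costs a full copy of the data, which is why a job that rotates output files is usually better off writing straight to a date-stamped key.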
Why It Matters Here
Picking wrong gets expensive:
- storing millions of small files on EBS instead of S3 wastes money and caps capacity at whatever a single instance can attach
- storing a relational database data directory on S3 doesn't work at all
- sharing a "data directory" between 20 EC2 instances via rsync instead of EFS creates brittle consistency bugs
- writing logs directly to EBS instead of shipping them out means log volume shapes your disk sizing
- cross-AZ EFS mounts are slow and metered - a "quick" fix can move a whole month's egress onto one line item
Later modules lean on this: backups (S3 usually), container volumes (EBS + EFS + PVCs), data lakes (S3), and databases (block or managed).
Concrete Example
Application: a SaaS product with user uploads, a PostgreSQL database, and a shared scratch directory used by a batch fleet.
- User uploads -> S3. Objects keyed by tenant and upload ID. Presigned URLs let clients upload directly, bypassing the app. Lifecycle rule moves objects >90 days old from S3 Standard to S3 Standard-IA; >1 year to Glacier Deep Archive.
- PostgreSQL data files -> EBS gp3 volume attached to the RDS instance (or to an EC2 instance if self-managed). IOPS and throughput provisioned separately from size. Snapshot backups stored in S3 under the hood.
- Batch fleet shared scratch -> EFS mounted at /mnt/scratch from all workers. Elastic size, pay-per-use.
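The uploads lifecycle rule can be sketched as plain data. The dictionary below follows the general shape of an S3 lifecycle configuration (rules with a `Filter`, a `Status`, and a list of `Transitions`). The rule ID, the `tenants/` prefix, and the helper `storage_class_at` are assumptions made for illustration, and the helper simplifies S3's day-counting to a plain "age in days" comparison.

```python
# Sketch of the uploads lifecycle rule as plain data. The rule ID and the
# "tenants/" prefix are illustrative assumptions, not values from a real account.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "uploads-tiering",            # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "tenants/"},   # assumed key prefix
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_at(age_days: int, config: dict = LIFECYCLE) -> str:
    """Return the storage class an object of this age would sit in.

    Simplification: an object is treated as transitioned as soon as its age
    reaches the rule's Days value.
    """
    current = "STANDARD"
    transitions = config["Rules"][0]["Transitions"]
    for step in sorted(transitions, key=lambda t: t["Days"]):
        if age_days >= step["Days"]:
            current = step["StorageClass"]
    return current
```

With boto3, a dictionary of this shape is what `put_bucket_lifecycle_configuration` takes as its `LifecycleConfiguration` argument. Keeping the rule as data also makes it easy to review and test before it touches a bucket.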
Concrete shell interactions:
# object
aws s3 cp ./report.csv s3://acme-uploads-prod/tenants/42/2026-04/report.csv
aws s3api put-object-tagging --bucket acme-uploads-prod --key ... --tagging 'TagSet=[{Key=data-classification,Value=confidential}]'
# block
lsblk # /dev/nvme1n1 = the gp3 volume
mkfs.xfs /dev/nvme1n1 # filesystem on top
mount /dev/nvme1n1 /var/lib/postgresql/data
# file
mount -t nfs4 fs-0abc.efs.us-east-1.amazonaws.com:/ /mnt/scratch
aws s3 sync /mnt/scratch/ s3://acme-scratch-archive/ # batch archive to S3
Costs to notice:
- the 1 TB of uploads over 90 days old costs ~$23.50/month on Standard; on Standard-IA it costs ~$12.80 for storage, but retrieval is billed per request and per GB. Lifecycle policies only save money if retrieval is rare.
- a 200 GB gp3 EBS volume costs ~$16/month; incremental snapshots typically cost a fraction of that.
- 100 GB of EFS Standard costs ~$30/month; if that fleet only needs read-mostly shared state, EFS Infrequent Access can cut this substantially.
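The Standard versus Standard-IA trade-off is simple arithmetic, sketched below. The Standard rate is the $0.023 per GB-month quoted earlier. The Standard-IA storage rate and the per-GB retrieval fee are assumed list prices and should be checked against current pricing. Per-request charges and Standard-IA's minimum-duration and minimum-object-size rules are deliberately left out.

```python
# Per-GB-month prices. STANDARD_PER_GB comes from the text; the other two
# are assumptions for illustration and should be verified before use.
STANDARD_PER_GB = 0.023
STANDARD_IA_PER_GB = 0.0125      # assumed Standard-IA storage price
IA_RETRIEVAL_PER_GB = 0.01       # assumed Standard-IA retrieval fee


def monthly_cost_standard(stored_gb: float) -> float:
    """Storage-only monthly cost on Standard."""
    return stored_gb * STANDARD_PER_GB


def monthly_cost_ia(stored_gb: float, retrieved_gb: float) -> float:
    """Monthly cost on Standard-IA: storage plus per-GB retrieval.

    Ignores per-request charges and minimum-duration/minimum-size rules.
    """
    return stored_gb * STANDARD_IA_PER_GB + retrieved_gb * IA_RETRIEVAL_PER_GB


def break_even_retrieved_gb(stored_gb: float) -> float:
    """GB retrieved per month at which Standard-IA stops being cheaper."""
    saving = stored_gb * (STANDARD_PER_GB - STANDARD_IA_PER_GB)
    return saving / IA_RETRIEVAL_PER_GB
```

For 1 TB (1024 GB), `monthly_cost_standard(1024)` and `monthly_cost_ia(1024, 0)` give the two storage figures, and `break_even_retrieved_gb(1024)` gives the monthly retrieval volume beyond which the cheaper tier no longer pays off under these assumptions.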
Common Confusion / Misconception
"Object storage is just a weird filesystem." It is not a filesystem. There are no directories (only key prefixes), no partial updates (only whole-object PUT), and no rename (you copy then delete). Treat objects as immutable blobs with metadata.
"EBS is the fastest." gp3 gives 3000-16000 IOPS and 125-1000 MB/s. io2 Block Express can do much more but is expensive. Meanwhile, well-tuned S3 multipart uploads to many keys can saturate a 10 Gbps link. Pick on pattern, not a one-line benchmark.
"EFS is a drop-in NFS." It is NFS, but performance is shaped by throughput mode (Bursting vs Provisioned vs Elastic) and by your workload pattern. Many small synchronous writes from 40 workers can be surprisingly slow; test with your real workload.
"S3 has no concept of folders." Correct - but the console shows them, and many SDKs let you iterate ListObjectsV2 with a Prefix + Delimiter="/". The "folder" is a fiction the console paints on top of prefixed keys. Understanding that is a prerequisite for designing clean key schemas.
Gotchas:
- S3 gives strong read-after-write consistency for PUTs (new objects and overwrites), DELETEs, and LISTs within a region. Cross-region replication is asynchronous and eventually consistent. Do not build write-then-read pipelines that read from a different region assuming immediate visibility.
- EBS volumes can attach to only one EC2 instance at a time (unless you use the io2 Multi-Attach flavor, which your filesystem almost certainly does not support safely). Attempting to re-mount on another instance without detaching first leads to data corruption.
- EFS mount targets are per-AZ. If your instance is in us-east-1c and your EFS has no mount target in 1c, mounts will go cross-AZ and inflate both latency and egress.
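The mount-target gotcha lends itself to a pre-flight check. The sketch below assumes you already have two lists of AZ names, one for the instances that will mount the file system and one for its mount targets. How those lists are gathered is left out, and the function name is hypothetical.

```python
from typing import Iterable, List


def azs_without_mount_target(
    instance_azs: Iterable[str], mount_target_azs: Iterable[str]
) -> List[str]:
    """Return the AZs that run instances but have no EFS mount target."""
    return sorted(set(instance_azs) - set(mount_target_azs))
```

An empty result means every worker can mount within its own AZ. Anything else is a list of AZs where mounts would cross AZ boundaries.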
How To Use It
For each data element:
- Ask: is this many readers/writers, random-access, immutable blob, or structured row?
- Blob, maybe huge, accessed by key -> object storage.
- Filesystem semantics needed by one host -> block storage.
- Filesystem semantics needed by many hosts at once -> file storage.
- Price it at realistic scale: storage class + access pattern + retrieval + egress.
- Plan lifecycle (S3) or snapshot policy (EBS) from day one; do not let untagged "temp" data grow unbounded.
- Encrypt at rest on every tier (SSE-KMS for S3, EBS encryption, EFS encryption). Use a customer-managed KMS key for data you may need to make permanently unreadable: once the key is deleted, the ciphertext cannot be decrypted.
- Block public access on object buckets at the account level. Turning it off should require an ADR (architecture decision record).
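The first few steps of this checklist reduce to two questions, which can be sketched as a small function. `DataElement` and `choose_storage` are hypothetical names, and the sketch deliberately ignores cost, lifecycle, and encryption, which the later steps cover.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    name: str
    needs_filesystem: bool   # does the consumer need POSIX file semantics?
    concurrent_hosts: int    # hosts that must mount it at the same time


def choose_storage(item: DataElement) -> str:
    """Map one data element to 'object', 'block', or 'file'."""
    if not item.needs_filesystem:
        return "object"      # blob addressed by key
    if item.concurrent_hosts <= 1:
        return "block"       # filesystem semantics for a single host
    return "file"            # filesystem semantics shared by many hosts


# The three data elements from the concrete example above.
elements = [
    DataElement("user uploads", needs_filesystem=False, concurrent_hosts=0),
    DataElement("postgresql data", needs_filesystem=True, concurrent_hosts=1),
    DataElement("batch scratch", needs_filesystem=True, concurrent_hosts=40),
]
choices = {e.name: choose_storage(e) for e in elements}
```

The point of writing it down this way is that the storage type falls out of the access pattern, not out of which service is most familiar.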
Check Yourself
- Why is S3 a bad place to put a PostgreSQL data directory?
- A log-heavy workload keeps maxing out a gp3 volume. What cheaper option might work, and what does it give up?
- Which of the three storage types is AZ-scoped, and what does that imply for failure design?
- What does it mean that "there is no rename in S3," and how does it change a batch job that rotates daily output files?
- Your EFS workload has high per-file latency but low throughput. Which knobs can you turn - and which require a new file system, not a mount option change?
Mini Drill or Application
For your own past project (or a hypothetical SaaS), list every data element (user files, logs, database, scratch, backups) and pick object/block/file for each in fifteen minutes. Estimate order-of-magnitude cost for each and note one lifecycle or snapshot rule per item.
Extension: design an S3 key schema for multi-tenant uploads that you can efficiently list per tenant and expire per tenant. If your schema makes "list all of tenant 42's objects" slow, start over - that one query shape drives every billing, compliance, and deletion story later.
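One way to sanity-check a candidate schema is to emulate a prefix-plus-delimiter listing over a handful of sample keys. The sketch below is a local model of that listing behavior, not a call to any SDK. `list_level` is a hypothetical helper, and the sample keys follow the `tenants/<id>/<yyyy-mm>/...` layout used in the earlier shell example.

```python
from typing import Iterable, List, Tuple


def list_level(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Emulate one prefix+delimiter listing over a flat set of keys.

    Returns (objects directly under the prefix, rolled-up "folder" prefixes).
    """
    objects: List[str] = []
    folders = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Anything past the next delimiter is rolled up into one "folder".
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(folders)


sample_keys = [
    "tenants/42/2026-04/report.csv",
    "tenants/42/2026-05/summary.csv",
    "tenants/42/readme.txt",
    "tenants/7/2026-04/report.csv",
]
```

If `list_level(sample_keys, "tenants/42/")` returns only tenant 42's data, a single prefix is enough to list or expire that tenant. A schema that puts the date or upload ID ahead of the tenant ID fails this check immediately.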
Read This Only If Stuck
- AWS decision guide: Choosing an AWS storage service - authoritative decision tree
- AWS S3: Introduction - object model, durability, storage classes
- AWS EBS: Amazon EBS volumes - volume types, IOPS model, AZ scoping
- AWS EFS: When to choose Amazon EFS - EFS vs S3 vs EBS tradeoffs in one page
- Google Cloud: Cloud Storage overview - the GCS equivalent of S3
- Azure: Introduction to Blob storage - Azure's object-store primitives
- Linux Command Line: Mounting and unmounting storage devices - how a mounted block or NFS device actually shows up to userspace
- Linux Command Line: Synchronizing files and rsync over a network - the canonical shell-side tool for moving data between storage tiers