
Disks, Sectors, LBAs, and the Block Interface

What This Concept Is

The file system never sees "disks" the way marketing does. It sees a block device: a flat array of fixed-size blocks indexed by integer addresses called LBAs (Logical Block Addresses).

LBA:    0      1      2      3      4     ...    N-1
     +------+------+------+------+------+     +--------+
     | blk0 | blk1 | blk2 | blk3 | blk4 | ... | blkN-1 |
     +------+------+------+------+------+     +--------+

block = 512 B or 4096 B (sector size), device-defined

Two operations:

  • read(LBA, count) -> bytes
  • write(LBA, bytes) -> ok

That is the entire contract at this layer. Everything else (rotational latency, seek time, wear leveling, SSD erase blocks, RAID striping) is hidden beneath it by the drive firmware and block layer.
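
To make the two-operation contract concrete, here is a minimal in-memory sketch. The class name `BlockDevice`, its method signatures, and the whole-block restriction on writes are illustrative assumptions, not a real kernel or driver API.

```python
class BlockDevice:
    """Toy block device: a flat array of fixed-size blocks indexed by LBA.

    Hypothetical illustration only; real devices sit behind a driver.
    """

    def __init__(self, num_blocks: int, block_size: int = 4096) -> None:
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._data = bytearray(num_blocks * block_size)

    def _check(self, lba: int, count: int) -> None:
        # Reject any request that falls outside LBA 0 .. N-1.
        if lba < 0 or count < 0 or lba + count > self.num_blocks:
            raise ValueError("LBA range out of bounds")

    def read(self, lba: int, count: int) -> bytes:
        """Return `count` whole blocks starting at `lba`."""
        self._check(lba, count)
        start = lba * self.block_size
        return bytes(self._data[start:start + count * self.block_size])

    def write(self, lba: int, data: bytes) -> None:
        """Store `data` starting at `lba`; only whole blocks are accepted."""
        if len(data) % self.block_size != 0:
            raise ValueError("writes must be a whole number of blocks")
        count = len(data) // self.block_size
        self._check(lba, count)
        start = lba * self.block_size
        self._data[start:start + len(data)] = data


dev = BlockDevice(num_blocks=8, block_size=512)
dev.write(2, b"\xab" * 512)   # fill block 2
block2 = dev.read(2, 1)       # read it back
```

Nothing in this interface mentions seeks, tracks, or erase blocks; that is the point of the abstraction.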

Beneath the LBA abstraction sit very different physical devices:

  • HDD: spinning platters with heads. Sequential access is cheap, random access pays seek (4-10 ms) plus rotational latency (~4 ms at 7200 RPM). One random 4 KiB read can take 10 ms; sequential reads can hit 150 MiB/s.
  • SSD / NVMe: flash cells addressed through a Flash Translation Layer (FTL). No seek penalty. Random 4 KiB reads in ~50-100 µs; sequential reads in GiB/s on NVMe. But writes have asymmetric costs (erase-before-write at the erase-block level, garbage collection).

Why It Matters Here

The file system's layout decisions assume the LBA model: it allocates inodes and data blocks by number. A "random" inode access may map to a sector that is very slow to reach on an HDD or only slightly slower on an SSD, depending on the drive's internal mapping.

File-system designers exploit physics through the abstraction:

  • FFS clusters related data in cylinder groups (HDD seek locality)
  • LFS writes only sequentially (HDD sequential is 100x faster than random)
  • SSD-aware file systems try to align writes to erase-block boundaries and minimize write amplification

Understanding the block interface also tells you why fsync exists: user-space writes land in a cache. To force a real LBA write, you need the kernel to flush and the drive to actually commit rather than caching in its own DRAM.
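
As a rough sketch of what this looks like from user space, the helper below writes a buffer and then calls fsync before closing. The function name `durable_write` and the file name are hypothetical; `os.write` and `os.fsync` are the standard Python wrappers around the system calls. Whether the data actually reaches stable media still depends on the kernel and drive honoring the flush, and a newly created file may also need its parent directory synced, which this sketch does not do.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path`, then ask the kernel to flush it to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Up to here the bytes may live only in the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "record.bin")
    durable_write(target, b"x" * 4096)
    size = os.path.getsize(target)
```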

Concrete Example

A simple disk (OSTEP's model): platter rotating at 7200 RPM, 4 ms average seek, 512-byte sectors.

  • Read one random sector:
    • seek time: ~4 ms
    • rotational latency: avg half a revolution = 4.17 ms
    • transfer: 512 B / (~100 MB/s) ~= 5 us
    • total: ~8 ms
  • Read 1 MiB sequential (2048 sectors): one seek and one rotational delay, then continuous transfer: ~4 ms + ~4 ms + ~10 ms = ~18 ms
  • Read 1 MiB as random 4 KiB blocks: 256 accesses at ~8 ms each (seek + rotation) = ~2048 ms = ~2 seconds

Same 1 MiB of user data; two orders of magnitude difference depending on pattern.
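
The arithmetic can be checked with a few lines of Python. This is a sketch under the example's assumptions (7200 RPM, 4 ms seek, 100 MB/s with 1 MB taken as 10^6 bytes); the function names are made up for illustration. It charges the sequential read one initial rotational delay and keeps unrounded values, so its results differ slightly from the rounded figures in the text.

```python
RPM = 7200
SEEK_MS = 4.0        # average seek, from the example
RATE_MB_S = 100.0    # sustained transfer rate, 1 MB = 10**6 bytes


def rotation_ms(rpm: float = RPM) -> float:
    """Average rotational latency: half of one revolution, in ms."""
    return 0.5 * 60_000 / rpm


def transfer_ms(nbytes: int, rate_mb_s: float = RATE_MB_S) -> float:
    """Time to stream `nbytes` at the sustained rate, in ms."""
    return nbytes / (rate_mb_s * 1e6) * 1e3


def access_ms(nbytes: int) -> float:
    """One positioning (seek + rotation) followed by one contiguous transfer."""
    return SEEK_MS + rotation_ms() + transfer_ms(nbytes)


one_sector = access_ms(512)              # single random sector
sequential_1mib = access_ms(1 << 20)     # one positioning, 1 MiB streamed
random_1mib = 256 * access_ms(4096)      # 256 separate 4 KiB accesses

print(f"one sector:       {one_sector:8.2f} ms")
print(f"1 MiB sequential: {sequential_1mib:8.2f} ms")
print(f"1 MiB random:     {random_1mib:8.2f} ms")
print(f"ratio:            {random_1mib / sequential_1mib:8.1f}x")
```

Changing `SEEK_MS`, `RPM`, or `RATE_MB_S` shows which term dominates for a given access pattern.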

Common Confusion / Misconception

"SSDs are fast, so random I/O is free." No. SSDs narrow the gap but do not eliminate it. Random 4 KiB reads on NVMe are perhaps 5-10x slower than sequential 4 KiB reads, and random writes are worse because of write amplification.

"write(fd, buf, 4096) writes to sector X." No. It writes to the page cache. The block layer later maps dirty pages to LBAs and issues them as merged BIOs. The LBA chosen depends on the file's inode and indirect-block state.

"Block = sector." Historically sector was 512 B. Modern drives expose 512 B "logical" sectors but use 4096 B "physical" sectors internally, and most Linux file systems use 4096 B blocks to match page size.

How To Use It

When asked about an I/O workload's cost, translate to blocks:

  1. How many blocks are read/written?
  2. Are they contiguous (one request) or scattered (one request per block)?
  3. On HDD, how many seeks? On SSD, how much write amplification?
  4. Can the kernel merge adjacent requests? (The elevator / merge queue does this.)

Answering these questions quickly is the operational skill of Cluster 4.
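
Questions 2 and 4 can be practiced with a toy request merger: given the block numbers a workload touches, coalesce adjacent blocks into contiguous runs and count the resulting requests. The function `merge_requests` is a hypothetical simplification; real kernel merging is also bounded by request-size limits, queue depth, and timing, none of which is modeled here.

```python
def merge_requests(blocks: list[int]) -> list[tuple[int, int]]:
    """Coalesce block numbers into (start_block, length) runs.

    Duplicates are dropped and the input is sorted first, so the result
    is the smallest set of contiguous requests covering the blocks.
    """
    runs: list[tuple[int, int]] = []
    for b in sorted(set(blocks)):
        if runs and b == runs[-1][0] + runs[-1][1]:
            # Block directly follows the current run: extend it.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # Gap: start a new request.
            runs.append((b, 1))
    return runs


touched = [7, 3, 4, 5, 100, 6]
requests = merge_requests(touched)
print(requests)   # [(3, 5), (100, 1)]
```

Six touched blocks collapse into two requests here; on an HDD that is roughly two positioning delays instead of six.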

Check Yourself

  1. If an HDD does 100 random IOPS and an SSD does 100,000, how much faster is the SSD, and why is the number not 10^6?
  2. Why does reading 64 KiB sequentially cost roughly the same as reading 4 KiB on HDD?
  3. What does a drive's "write cache" do, and why does fsync fail to guarantee durability if the kernel does not also send a FLUSH to the drive?

Mini Drill or Application

Compute total time on a 7200 RPM HDD with 5 ms average seek, 100 MB/s sustained:

  1. Read 1 GiB sequential from one file.
  2. Read 1 GiB as 4 KiB random reads.
  3. Read 1 GiB as 4 KiB reads spaced 1 MiB apart on the same track.

Which dominates in each case: seek, rotation, or transfer? Redo (2) on an NVMe SSD doing 500 kIOPS 4 KiB random.

Read This Only If Stuck