Superblocks, Inodes, Data Blocks: Typical FS Layout
What This Concept Is
A Unix-style file system partitions its block device into a small number of structural regions. The canonical ext-style layout looks like this:
LBA ->    0       1           K               K+I
         +-------+-----------+---------------+---------------------------+
         |   S   |  bitmaps  |  inode table  |  data blocks              |
         |       |  (inode,  |               |  (file contents,          |
         |       |   data)   |               |   directory data)         |
         +-------+-----------+---------------+---------------------------+
          super-  block       inode array     the rest of the device
          block   allocators
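The boundaries K and K+I in the diagram can be derived from a few mkfs-time parameters. The following is a minimal Python model of a single-group layout. It assumes 4 KiB blocks, 256-byte inodes, one bitmap bit per inode or block, and an inode bitmap placed before the data bitmap. `Layout` and `compute_layout` are hypothetical names, and real mkfs tools also reserve room for group descriptors and superblock backups.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Starting LBA of each region in a toy, single-group ext-style layout."""
    inode_bitmap_start: int  # LBA 1, right after the superblock at LBA 0
    data_bitmap_start: int
    inode_table_start: int   # "K" in the diagram
    data_start: int          # "K + I" in the diagram
    data_block_count: int


def compute_layout(total_blocks: int, inode_count: int,
                   block_size: int = 4096, inode_size: int = 256) -> Layout:
    bits_per_block = block_size * 8  # one bit tracks one inode or one block
    inode_bitmap_blocks = math.ceil(inode_count / bits_per_block)
    # Simplification: the data bitmap gets one bit for every block on the device.
    data_bitmap_blocks = math.ceil(total_blocks / bits_per_block)
    inode_table_blocks = math.ceil(inode_count * inode_size / block_size)  # "I"

    inode_bitmap_start = 1
    data_bitmap_start = inode_bitmap_start + inode_bitmap_blocks
    inode_table_start = data_bitmap_start + data_bitmap_blocks
    data_start = inode_table_start + inode_table_blocks
    if data_start >= total_blocks:
        raise ValueError("device too small for the requested inode count")
    return Layout(inode_bitmap_start, data_bitmap_start, inode_table_start,
                  data_start, total_blocks - data_start)


# A 1 GiB device (262,144 blocks of 4 KiB) with 65,536 inodes.
layout = compute_layout(total_blocks=262_144, inode_count=65_536)
print(layout.inode_table_start, layout.data_start)  # 11 4107
```

In this example the two bitmaps together occupy LBAs 1 to 10, so K = 11, and the inode table spans I = 4096 blocks.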
Four kinds of structure live on disk:
- Superblock (S): a fixed-location block at a known LBA that describes the entire FS. It records the magic number, total block count, block size, inode count, free counts, mount state, and how to find the root directory's inode (ext2 simply reserves inode 2 for the root). It is read first at mount time and is usually replicated for resilience.
- Bitmaps: two bitmaps, one per allocator. The inode bitmap marks which inodes are in use; the data bitmap marks which data blocks are in use. Allocation is "find a zero, flip it to one, and write the corresponding structure."
- Inode table: a dense array of fixed-size inode records, indexed by inode number. Inode N lives at a computable offset: inode_table_start + N * sizeof(inode) (ext2 numbers inodes from 1, so it uses N - 1). Each inode holds either direct plus single-, double-, and triple-indirect block pointers (ext2/ext3) or extents (ext4, xfs, btrfs).
- Data blocks: the rest of the disk, holding file contents and directory data.
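Both the allocator rule and the inode-offset formula come down to a little arithmetic. The following is a minimal Python sketch. It assumes 256-byte inodes, 4 KiB blocks, and a bitmap held in a `bytearray` with bits numbered least-significant first. `bitmap_alloc` and `inode_location` are illustrative names, not a kernel API.

```python
INODE_SIZE = 256   # bytes per on-disk inode (assumed)
BLOCK_SIZE = 4096  # bytes per block (assumed)


def bitmap_alloc(bitmap: bytearray) -> int:
    """Find the first 0 bit, set it to 1, and return its index.

    Bits are numbered least-significant first within each byte.
    Raises RuntimeError when every bit is already set.
    """
    for byte_index, byte in enumerate(bitmap):
        if byte == 0xFF:  # all 8 objects tracked by this byte are in use
            continue
        for bit in range(8):
            if not byte & (1 << bit):
                bitmap[byte_index] |= 1 << bit  # "flip it to one"
                return byte_index * 8 + bit
    raise RuntimeError("no free entries")


def inode_location(n: int, inode_table_start_lba: int) -> tuple[int, int]:
    """Return (LBA of the block holding inode n, byte offset within that block).

    Uses 0-based inode numbers, matching inode_table_start + N * sizeof(inode);
    for ext2's 1-based numbering, pass n - 1.
    """
    byte_offset = n * INODE_SIZE
    return (inode_table_start_lba + byte_offset // BLOCK_SIZE,
            byte_offset % BLOCK_SIZE)


inode_bitmap = bytearray(2)     # 16 inodes, all free
inode_bitmap[0] = 0b0000_0111   # inodes 0-2 already in use
n = bitmap_alloc(inode_bitmap)  # first free inode is 3
print(n, inode_location(n, inode_table_start_lba=11))  # 3 (11, 768)
```

The scan is plain first-fit from bit 0. Real allocators usually start searching near a locality hint, which is where block groups come in.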
Large file systems are further divided into block groups (ext2/3/4) or allocation groups (xfs). Each group is a mini FS with its own allocation metadata (bitmaps and an inode table in ext2/3/4; free-space and inode B+trees in xfs), so related files can be kept together to reduce seek distance on HDDs.
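With groups, locating an inode takes one extra division: first pick the group, then the slot within that group's inode table. This Python sketch follows ext2's 1-based numbering, dividing ino - 1 by the inodes-per-group count. `GroupDesc` is a hypothetical, trimmed stand-in for a group descriptor, and the inode-table LBAs in the example are made up.

```python
from typing import NamedTuple


class GroupDesc(NamedTuple):
    """Hypothetical, trimmed group descriptor: only the inode table's location.
    (A real ext2 descriptor also locates the group's bitmaps and keeps free counts.)"""
    inode_table_lba: int


def locate_inode(ino: int, inodes_per_group: int, groups: list[GroupDesc],
                 inode_size: int = 256,
                 block_size: int = 4096) -> tuple[int, int, int]:
    """Map a 1-based, ext2-style inode number to (group, LBA, byte offset in block)."""
    if ino < 1:
        raise ValueError("ext2-style inode numbers start at 1")
    group, index = divmod(ino - 1, inodes_per_group)  # which group, which slot
    byte_offset = index * inode_size
    lba = groups[group].inode_table_lba + byte_offset // block_size
    return group, lba, byte_offset % block_size


groups = [GroupDesc(inode_table_lba=5), GroupDesc(inode_table_lba=32_773)]
print(locate_inode(2, 8192, groups))     # root inode: (0, 5, 256)
print(locate_inode(8200, 8192, groups))  # (1, 32773, 1792)
```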
Why It Matters Here
Every file operation navigates this layout. Trace read(fd, buf, 4096) at file offset 0:
- From fd, look up the open-file entry, then the inode number.
- Read the inode from the inode table (usually already cached in memory, since open() had to load it).
- From the inode's direct pointers (or extent tree), find the LBA of file block 0.
- Read that LBA into the page cache, unless it is already cached.
- Copy up to 4096 bytes into user space (fewer if the file is shorter).
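The last three steps of this trace (pointer lookup, page-cache fill, and copy-out) can be modeled in memory. The following is a toy Python sketch. It assumes direct pointers only and uses a dict to stand in for the page cache. `ToyFS` and `Inode` are hypothetical names, and a pointer value of 0 marks an unallocated block (a hole), as in ext2.

```python
BLOCK_SIZE = 4096


class Inode:
    def __init__(self, size: int, direct: list[int]):
        self.size = size
        self.direct = direct  # direct[i] = LBA of file block i; 0 marks a hole


class ToyFS:
    """In-memory stand-in for a block device plus a page cache."""

    def __init__(self, nblocks: int):
        self.disk = [bytes(BLOCK_SIZE) for _ in range(nblocks)]
        self.page_cache: dict[int, bytes] = {}  # LBA -> cached block contents
        self.device_reads = 0

    def _get_block(self, lba: int) -> bytes:
        if lba not in self.page_cache:  # cache miss: go to the device
            self.device_reads += 1
            self.page_cache[lba] = self.disk[lba]
        return self.page_cache[lba]

    def read(self, inode: Inode, offset: int, length: int) -> bytes:
        out = bytearray()  # plays the role of the user buffer
        end = min(offset + length, inode.size)  # never return bytes past EOF
        while offset < end:
            file_block, within = divmod(offset, BLOCK_SIZE)
            lba = inode.direct[file_block]  # inode pointer -> LBA
            chunk = min(BLOCK_SIZE - within, end - offset)
            if lba == 0:
                out += bytes(chunk)  # hole: reads back as zeros
            else:
                out += self._get_block(lba)[within:within + chunk]
            offset += chunk
        return bytes(out)


fs = ToyFS(nblocks=16)
fs.disk[9] = b"A" * BLOCK_SIZE
ino = Inode(size=5000, direct=[9, 0])  # file block 1 is a hole
fs.read(ino, 0, 4096)
fs.read(ino, 0, 4096)
print(fs.device_reads)  # 1: the second read hits the cache
```

The second read never reaches the device, which is why re-reading a hot file's blocks is cheap.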
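The same trace for a write that allocates a new block (for example, appending past the end of the file) touches three regions: the inode table (update size and block pointer), the data bitmap (mark the block used), and the data region (write the contents). These are three separate on-disk updates that cannot be applied atomically, which is why Cluster 3 (crash consistency) exists.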
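The three-region write becomes visible if you log each update. The following is a toy Python sketch. It assumes every append starts at a block-aligned file size. `ToyWritePath` and its region labels are hypothetical names. The log records the order in which this toy issues updates, not a safe on-disk order.

```python
BLOCK_SIZE = 4096


class ToyWritePath:
    """Toy FS that records which on-disk region each update touches."""

    def __init__(self, data_blocks: int):
        self.data_bitmap = [False] * data_blocks
        self.data: dict[int, bytes] = {}             # data block number -> contents
        self.inodes: dict[int, dict] = {}            # inode number -> {"size", "direct"}
        self.write_log: list[tuple[str, int]] = []   # (region, block or inode number)

    def append_block(self, ino: int, payload: bytes) -> int:
        """Append up to one block of data to a file whose size is block-aligned."""
        inode = self.inodes[ino]
        assert inode["size"] % BLOCK_SIZE == 0 and len(payload) <= BLOCK_SIZE
        blk = self.data_bitmap.index(False)  # first free block; ValueError if full
        self.data_bitmap[blk] = True
        self.write_log.append(("data bitmap", blk))  # 1. mark the block used
        self.data[blk] = payload
        self.write_log.append(("data", blk))         # 2. write the contents
        inode["direct"].append(blk)
        inode["size"] += len(payload)
        self.write_log.append(("inode table", ino))  # 3. new pointer and size
        return blk


fs = ToyWritePath(data_blocks=8)
fs.inodes[5] = {"size": 0, "direct": []}
fs.append_block(5, b"hello")
print(fs.write_log)
# [('data bitmap', 0), ('data', 0), ('inode table', 5)]
```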
Concrete Example
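Create a 12 KiB file with 4 KiB blocks on an ext2-style layout:
- Allocate an inode: scan the inode bitmap, flip a bit, get inode number I.
- Allocate 3 data blocks: scan the data bitmap three times, flip bits, get block numbers B1, B2, B3 (ideally contiguous).
- Write the file contents into B1, B2, B3.
- Initialize inode I: size = 12288, mode, timestamps, direct pointers [B1, B2, B3, 0, ...].
- Write inode I to the inode table at inode_table_start + I * sizeof(inode).
- Write the directory entry "foo" -> I into the parent directory's data block.
Counted in blocks, that is one inode-bitmap block, one data-bitmap block (the three bits usually share it), one inode-table block, three data blocks, and one directory data block, plus the parent directory's inode for its updated timestamps: up to eight block writes across the bitmap, inode, data, and directory regions. The kernel batches them in cache; crash consistency (Cluster 3) decides in what order they eventually hit disk.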
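Counting blocks rather than steps shows how cache batching coalesces updates. The following Python sketch returns the set of dirty blocks left in the cache after creating and filling a three-block file, including the parent directory's inode for its timestamps. It uses a made-up toy layout: the LBAs below are hypothetical, inode numbers are 0-based, and each bitmap fits in one block. `create_file_dirty_blocks` is an illustrative name.

```python
BLOCK_SIZE = 4096
INODE_SIZE = 256
INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE  # 16

# Hypothetical toy layout (LBAs); each bitmap fits in a single block.
INODE_BITMAP_LBA = 1
DATA_BITMAP_LBA = 2
INODE_TABLE_LBA = 3
DATA_START_LBA = 64


def inode_block(ino: int) -> int:
    """LBA of the inode-table block holding (0-based) inode ino."""
    return INODE_TABLE_LBA + ino // INODES_PER_BLOCK


def create_file_dirty_blocks(new_ino: int, data_blocks: list[int],
                             parent_ino: int, parent_dir_block: int) -> set[int]:
    """LBAs left dirty in the cache after creating and filling a new file.

    data_blocks and parent_dir_block are data-region block numbers.
    A set is used because repeated updates to one block coalesce in cache.
    """
    dirty = {INODE_BITMAP_LBA, DATA_BITMAP_LBA}            # inode bit + all data bits
    dirty.update(DATA_START_LBA + b for b in data_blocks)  # file contents
    dirty.add(inode_block(new_ino))                        # initialize the new inode
    dirty.add(DATA_START_LBA + parent_dir_block)           # "foo" -> new_ino entry
    dirty.add(inode_block(parent_ino))                     # parent's mtime/ctime
    return dirty


print(len(create_file_dirty_blocks(40, [10, 11, 12],
                                   parent_ino=2, parent_dir_block=0)))  # 8
print(len(create_file_dirty_blocks(12, [10, 11, 12],
                                   parent_ino=2, parent_dir_block=0)))  # 7
```

In the second call the new inode and the parent's inode fall in the same inode-table block, so one write covers both.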
Common Confusion / Misconception
"Directories live outside the layout." No. A directory is a file. Its inode lives in the inode table; its data (the entry list) lives in data blocks; its name lives in its parent's data block.
"Files always use direct pointers." No. With 4 KiB blocks and 4-byte block pointers, an ext2 inode has 12 direct pointers (48 KiB), then a single-indirect pointer to a block of 1024 pointers (4 MiB more), a double-indirect pointer (4 GiB more), and a triple-indirect pointer (4 TiB more). ext4 and xfs use extents instead: (start_block, length) ranges that map a whole contiguous run in one record, which is dramatically more compact for large files.
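You can check these capacities, and the metadata cost of mapping a large file, with a short calculation. The following Python sketch assumes 4 KiB blocks, 4-byte pointers, 12 direct pointers, and ext4's limit of 32,768 blocks per initialized extent. The extent count is the best case (a fully contiguous file), and the function names are hypothetical.

```python
BLOCK_SIZE = 4096
PTR_SIZE = 4                   # bytes per block pointer (ext2-style, assumed)
PTRS = BLOCK_SIZE // PTR_SIZE  # 1024 pointers fit in one indirect block
N_DIRECT = 12


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding)."""
    return -(-a // b)


def level_capacities() -> list[int]:
    """Bytes addressable through direct, single-, double-, and triple-indirect pointers."""
    return [N_DIRECT * BLOCK_SIZE] + [PTRS ** d * BLOCK_SIZE for d in (1, 2, 3)]


def indirect_blocks_needed(n_blocks: int) -> int:
    """Pointer (metadata) blocks needed to map a file of n_blocks data blocks."""
    remaining = max(0, n_blocks - N_DIRECT)
    meta = 0
    for depth in (1, 2, 3):  # single-, double-, triple-indirect trees
        if remaining == 0:
            break
        take = min(remaining, PTRS ** depth)  # data blocks mapped by this tree
        # Pointer blocks at each level of the tree; the top level is always one block.
        meta += sum(ceil_div(take, PTRS ** k) for k in range(1, depth + 1))
        remaining -= take
    if remaining:
        raise ValueError("file too large for a triple-indirect scheme")
    return meta


def contiguous_extents(n_blocks: int, max_extent_len: int = 32768) -> int:
    """Extents needed for a fully contiguous file (the best case for extents)."""
    return ceil_div(n_blocks, max_extent_len)


one_gib_blocks = (1 << 30) // BLOCK_SIZE       # 262,144 blocks
print(indirect_blocks_needed(one_gib_blocks))  # 257
print(contiguous_extents(one_gib_blocks))      # 8
```

Under the pointer scheme, metadata grows with file size no matter how the blocks are laid out. Under extents, it grows with fragmentation instead.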
"Block groups are just decoration." They are load-bearing. FFS introduced cylinder groups, and ext2 adopted the idea as block groups, precisely so related data (an inode and its data blocks, a directory's entries and the inodes they name) could be allocated near each other to keep HDD seek times manageable.
How To Use It
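For any operation, list which structural regions are touched and in what order. For creat("/foo"), the FS first reads the root inode, the root directory's data blocks (to confirm that "foo" does not already exist), and the inode bitmap (to find a free inode). The writes are then:
- inode bitmap: write (allocate the new inode)
- inode table: write (initialize the new inode)
- parent directory's data block: write (add the "foo" entry)
- parent directory's inode: write (update mtime/ctime, plus size if the directory needed a new block)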
This list is also a checklist for crash recovery: any crash between these writes leaves the FS in a partial state.
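The checklist idea can be made mechanical: replay every prefix of a write order and ask a checker what it would find. The following toy Python model of creat("/foo") assumes each write is atomic. It treats an allocated but unreachable inode (a leak) as recoverable, and a directory entry that names an uninitialized inode (dangling) as the more damaging outcome. `apply`, `fsck`, and `crash_report` are hypothetical names, and the rules are far simpler than a real checker such as e2fsck.

```python
def apply(writes: list[str]) -> dict[str, bool]:
    """On-disk state after exactly these writes have reached the disk."""
    state = {"inode_bitmap": False, "inode": False,
             "dirent": False, "parent_inode": False}
    for w in writes:
        state[w] = True
    return state


def fsck(state: dict[str, bool]) -> str:
    """Toy consistency check for creat("/foo") (far simpler than a real fsck)."""
    inode_ready = state["inode_bitmap"] and state["inode"]
    if state["dirent"] and not inode_ready:
        return "DANGLING"  # "foo" names an unallocated or uninitialized inode
    if (state["inode_bitmap"] or state["inode"]) and not state["dirent"]:
        return "LEAK"      # inode allocated/written but unreachable; fsck can free it
    return "OK"            # a stale parent timestamp is treated as harmless


def crash_report(order: list[str]) -> list[str]:
    """fsck verdict for a crash after 0, 1, ..., len(order) writes."""
    return [fsck(apply(order[:k])) for k in range(len(order) + 1)]


SAFE = ["inode_bitmap", "inode", "dirent", "parent_inode"]
UNSAFE = ["dirent", "inode_bitmap", "inode", "parent_inode"]
print(crash_report(SAFE))    # ['OK', 'LEAK', 'LEAK', 'OK', 'OK']
print(crash_report(UNSAFE))  # ['OK', 'DANGLING', 'DANGLING', 'OK', 'OK']
```

If the inode is allocated and initialized before the directory entry is published, a crash can only leak an inode. It can never expose a name that points at garbage.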
Check Yourself
- Why does ext4 use extents instead of direct/indirect block pointers? What workload penalizes direct pointers most?
- Which block must be read on every file open, and why is it almost always in cache already?
- If an inode is 256 bytes and blocks are 4 KiB, how many inodes per block, and how is inode number converted to LBA?
Mini Drill or Application
Sketch the full list of on-disk writes for each:
1. touch /tmp/a
2. echo hello > /tmp/a
3. chmod 600 /tmp/a
4. rm /tmp/a
5. mv /tmp/a /tmp/b
Then for (2), order the writes such that a crash between any two leaves the FS in a state that can be recovered by a consistency checker. (This is a preview of Cluster 3.)