Directories Are Just Files With Structured Contents
What This Concept Is
A directory is a file whose contents are a table of (name, inode_number) entries. The OS marks it with a distinct file type so that user code cannot modify it with the write syscall. Its entries change only through syscalls such as open (with O_CREAT), link, unlink, mkdir, rmdir, and rename. From the file system's point of view, though, a directory is just an inode whose data blocks hold name records.
directory /home/alice (inode #42, type = DIR)
data block holds:
+------+-----+------+----------+
| ino  | rec | type | name     |
+------+-----+------+----------+
| 42   | ... | DIR  | .        |  <- self
| 7    | ... | DIR  | ..       |  <- parent
| 5000 | ... | REG  | a.txt    |
| 5001 | ... | REG  | b.txt    |
| 6000 | ... | DIR  | projects |
+------+-----+------+----------+
(rec = record length: the size of this entry in bytes, used to find the next entry)
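On a live system you can read the same (name, inode) records back. This is a sketch that assumes a POSIX system and Python 3. The helper names list_entries and try_write_directory are hypothetical. os.scandir yields each entry's name, and DirEntry.inode() returns the inode number stored in that entry. Opening a directory for writing is refused with EISDIR, which Python raises as IsADirectoryError.

```python
import os
import tempfile

def list_entries(dirpath):
    """Return the sorted (name, inode_number) records stored in a directory.

    os.scandir reads the directory's entries; DirEntry.inode() gives the
    inode number recorded in each entry. Python omits "." and "..".
    """
    with os.scandir(dirpath) as it:
        return sorted((entry.name, entry.inode()) for entry in it)

def try_write_directory(dirpath):
    """Try to open a directory write-only, as one would before calling write."""
    try:
        fd = os.open(dirpath, os.O_WRONLY)
    except IsADirectoryError:       # POSIX requires EISDIR here
        return "EISDIR"
    os.close(fd)                    # not expected on POSIX; kept for safety
    return "opened"

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.txt", "b.txt"):
            open(os.path.join(d, name), "w").close()
        for name, ino in list_entries(d):
            print(ino, name)            # similar to `ls -i`
        print(try_write_directory(d))
```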
This is why a file system forms a tree (or DAG with hard links): every directory contains the inode numbers of its children. Path resolution is a sequence of lookups in these tables, one per path component.
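A minimal in-memory sketch of path resolution follows. It is a toy model, not any real on-disk format. The Inode record and the INODES table are hypothetical, and the inode numbers copy the diagram. Each directory maps names to inode numbers, and resolve performs one table lookup per path component. Permissions, symlinks, and mount points are ignored.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    kind: str                                    # "DIR" or "REG"
    entries: dict = field(default_factory=dict)  # name -> inode number (DIR only)

ROOT_INO = 2  # assumption: ext*-style root inode number

INODES = {
    2: Inode("DIR", {".": 2, "..": 2, "home": 7}),
    7: Inode("DIR", {".": 7, "..": 2, "alice": 42}),
    42: Inode("DIR", {".": 42, "..": 7, "a.txt": 5000, "b.txt": 5001, "projects": 6000}),
    5000: Inode("REG"),
    5001: Inode("REG"),
    6000: Inode("DIR", {".": 6000, "..": 42}),
}

def resolve(path: str) -> int:
    """Walk an absolute path one component at a time; return the final inode number."""
    ino = ROOT_INO
    for name in path.split("/"):
        if not name:                 # skip empty components from leading or doubled slashes
            continue
        node = INODES[ino]
        if node.kind != "DIR":       # only directories can be searched
            raise NotADirectoryError(name)
        if name not in node.entries:
            raise FileNotFoundError(name)
        ino = node.entries[name]     # this inode is the base for the next name
    return ino

print(resolve("/home/alice/a.txt"))  # 5000
```

Because ".." is just another entry, resolve("/home/alice/../alice/projects") follows the parent link back to inode 7 and ends at 6000.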
Why It Matters Here
Every syscall that takes a pathname starts with path resolution: walk the path, reading one directory at a time, using the last inode as the base for the next name. open("/a/b/c") reads the inode for /, looks up "a" in its data to get inode X, reads X, looks up "b", and so on.
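The same walk can be driven from user space with real syscalls. This sketch assumes a POSIX system where Python's os.open accepts dir_fd, which maps to openat. The name walk_with_dirfds is hypothetical. Each step opens exactly one name relative to the previous directory's file descriptor. There are two differences from the kernel's internal walk. First, opening each directory O_RDONLY also requires read permission on it. Second, os.open follows any symlinks it meets along the way.

```python
import os
import tempfile

def walk_with_dirfds(path: str) -> int:
    """Resolve an absolute path one component at a time; return the final inode number."""
    names = [n for n in path.split("/") if n]
    fd = os.open("/", os.O_RDONLY | os.O_DIRECTORY)     # start at the root directory
    try:
        for i, name in enumerate(names):
            is_last = i == len(names) - 1
            flags = os.O_RDONLY if is_last else os.O_RDONLY | os.O_DIRECTORY
            next_fd = os.open(name, flags, dir_fd=fd)   # openat(fd, name, flags)
            os.close(fd)                                # the parent is no longer needed
            fd = next_fd
        return os.fstat(fd).st_ino
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "a.txt")
        open(p, "w").close()
        print(walk_with_dirfds(p) == os.stat(p).st_ino)
```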
Directory structure dictates:
- permissions: you need execute (x) permission on every directory along the path
- linking: ln a b adds a new entry to a directory and increments the link count in a's inode; the file's data is not copied
- renaming: rename("a/x", "b/y") rewrites two directories atomically
- lookup cost: searching a directory with 100k entries stored as a linear list is slow; ext4 uses htrees (hashed B-trees) for large directories
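The linking and renaming rules can be checked directly. This sketch assumes a local POSIX file system that supports hard links, such as ext4 or tmpfs. The helper link_and_rename_demo is hypothetical and works inside a scratch directory.

```python
import os
import tempfile

def link_and_rename_demo(d):
    """Hard-link, then rename, a file inside directory d and report what changed."""
    a, b, c = (os.path.join(d, name) for name in ("a", "b", "c"))
    with open(a, "w") as f:
        f.write("hello")
    ino = os.stat(a).st_ino

    os.link(a, b)                    # like `ln a b`: a new entry for the same inode
    nlink = os.stat(a).st_nlink      # the link count is stored in the inode

    os.rename(b, c)                  # like `mv b c`: only the directory entry changes
    with open(c) as f:
        content = f.read()
    return {
        "nlink_after_link": nlink,
        "rename_kept_inode": os.stat(c).st_ino == ino,
        "names": sorted(os.listdir(d)),
        "content": content,
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(link_and_rename_demo(d))
```

The expected result is a link count of 2, the same inode after the rename, the names a and c, and untouched contents.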
Concrete Example
Path resolution for /home/alice/a.txt:
- Start at the root inode (inode 2 on ext*). Read it, then its data block.
- Look up "home". Find inode 7. Read inode 7, then its data block.
- Look up "alice". Find inode 42. Read inode 42, then its data block.
- Look up "a.txt". Find inode 5000. Read inode 5000. Stop; return it and set up the open-file state.
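That is four inode reads and three directory data-block reads (more if a directory spans several blocks). The open call itself reads none of a.txt's data. The kernel's dentry cache (dcache) short-circuits this work. On subsequent open("/home/alice/a.txt") calls, every component is found in the dcache, and with the inodes already in memory these calls usually skip the disk entirely.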
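To make the counting concrete, here is a toy model, not kernel code. The names CountingResolver and DIRS and the read counters are all hypothetical. The model charges one simulated inode read for each inode not yet in memory. It also charges one directory-block read for each name missing from a dcache-like dictionary keyed by (parent inode, name).

```python
ROOT_INO = 2
DIRS = {                      # toy directory contents: dir inode -> {name: inode}
    2: {"home": 7},
    7: {"alice": 42},
    42: {"a.txt": 5000, "b.txt": 5001},
}

class CountingResolver:
    def __init__(self):
        self.dcache = {}        # (parent inode, name) -> child inode
        self.in_memory = set()  # inode numbers already read from "disk"
        self.inode_reads = 0
        self.block_reads = 0

    def _load_inode(self, ino):
        if ino not in self.in_memory:
            self.inode_reads += 1        # simulated read of the inode itself
            self.in_memory.add(ino)

    def _lookup(self, parent, name):
        key = (parent, name)
        if key not in self.dcache:
            self.block_reads += 1        # miss: simulated read of the parent's directory block
            self.dcache[key] = DIRS[parent][name]
        return self.dcache[key]

    def resolve(self, path):
        ino = ROOT_INO
        self._load_inode(ino)
        for name in filter(None, path.split("/")):   # drop empty components
            ino = self._lookup(ino, name)
            self._load_inode(ino)
        return ino

    def counts(self):
        return (self.inode_reads, self.block_reads)

r = CountingResolver()
r.resolve("/home/alice/a.txt")
print(r.counts())   # (4, 3)
r.resolve("/home/alice/a.txt")
print(r.counts())   # (4, 3): every component hit the cache
```

The model ignores the page cache. A real kernel would usually find inode 42's directory block already in memory when it later looks up a different name there.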
Writing new names is equally mechanical. touch /home/alice/c.txt:
- Allocate a new inode (say 5002) by finding a free bit in the inode bitmap and setting it.
- Write the new inode (size 0, link count 1, timestamps, permissions).
- Add a directory entry "c.txt" -> 5002 to inode 42's data block, allocating another data block for the directory if the existing ones are full.
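All three on-disk structures change (the inode bitmap, the new inode, and the directory's data block), and the directory inode's modification time is updated as well. A crash partway through can leave these structures disagreeing with one another. This is already a hint that crash consistency is hard (Cluster 3).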
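A toy model of the create path follows. It assumes a tiny inode table and one data block per directory. ToyFS and its fields are hypothetical and do not follow any real file system's layout. The writes log records each on-disk structure that a create touches, in order. A crash between any two of those appends would leave the structures inconsistent.

```python
class ToyFS:
    """Hypothetical toy layout: an inode bitmap, an inode table, directory blocks."""

    def __init__(self, n_inodes=16):
        self.inode_bitmap = [False] * n_inodes              # True means "inode in use"
        self.inode_bitmap[0] = self.inode_bitmap[1] = True  # reserved in this toy
        self.inodes = {}                                    # ino -> metadata
        self.dir_blocks = {}                                # directory ino -> {name: ino}
        self.writes = []                                    # structures touched, in order

    def make_root(self, ino=2):
        self.inode_bitmap[ino] = True
        self.inodes[ino] = {"type": "DIR", "nlink": 2}
        self.dir_blocks[ino] = {".": ino, "..": ino}
        return ino

    def create(self, dir_ino, name, mode=0o644):
        if name in self.dir_blocks[dir_ino]:
            raise FileExistsError(name)
        # 1. Allocate: claim the first free bit in the inode bitmap.
        ino = self.inode_bitmap.index(False)                # ValueError if none is free
        self.inode_bitmap[ino] = True
        self.writes.append("inode bitmap")
        # 2. Initialize the inode: an empty regular file with one link.
        self.inodes[ino] = {"type": "REG", "size": 0, "nlink": 1, "mode": mode}
        self.writes.append(f"inode {ino}")
        # 3. Link it into the parent directory's data block.
        self.dir_blocks[dir_ino][name] = ino
        self.writes.append(f"directory block of inode {dir_ino}")
        return ino

fs = ToyFS()
root = fs.make_root()
fs.create(root, "new.txt")
print(fs.writes)  # ['inode bitmap', 'inode 3', 'directory block of inode 2']
```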
Common Confusion / Misconception
"cd changes state on disk." It does not. chdir updates only the process's current working directory pointer, an in-kernel per-process value.
"A symbolic link is a kind of hard link." No. A symbolic link is a file whose content is a path string. A hard link is a second directory entry pointing to the same inode. Symbolic links can cross file systems and point at non-existent targets; hard links cannot.
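Both claims can be checked quickly. This sketch assumes a POSIX system; on Windows, creating a symlink can require extra privileges. The helper compare_links is hypothetical. It creates a symlink to a path that does not exist and reads the stored content back with os.readlink. On POSIX systems the symlink's lstat size equals the length of that path string. A hard link to the same missing path fails because there is no inode to point at.

```python
import os
import tempfile

def compare_links(d):
    target = os.path.join(d, "missing.txt")    # deliberately never created
    sym = os.path.join(d, "sym")
    hard = os.path.join(d, "hard")

    os.symlink(target, sym)                    # allowed: stores only a path string
    result = {
        "stored_path": os.readlink(sym),       # the symlink's content
        "size_is_path_length": os.lstat(sym).st_size == len(os.fsencode(target)),
        "dangling": not os.path.exists(sym),   # exists() follows the link and finds nothing
    }
    try:
        os.link(target, hard)                  # a hard link needs an existing inode
        result["hard_link_created"] = True
    except FileNotFoundError:
        result["hard_link_created"] = False
    return result

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(compare_links(d))
```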
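"Large directories are always fast." On classical FFS or ext2, directories are unindexed lists. Looking up one name in a directory with 1M entries can therefore mean scanning all 1M entries, which is O(n) per lookup. Modern file systems (ext4 with htree indexing, XFS) use hashed or B-tree directory indexes, which keep lookups roughly logarithmic.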
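A rough illustration of the cost gap follows. A Python dict stands in for an on-disk index. This is a swapped-in technique, not ext4's htree, which hashes names into a B-tree of directory blocks. The names linear_lookup, build_index, and hashed_lookup are hypothetical.

```python
def linear_lookup(entries, name):
    """Scan (name, ino) records in order, as an unindexed directory does.

    Returns (ino, comparisons made)."""
    for count, (entry_name, ino) in enumerate(entries, start=1):
        if entry_name == name:
            return ino, count
    raise FileNotFoundError(name)

def build_index(entries):
    """Hash every name once, so later lookups skip the scan."""
    return {entry_name: ino for entry_name, ino in entries}

def hashed_lookup(index, name):
    try:
        return index[name]          # average O(1) for a dict
    except KeyError:
        raise FileNotFoundError(name) from None

entries = [(f"f{i}", 1000 + i) for i in range(1_000_000)]
print(linear_lookup(entries, "f999999"))                 # (1000999, 1000000)
print(hashed_lookup(build_index(entries), "f999999"))    # 1000999
```

The last name in a million-entry linear directory costs a million comparisons. The indexed lookup goes straight to the entry.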
How To Use It
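When reasoning about any path-taking syscall, expand it into component lookups. Ask: which directories must I read? Which must I write? Answer for unlink("/a/b"):
- Read / (its inode and data) to find a.
- Read /a (its inode and data) to find b.
- Write /a's data block to remove the entry b. This needs write and execute permission on /a.
- Decrement the link count in b's inode. If the count reaches zero and no process still has the file open, free the inode and its data blocks.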
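The link-count half of that answer can be observed from user space. This sketch assumes a local POSIX file system such as ext4 or tmpfs; network file systems may behave differently. The helper unlink_demo is hypothetical. It gives a file two names and removes them one at a time. A descriptor stays open across the last unlink, so the inode survives with a link count of 0 until that descriptor is closed.

```python
import os
import tempfile

def unlink_demo(d):
    a, b = os.path.join(d, "a"), os.path.join(d, "b")
    with open(a, "w") as f:
        f.write("data")
    os.link(a, b)
    counts = [os.stat(a).st_nlink]           # two names -> link count 2
    os.unlink(b)                             # remove one directory entry
    counts.append(os.stat(a).st_nlink)       # link count 1
    fd = os.open(a, os.O_RDONLY)
    try:
        os.unlink(a)                         # last name gone; the open fd keeps the inode alive
        counts.append(os.fstat(fd).st_nlink) # link count 0
        data = os.read(fd, 100)              # contents are still readable through the fd
    finally:
        os.close(fd)                         # now the kernel may free the inode and its blocks
    return counts, data, os.listdir(d)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(unlink_demo(d))
```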
Check Yourself
- Why does mv a.txt /tmp/a.txt across file systems actually copy bytes, while mv within a file system does not?
- What permission do you need to list files in /etc? To stat /etc/hosts? Explain the difference.
- Why is it impossible to hard-link a directory on Linux?
Mini Drill or Application
Predict and then verify with stat and ls -lid:
mkdir /tmp/x
touch /tmp/x/a
ln /tmp/x/a /tmp/x/b
ls -li /tmp/x
mv /tmp/x/a /tmp/y_a
ls -li /tmp /tmp/x
Explain which inodes changed, which directory entries changed, and which data blocks were untouched.