Directories Are Just Files With Structured Contents

What This Concept Is

A directory is a file whose contents are a table of (name, inode_number) entries. The OS marks it with a distinct file type so that user code cannot write to it with write, but from the file system's point of view, a directory is just an inode whose data blocks hold name records.

directory /home/alice  (inode #42, type = DIR)
data block holds:
+------+-----+------+----------+
| ino  | rec | type | name     |
+------+-----+------+----------+
| 42   | ... | DIR  | .        |  <- self
| 7    | ... | DIR  | ..       |  <- parent
| 5000 | ... | REG  | a.txt    |
| 5001 | ... | REG  | b.txt    |
| 6000 | ... | DIR  | projects |
+------+-----+------+----------+

This is why a file system forms a tree (or DAG with hard links): every directory contains the inode numbers of its children. Path resolution is a sequence of lookups in these tables, one per path component.
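
The (name, inode_number) table can be observed from user space. The following is a minimal sketch, assuming a POSIX-like system and Python's standard os.scandir; the helper name list_entries is hypothetical. Note that scandir does not report the "." and ".." entries, although they exist on disk.

```python
import os
import tempfile

def list_entries(path):
    """Return (inode_number, kind, name) for each entry in a directory."""
    rows = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                kind = "DIR"
            elif entry.is_file(follow_symlinks=False):
                kind = "REG"
            else:
                kind = "OTHER"  # symlinks, devices, sockets, ...
            rows.append((entry.inode(), kind, entry.name))
    # Sort by name so the output is stable across file systems.
    return sorted(rows, key=lambda r: r[2])

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "a.txt"), "w").close()
        os.mkdir(os.path.join(d, "projects"))
        for ino, kind, name in list_entries(d):
            print(f"{ino:>10}  {kind}  {name}")
```

The inode numbers printed will differ from the illustrative ones in the table above; only the shape (one inode number per name) is the point.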

Why It Matters Here

Every syscall that takes a pathname starts with path resolution: walk the path, reading one directory at a time, using the last inode as the base for the next name. open("/a/b/c") reads the inode for /, looks up "a" in its data to get inode X, reads X, looks up "b", and so on.

Directory structure dictates:

  • permissions: you need execute (x) on every directory along the path
  • linking: ln a b adds a directory entry for b; in a's inode only the link count (and ctime) changes, and no data is copied
  • renaming: rename("a/x", "b/y") rewrites two directories atomically
  • lookup cost: a directory with 100k entries that uses a linear list is slow; ext4 uses htrees (hashed B-trees) for large directories
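
The renaming point can be checked directly: within one file system, a rename edits directory entries and leaves the inode alone. This is a small sketch, assuming both directories live on the same file system (true here because they share one temporary parent); the function name rename_keeps_inode and the directory names a and b are illustrative only.

```python
import os
import tempfile

def rename_keeps_inode(base):
    """Move a/x to b/y under `base`; return (inode before, inode after, old name exists)."""
    src_dir = os.path.join(base, "a")
    dst_dir = os.path.join(base, "b")
    os.mkdir(src_dir)
    os.mkdir(dst_dir)
    src = os.path.join(src_dir, "x")
    dst = os.path.join(dst_dir, "y")
    with open(src, "w") as f:
        f.write("payload")
    before = os.stat(src).st_ino
    os.rename(src, dst)          # rewrites entries in a/ and b/, not the file's data
    after = os.stat(dst).st_ino
    return before, after, os.path.exists(src)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        before, after, old_exists = rename_keeps_inode(d)
        print(before == after, old_exists)
```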

Concrete Example

Path resolution for /home/alice/a.txt:

  1. Start at the root inode (inode 2 on ext*). Read its data block.
  2. Look up "home". Find inode 7. Read inode 7's data block.
  3. Look up "alice". Find inode 42. Read inode 42's data block.
  4. Look up "a.txt". Find inode 5000. Read inode 5000 and stop; open() builds its open-file state from this inode.

That is four inode reads (2, 7, 42, 5000) and three directory data-block reads, assuming each directory fits in one block; the file's own data is not read until the first read(). The kernel's dentry cache (dcache) short-circuits this: subsequent open("/home/alice/a.txt") calls that hit the dcache skip the disk entirely.
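
The walk can be modeled in a few lines. This is a toy in-memory sketch, not kernel code: the INODES dictionary and the resolve function are hypothetical, the inode numbers mirror the example, and there is no caching, permission checking, or symlink handling. It counts inode reads and directory data reads separately; the file's own data is never touched during the walk.

```python
# Toy model: inode number -> inode. Directories carry a name -> inode table.
INODES = {
    2: {"type": "DIR", "entries": {".": 2, "..": 2, "home": 7}},
    7: {"type": "DIR", "entries": {".": 7, "..": 2, "alice": 42}},
    42: {"type": "DIR", "entries": {".": 42, "..": 7, "a.txt": 5000}},
    5000: {"type": "REG", "entries": None},
}
ROOT_INO = 2

def resolve(path, inodes=INODES):
    """Walk an absolute path one component at a time.

    Returns (final inode number, inode reads, directory data reads).
    """
    ino = ROOT_INO
    inode_reads = 1              # reading the root inode
    dir_reads = 0
    for name in [c for c in path.split("/") if c]:
        node = inodes[ino]
        if node["type"] != "DIR":
            raise NotADirectoryError(name)
        dir_reads += 1           # search this directory's data for `name`
        if name not in node["entries"]:
            raise FileNotFoundError(name)
        ino = node["entries"][name]
        inode_reads += 1         # read the inode the entry points to
    return ino, inode_reads, dir_reads

if __name__ == "__main__":
    print(resolve("/home/alice/a.txt"))  # (5000, 4, 3)
```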

Creating a new name is equally mechanical. For touch /home/alice/b.txt, assuming b.txt does not exist yet:

  1. Allocate a new inode from the inode bitmap.
  2. Write the new inode (size 0, timestamps, permissions).
  3. Add a directory entry "b.txt" -> 5001 to inode 42's data block, possibly allocating another data block for the directory if it overflowed.

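Three on-disk structures change: the inode bitmap, the new inode, and the directory's data block (plus the directory's own inode if it grows). This is already a hint that crash consistency is hard (Cluster 3).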
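
The three steps can be sketched as a toy model. Everything here is an assumption made for illustration: the bitmap is a Python list of booleans, inodes live in a dictionary, and create_file is a hypothetical name. Real file systems add journaling or ordering rules precisely because a crash between any two of these steps leaves the structures disagreeing with each other.

```python
def create_file(bitmap, inodes, dir_ino, name):
    """Toy model of creating an empty file; returns the new inode number.

    bitmap: list of bools, True means the inode slot is in use
    inodes: dict mapping inode number -> inode dict
    """
    directory = inodes[dir_ino]
    if name in directory["entries"]:
        raise FileExistsError(name)
    # 1. Allocate: find a free slot in the inode bitmap.
    #    (list.index raises ValueError if no slot is free.)
    new_ino = bitmap.index(False)
    bitmap[new_ino] = True
    # 2. Initialize the new inode (size 0, one link).
    inodes[new_ino] = {"type": "REG", "size": 0, "nlink": 1}
    # 3. Add the (name, inode) entry to the parent directory's data.
    directory["entries"][name] = new_ino
    return new_ino

if __name__ == "__main__":
    bitmap = [True, True, True, False, False]
    inodes = {2: {"type": "DIR", "entries": {".": 2, "..": 2}}}
    print(create_file(bitmap, inodes, 2, "b.txt"))  # 3
```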

Common Confusion / Misconception

"cd changes state on disk." It does not. chdir updates only the process's current working directory pointer, an in-kernel per-process value.

"A symbolic link is a kind of hard link." No. A symbolic link is a file whose content is a path string. A hard link is a second directory entry pointing to the same inode. Symbolic links can cross file systems and point at non-existent targets; hard links cannot.

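The hard link versus symbolic link distinction above can be observed with a short experiment. This is a sketch assuming a POSIX system whose temporary directory supports both link types; link_demo and the file names are hypothetical.

```python
import os
import tempfile

def link_demo(base):
    """Create a file, a hard link, and a symlink to it; return what was observed."""
    target = os.path.join(base, "a")
    hard = os.path.join(base, "hard")
    soft = os.path.join(base, "soft")
    with open(target, "w") as f:
        f.write("hello")
    os.link(target, hard)        # second directory entry for the same inode
    os.symlink(target, soft)     # new inode whose content is a path string
    obs = {
        "same_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
        "nlink": os.stat(target).st_nlink,
        "symlink_has_own_inode": os.lstat(soft).st_ino != os.stat(target).st_ino,
        "stored_path_is_target": os.readlink(soft) == target,
    }
    os.unlink(target)            # remove the original name only
    with open(hard) as f:
        obs["hard_link_still_reads"] = f.read() == "hello"
    # The symlink itself survives, but the path it stores no longer resolves.
    obs["symlink_dangling"] = os.path.islink(soft) and not os.path.exists(soft)
    return obs

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for key, value in link_demo(d).items():
            print(key, value)
```
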
"Large directories are always fast." On classical FFS or ext2 they are linear scans, so opening a directory with 1M entries and reading one file takes O(1M) time. Modern FS (ext4, xfs) use hashed or B-tree directories.

How To Use It

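When reasoning about any path-taking syscall, expand it into component lookups. Ask: which directories must I read? Which must I write? For unlink("/a/b"): read / to find a, read /a to find b; write /a (remove the entry for b) and decrement the link count in the inode that b named. The inode and its data blocks are freed only when the link count reaches zero and no process still has the file open.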
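
The expansion habit can be made mechanical. This is a hypothetical helper, not part of any real API: it only manipulates path strings to list which directories an unlink on an absolute path would read and what it would write, ignoring symlinks, mount points, and caching.

```python
def directories_to_read(path):
    """List the directories searched while resolving an absolute path."""
    parts = [p for p in path.split("/") if p]
    dirs = ["/"]
    for p in parts[:-1]:         # the last component is looked up, not searched
        dirs.append(dirs[-1].rstrip("/") + "/" + p)
    return dirs

def unlink_plan(path):
    """Return the reads and writes implied by unlink(path) in this simple model."""
    dirs = directories_to_read(path)
    name = [p for p in path.split("/") if p][-1]
    return {
        "read": dirs,
        "write": [
            f"{dirs[-1]}: remove entry '{name}'",
            f"inode named by {path}: link count - 1",
        ],
    }

if __name__ == "__main__":
    print(unlink_plan("/a/b"))
```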

Check Yourself

  1. Why does mv a.txt /tmp/a.txt across file systems actually copy bytes, while mv within a file system does not?
  2. What permission do you need to list files in /etc? To stat /etc/hosts? Explain the difference.
  3. Why is it impossible to hard-link a directory on Linux?

Mini Drill or Application

Predict and then verify with stat and ls -lid:

mkdir /tmp/x
touch /tmp/x/a
ln /tmp/x/a /tmp/x/b
ls -li /tmp/x
mv /tmp/x/a /tmp/y_a
ls -li /tmp /tmp/x

Explain which inodes changed, which directory entries changed, and which data blocks were untouched.

Read This Only If Stuck