What a File Is: Byte Stream, Metadata, Inodes

What This Concept Is

A file is three things joined by convention, not one thing:

  • a byte stream: a linear sequence of bytes indexed from offset 0
  • a metadata record: size, timestamps, owner, permission bits, link count, block pointers
  • a name: an entry in a directory that resolves to the metadata record

On Unix-style file systems, the metadata record is called an inode (index node). Inodes have numbers, not names. Names live in directories. The thing that "is" the file is the inode; the name is just one way to reach it.

"/etc/hosts"        name: a directory entry in /etc
      |
      v
inode #12345        metadata record
  - size: 412 bytes
  - mode: 0644, uid 0, gid 0
  - mtime, atime, ctime
  - link count: 1
  - pointers to data blocks -> [block 88201][block 88202]...
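
The fields in the diagram can be read back from a running system. The following is a minimal sketch, assuming a POSIX-style system and Python's standard os.stat; the file name "hosts" and the helper describe are hypothetical stand-ins, not part of the original example.

```python
import os
import stat
import tempfile

def describe(path):
    """Return the inode fields listed in the diagram, read via stat(2)."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "size": st.st_size,
        "mode": oct(stat.S_IMODE(st.st_mode)),  # permission bits only
        "uid": st.st_uid,
        "gid": st.st_gid,
        "links": st.st_nlink,
        "mtime": st.st_mtime,
    }

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "hosts")  # hypothetical stand-in for /etc/hosts
    with open(path, "wb") as f:
        f.write(b"127.0.0.1 localhost\n")
    os.chmod(path, 0o644)
    info = describe(path)
    print(info)
```

Nothing in the returned record is a name: the path is only used to find the inode. The block pointers are also absent, because stat does not expose them.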

The split matters: you can have two names for one inode (hard link), or one name for an inode with no data blocks (zero-length file), or an inode with zero names (open but unlinked).
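
Two of these cases are easy to observe from user space. This is a rough sketch, assuming a POSIX-style system with a local file system for the temporary directory (network file systems can report a different link count); the file names "empty" and "scratch" are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # One name, no data: a zero-length file still has an inode.
    empty = os.path.join(d, "empty")
    open(empty, "wb").close()
    empty_size = os.stat(empty).st_size

    # Zero names: remove the only name while a descriptor is still open.
    path = os.path.join(d, "scratch")
    with open(path, "w+b") as f:
        f.write(b"still here\n")
        f.flush()
        os.unlink(path)                            # the name is gone
        name_exists = os.path.exists(path)
        links_after = os.fstat(f.fileno()).st_nlink  # inode is still alive
        f.seek(0)
        data_after = f.read()                      # bytes are still readable

print(empty_size, name_exists, links_after, data_after)
```

The descriptor keeps the inode alive after its last name disappears; the file system can reclaim it only once the descriptor is closed.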

Why It Matters Here

Every later concept in this module relies on this split. Crash consistency is hard because inode, bitmap, and data blocks live in different places and can be updated out of order. The page cache is indexed by inode, not by name. Journaling logs changes to on-disk structures, many of which are inodes. unlink does not delete a file; it removes a directory entry and decrements the link count in the inode.

If you confuse name and file, you will misread every later trace.

Concrete Example

echo hello > /tmp/a creates:

  • a new inode, say #5000, with size=6, pointer to one data block #9001 holding "hello\n"
  • a directory entry in /tmp mapping "a" -> 5000

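ln /tmp/a /tmp/b does not copy data. It adds a second directory entry "b" -> 5000 and increments the inode's link count from 1 to 2. Both names reach the same bytes; a write through either name changes the same block. (Editors that save by writing a new file and renaming it over the old name create a new inode, so the other name keeps the old contents.)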

rm /tmp/a deletes the name "a" from /tmp and decrements the link count to 1. The inode, data block, and /tmp/b are all untouched. Only when the link count hits zero (and no process has it open) does the FS free the inode and data blocks.
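
The same sequence can be replayed with the underlying calls. A minimal sketch, assuming a POSIX-style system whose temporary directory supports hard links; it uses a private temporary directory instead of /tmp itself, and the inode number will not be 5000.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a")
    b = os.path.join(d, "b")

    # echo hello > a
    with open(a, "w") as f:
        f.write("hello\n")
    ino_a = os.stat(a).st_ino
    size_a = os.stat(a).st_size

    # ln a b: a second directory entry for the same inode
    os.link(a, b)
    same_inode = os.stat(b).st_ino == ino_a
    links_after_ln = os.stat(a).st_nlink

    # rm a: remove one name; the inode and its data survive
    os.unlink(a)
    links_after_rm = os.stat(b).st_nlink
    with open(b) as f:
        content = f.read()

print(size_a, same_inode, links_after_ln, links_after_rm, repr(content))
```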

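Common Confusion / Misconception

"The file name is part of the file." It is not. The name is a directory entry. Renaming a file leaves the inode number, data, and link count alone; it rewrites a directory. (Some systems, Linux among them, also update the inode's ctime.)

"rm erases data." It does not. It detaches the name. The data may linger in free blocks until overwritten. Secure deletion needs more than rm: either overwriting the data in place, which is unreliable on copy-on-write, journaling, or flash-backed storage, or discarding at the device level, for example with the BLKDISCARD ioctl, which operates on block-device ranges and not on individual files.

"The name identifies the file." It does not. The same name in two different directories can refer to two unrelated inodes, and two different names can refer to one inode via hard links. The identity is the (device, inode) pair.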

How To Use It

When reasoning about any file operation, answer three questions:

  1. Which directory is changing? (name added, removed, renamed)
  2. Which inode is changing? (metadata: size, times, mode, link count)
  3. Which data blocks are changing? (the byte stream itself)

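Every syscall touches some subset. write touches data blocks and the inode (size, mtime). chmod touches only the inode. rename touches one or two directories and leaves the file's inode number, data, and link count unchanged; if it replaces an existing name, the inode that name pointed to loses a link.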
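
The chmod, write, and rename cases can be observed by comparing stat results before and after each call. This is a sketch under the assumption of a POSIX-style system; the directory names "src" and "dst" and the file names "f" and "g" are hypothetical.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    src_dir = os.path.join(d, "src")
    dst_dir = os.path.join(d, "dst")
    os.mkdir(src_dir)
    os.mkdir(dst_dir)
    old = os.path.join(src_dir, "f")
    new = os.path.join(dst_dir, "g")

    with open(old, "wb") as f:
        f.write(b"abc")
    before = os.stat(old)

    # chmod: inode metadata changes, data and name do not.
    os.chmod(old, 0o600)
    after_chmod = os.stat(old)

    # write (append): data blocks change, and so does the inode's size.
    with open(old, "ab") as f:
        f.write(b"def")
    after_write = os.stat(old)

    # rename across directories: both directories change,
    # the inode number and the data stay the same.
    os.rename(old, new)
    after_rename = os.stat(new)
    old_name_gone = not os.path.exists(old)

print(before.st_ino, after_rename.st_ino, after_write.st_size, old_name_gone)
```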

Check Yourself

  1. Why does ls -li show the inode number in the first column? What does that number tell you about two entries?
  2. If a process has /tmp/a open and another process runs rm /tmp/a, can the first process still read the file? Why?
  3. What is the difference between a hard link and a symbolic link at the inode level?

Mini Drill or Application

Run and explain the output:

touch /tmp/a
ln /tmp/a /tmp/b
ls -li /tmp/a /tmp/b
stat /tmp/a
rm /tmp/a
ls -li /tmp/b
stat /tmp/b

Write one paragraph: which inode number does each ls row show, what does the link count show, and what happened to the data.

Read This Only If Stuck