Skip to main content

Module Quiz

Complete this quiz after finishing all concept and practice pages.

Current Module Questions

Question 1: Inode vs Name

A file /tmp/a has inode number 5000 and link count 1. You run ln /tmp/a /tmp/b, then rm /tmp/a. Describe the state of inode 5000, its data, /tmp/b, and the link count.

Answer: Inode 5000 and its data blocks are unchanged. /tmp loses the entry "a" but keeps the new entry "b". Link count goes 1 -> 2 -> 1. You can still read the data through /tmp/b.

Question 2: Open-File Table Semantics

Process P opens /tmp/a and gets fd=3. P forks into child C. P writes 10 bytes, then C writes 10 bytes. What does the file look like, and why? Contrast with P opening it twice independently.

Answer: With fork, parent and child share the open-file entry, so the offset is shared. The second write appends after the first: 20 bytes total. With two open calls, each FD has its own offset starting at 0; the second write overwrites the first, leaving 10 bytes.

Question 3: Path Resolution

Trace open("/home/alice/a.txt", O_RDONLY) down to inode reads. Assume cold dentry cache and ext-style FS.

Answer:

  1. Read root inode (inode 2). Read its data block to find "home" -> inode 7.
  2. Read inode 7. Read its data block to find "alice" -> inode 42.
  3. Read inode 42. Read its data block to find "a.txt" -> inode 5000.
  4. Read inode 5000. Allocate an open-file entry with offset 0 and an FD pointing at it.

Four inode reads and up to four data-block reads, possibly merged or fewer if blocks are cached.

Question 4: On-Disk Layout

In an ext-style FS with 4 KiB blocks and 256-byte inodes, how many inodes fit per block? Given inode table start LBA = 3 and inode 5000, compute the LBA that holds inode 5000.

Answer: 4096 / 256 = 16 inodes per block. Inode 5000 sits in block 5000 / 16 = 312, at LBA 3 + 312 = 315, at byte offset (5000 mod 16) * 256 = 8 * 256 = 2048 within that block.

Question 5: FS Archetype Choice

A message broker needs to persist millions of small records per second, mostly sequentially. Which archetype (FAT, ext, LFS) fits best, and why?

Answer: LFS-style (or an LSM-tree built on any FS). Millions of small records become a single sequential log, which is the best case for both HDD and SSD. ext-style can work but requires careful batching to avoid scattered metadata updates. FAT does not scale.

Question 6: Crash Scenario - Append

An append under data=ordered ext4 issues (1) data block write, (2) journal transaction write, (3) journal commit block, (4) checkpoint metadata to homes. For each crash point, describe the state on remount and what recovery does.

Answer:

  • Crash after 1: no transaction in journal; data block sits in unallocated space. Nothing to replay. Consistent.
  • Crash after 2 (before commit): transaction in journal but no commit block. Recovery ignores it. Consistent.
  • Crash after 3 (before checkpoint): committed transaction exists. Recovery replays metadata updates into homes. File now complete and consistent.
  • Crash after 4: transaction in journal is now redundant. Eventually erased. Consistent.

Question 7: Crash Scenario - Naive Append

Repeat question 6 for an FS with no journal, writes in the order (bitmap, inode, data). What is the worst case?

Answer: Crash after bitmap and inode but before data. The bitmap says the block is allocated, the inode says the file points to it, but the block content is old garbage. Reads return garbage labeled as legitimate user data. A consistency checker cannot tell this is wrong because the structure is self-consistent; only the user can tell the data is wrong.

Question 8: Why COW Avoids fsck

In one paragraph, explain why Btrfs or ZFS do not need a traditional fsck and what they need instead.

Answer: In a COW FS, no live block is overwritten; updates write new blocks and flip a root pointer atomically. A crash either leaves the old tree live (pre-update) or the new tree live (post-update), never a mix. There is no partial state to repair, so fsck has no work. What COW still needs is scrub: reading every block and verifying checksums, because hardware can silently corrupt stored bytes (bit rot, misdirected writes). Scrub is about data integrity, not structural repair.

Question 9: Page Cache and Memory

Why does free -h show most memory as "used" on a production server, and why is that fine?

Answer: The kernel uses free memory as page cache aggressively. Cache pages count as "used" in the old free output but are reclaimable under allocation pressure. The "available" column (newer free) reflects what could be given back. Unused RAM is wasted RAM; if the kernel left it idle, you would take a cache miss that a cache hit would have prevented.

Question 10: fsync Semantics

write(fd, buf, 4096) returns successfully. You pull the power cable. Is the data on disk? Justify.

Answer: Not necessarily. write only copies into the page cache; it returns before writeback has issued the block I/O. Even if the kernel has issued the block I/O, the drive may hold the data in a volatile write cache until a FLUSH command lands. Only after fsync(fd) returns successfully (on a system with correct flush propagation) is the data durable. This is why fsync exists.

Question 11: Sequential vs Random Gap

Given an HDD with 100 random IOPS at 4 KiB and 150 MiB/s sequential, compute wall time to read 1 GiB as (a) sequential, (b) 4 KiB random.

Answer:

  • (a) 1 GiB / (150 MiB/s) = 1024 / 150 = ~7 seconds.
  • (b) 1 GiB / 4 KiB = 262,144 requests at 100 IOPS = ~2620 seconds = ~44 minutes.

The gap is ~400x on HDD. This is why file system design obsesses about sequentiality.

Question 12: epoll vs select Scaling

Why does select scale poorly as the number of watched FDs grows, even when few are ready? Why does epoll avoid this?

Answer: select passes a full bitmap of watched FDs to the kernel on every call; the kernel scans every bit and rebuilds the ready set. Cost is O(max_fd) per call. epoll registers FDs once with epoll_ctl; the kernel maintains a ready list. epoll_wait returns items from that list, costing O(ready). With 100,000 FDs and 10 ready, select scans 100k bits; epoll returns 10.

Question 13: Level vs Edge Triggered

In epoll, what is the difference between level-triggered and edge-triggered modes, and what does the application have to do differently under edge-triggered?

Answer: Level-triggered reports "ready" whenever the condition holds (e.g., unread bytes). Edge-triggered reports only on the transition (bytes became available). Under edge-triggered, the application must drain the FD fully in a loop until EAGAIN; otherwise the kernel will not notify again until the next transition. Edge-triggered is faster (fewer notifications) but has sharper failure modes.

Question 14: io_uring vs epoll

Why can io_uring outperform epoll + non-blocking I/O on NVMe-saturating workloads, even though both avoid per-FD polling?

Answer: epoll is readiness-based: the app still issues synchronous I/O per ready FD, one syscall per op. io_uring is completion-based: submissions and completions are shared-memory rings; with SQ polling mode the syscall disappears from the hot path entirely. Batched submissions amortize any remaining kernel-boundary cost. On NVMe devices capable of millions of IOPS, the syscall cost of epoll becomes the bottleneck.

Question 15: DMA and Zero Copy

read(fd, buf, N) serves from the page cache. Describe the copies involved. How does sendfile(out, in, ...) or splice reduce them?

Answer: Buffered read involves: DMA from device into kernel page cache (on cache miss), then memcpy from kernel cache to user buffer. sendfile / splice lets the kernel move data from one file descriptor (file) to another (socket) without copying through user space; on many networking drivers the final hop to the NIC is DMA. This is called "zero copy" because no CPU-driven memcpy happens for the user data.

Interleaved Review Questions

Prior Module Question 1 (S5 Module 1: Processes & Scheduling)

What is a process's PCB, and why does every context switch touch it?

Answer: The Process Control Block holds the kernel's view of a process: registers, memory map, open files, scheduling state, parent/child pointers, signal masks. A context switch saves the outgoing process's register state into its PCB and loads the incoming process's register state from its PCB. Memory map and FD table pointers are swapped too.

Prior Module Question 2 (S5 Module 2: Memory Management)

How does demand paging interact with the page cache? Is a memory-mapped file in the page cache?

Answer: A memory-mapped file's pages are page-cache pages. mmap maps the file's page cache into the user's virtual address space. A page fault on an unmapped offset triggers a read into the page cache (if not already there) and a PTE installation. Writes dirty the cache page, which is flushed later by the usual write-back mechanism.

Prior Module Question 3 (S5 Module 3: Concurrency)

Why is write not thread-safe for arbitrary offsets in all cases? What primitive would you reach for?

Answer: Multiple threads writing to the same FD via write share the FD's offset; interleaved writes may produce interleaved bytes. For per-offset writes, use pwrite, which takes an explicit offset and does not modify the shared offset. For larger atomicity, use locking at the application layer.

Prior Module Question 4 (S4 Systems Programming)

What is a syscall, and roughly what does it cost on modern Linux?

Answer: A syscall is a trap from user mode to kernel mode to invoke a kernel service. Cost today is ~100-500 ns per syscall on modern x86 with mitigations, up from ~10-50 ns pre-Meltdown. This is why APIs that avoid syscalls in the fast path (vDSO, io_uring with SQ polling) can be dramatically faster.

Prior Module Question 5 (S2 Module 5: Advanced Data Structures)

An LSM tree is popular for write-heavy workloads. How does it interact with the file system layer?

Answer: LSM writes to an in-memory memtable; when full, it is flushed as a sorted file (SSTable) via large sequential writes. Background compaction merges SSTables, again sequentially. This pattern rides the sequential-vs-random gap (concept 12) and benefits from log-structured or journaled FS behavior. It also leans on fsync for durability of the write-ahead log.

Self-Assessment and Remediation

Mastery Level (90-100% correct):

  • Ready to advance to Module 5 (Network Protocols & Sockets) with confidence.

Proficient Level (75-89% correct):

  • Review only the missed concept pages and redo one drill of each missed type.

Developing Level (60-74% correct):

  • Rework Practice 1 (internals), Practice 2 (crash consistency), and the FUSE + journaling katas. Expect the crash scenarios and layout tracing to be the main gap.

Insufficient Level (<60% correct):

  • Return to the concept sequence. Most likely issue: you are reading the material but not drawing the data structures. Restart with Cluster 1 and work paper-and-pencil traces.