File I/O and mmap Workshop
Retrieval Prompts
- Write the five raw I/O syscalls and their signatures from memory.
- Explain, in one sentence each, what O_APPEND, O_CLOEXEC, O_NONBLOCK, and O_TRUNC do.
- State two reasons read might return fewer bytes than requested.
- Write the mmap call you would use to read-map a file (all six arguments), and say what each argument does.
- State the difference between MAP_SHARED and MAP_PRIVATE, both for reads and for writes.
Compare and Distinguish
- read/write vs fread/fwrite
- mmap of a file vs mmap with MAP_ANONYMOUS
- lseek(fd, 0, SEEK_END) vs fstat(fd, &st) followed by st.st_size
- msync vs fsync
- ftruncate to grow a file vs lseek + write 1 byte to grow it
Common Mistake Check
- Writing a cat that assumes read(fd, buf, 4096) always returns 4096 until EOF.
- Calling lseek on a pipe and treating the ESPIPE error as "the file is weird."
- Calling open(path, O_CREAT | O_WRONLY) without a mode_t, getting random-garbage permissions.
- mmap-ing a 10 GB file and assuming RSS will be 10 GB.
- Modifying a MAP_PRIVATE | PROT_WRITE mapping and expecting the file on disk to change.
Mini Application: Implement cat
Requirements:
- Use only open, read, write, close.
- Loop both read and write to handle short transfers.
- Handle EINTR by retrying.
- With no arguments, copy stdin. With one or more paths, concatenate them to stdout.
- Byte-for-byte match against system cat on at least /etc/hostname and /usr/share/dict/words.
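The short-transfer and EINTR requirements can be sketched as two helpers. This is one possible shape, not a reference solution: the names write_all and copy_fd and the 64 KiB buffer size are choices made for this sketch, and argument handling, open, and close are left to you.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly len bytes, looping over short writes and retrying on EINTR.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd until EOF.
 * Returns 0 on success, -1 on error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[65536];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0; /* EOF */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) == -1)
            return -1;
    }
}
```

With these in place, copying stdin is copy_fd(STDIN_FILENO, STDOUT_FILENO), and each path argument becomes an open, a copy_fd, and a close.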
Mini Application: Implement wc -l
Requirements:
- Use only the raw I/O syscalls.
- Count newlines by inspecting each byte.
- Accept multiple files and print a total, matching the system wc -l format.
- Handle input from a pipe (some | ./wcl) -- no lseek.
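The counting core might look like the sketch below. It reads in large chunks and never seeks, so it behaves the same on files and pipes. The function name count_newlines and the buffer size are assumptions of this sketch; per-file output and the total line are not shown.

```c
#include <errno.h>
#include <unistd.h>

/* Count '\n' bytes from fd until EOF, reading in 64 KiB chunks.
 * Returns the count, or -1 on a read error other than EINTR. */
long long count_newlines(int fd)
{
    char buf[65536];
    long long lines = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            return lines; /* EOF */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    }
}
```

Note that a final line with no trailing newline is not counted, which is also how wc -l behaves.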
Mini Application: mmap-based Search
Build a program mgrep PATTERN FILE that:
- Opens FILE and fstats it.
- mmaps it read-only.
- Walks the mapping, printing every line containing PATTERN (a literal, not a regex).
- Prints the byte offset of each match (so you can prove you are walking the mapping, not re-reading).
Compare its runtime to grep -F PATTERN FILE on a large text file.
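One way to structure mgrep, as a rough sketch rather than a reference solution. The helper names map_readonly, find_literal, and print_matches are invented here. The sketch assumes PATTERN contains no newline, reports only the first match on each line, and treats an empty file as an error because mmap rejects a zero length.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on error or for an empty file. */
const char *map_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}

/* Offset of the first occurrence of pat at or after from, or -1 if none. */
long long find_literal(const char *data, size_t len,
                       const char *pat, size_t plen, size_t from)
{
    if (plen == 0 || plen > len)
        return -1;
    for (size_t i = from; i + plen <= len; i++) {
        /* Jump to the next candidate first byte that leaves room for pat. */
        const char *c = memchr(data + i, pat[0], len - plen + 1 - i);
        if (c == NULL)
            return -1;
        i = (size_t)(c - data);
        if (memcmp(c, pat, plen) == 0)
            return (long long)i;
    }
    return -1;
}

/* Print "offset:line" for each line containing pat.
 * Returns the number of matching lines. */
long long print_matches(const char *data, size_t len, const char *pat)
{
    size_t plen = strlen(pat);
    size_t from = 0;
    long long hits = 0, at;
    while (from < len &&
           (at = find_literal(data, len, pat, plen, from)) >= 0) {
        size_t start = (size_t)at;
        while (start > 0 && data[start - 1] != '\n')
            start--; /* walk back to the start of the line */
        const char *nl = memchr(data + at, '\n', len - (size_t)at);
        size_t end = nl ? (size_t)(nl - data) : len;
        printf("%lld:", at);
        fwrite(data + start, 1, end - start, stdout);
        putchar('\n');
        hits++;
        from = end + 1; /* resume on the next line */
    }
    return hits;
}
```

A main would call map_readonly on FILE, pass the result to print_matches with PATTERN, and munmap the mapping before exiting. The output here goes through stdio for brevity; switch to write if you want to stay on raw syscalls.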
Scenarios
- A logging program opens a file with O_APPEND and forks; both processes call write, one full line per call. Interleaving is always clean at the line boundary. Why?
- A program mmaps a 4 GB file on a 4 GB RAM machine and works fine; the same program, read-ing the file into one big buffer, is OOM-killed. Why?
- A program modifies a MAP_SHARED mapping and calls exit. Another process opens the file and sees the old bytes. What was missing?
- A program reads a log file with read(fd, buf, 1) and is CPU-bound. strace -c shows 99% of time in read. Explain and fix.
- A program mmaps a file, calls ftruncate(fd, new_smaller_size), then reads through the pointer past the new size. It crashes. Why?
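For experimenting with the MAP_SHARED scenario, here is a small sketch that modifies a file through a shared mapping and flushes explicitly. The helper name patch_via_mmap is hypothetical. POSIX leaves the timing of write-back unspecified without msync, so what you observe when the msync call is removed may differ between systems.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first n bytes of an existing file
 * through a shared mapping. Returns 0 on success, -1 on error. */
int patch_via_mmap(const char *path, const char *bytes, size_t n)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0 || (size_t)st.st_size < n) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, bytes, n);             /* modify the file through the mapping */
    int rc = msync(p, len, MS_SYNC); /* block until the pages are written back */
    if (munmap(p, len) == -1)
        rc = -1;
    return rc;
}
```

Swap MAP_SHARED for MAP_PRIVATE and rerun to see the change stay private to the process.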
Evidence Check
Complete when your cat and wc -l match the system versions on three non-trivial inputs, and your mgrep finishes faster than read-based equivalents on a 1 GB file.