Blocking, Non-blocking, and select / poll / epoll
What This Concept Is
An I/O operation on an fd can behave in one of three ways:
- Blocking: read(fd) sleeps the calling thread until data is available. One thread per connection is the natural shape.
- Non-blocking: read(fd) on an fd with O_NONBLOCK either returns data immediately or fails with -1 and errno set to EAGAIN / EWOULDBLOCK. The caller asks again later.
- Readiness-based multiplexing: one thread watches many FDs and is told which are ready to be operated on without blocking.
The three canonical mechanisms on Unix:
select(nfds, readfds, writefds, exceptfds, timeout):
- FD_SET-based bitmap, O(nfds) scan every call
- portable, but every fd must be below FD_SETSIZE (typically 1024) and cost grows with the highest fd watched
poll(pollfd[], nfds, timeout):
- array of { fd, events, revents } entries, O(nfds) scan
- no fixed fd limit, but still O(n) per call
epoll_create / epoll_ctl(ADD/MOD/DEL) / epoll_wait:
- each fd is registered once via epoll_ctl; each wait then costs O(ready), not O(registered)
- Linux-only; BSD equivalent is kqueue
The operational pattern is the same: register the FDs you care about; call the blocking wait; iterate only the FDs reported ready; do non-blocking I/O on each; return to the wait.
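The pattern above can be sketched with poll, the most portable of the three. This is a minimal illustration rather than a reference implementation: set_nonblocking, drain_fd, and poll_once are hypothetical helper names, and only readability (POLLIN) is handled.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read until EAGAIN/EWOULDBLOCK or end-of-file.
   Returns total bytes drained, or -1 on a hard error. */
static long drain_fd(int fd) {
    char buf[256];
    long total = 0;
    for (;;) {
        ssize_t r = read(fd, buf, sizeof buf);
        if (r > 0) { total += r; continue; }
        if (r == 0) return total;                 /* writer closed */
        if (errno == EINTR) continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) return total;
        return -1;
    }
}

/* One turn of the loop: wait up to timeout_ms, then drain every readable fd.
   Returns bytes drained in total (0 if the wait timed out), -1 on error. */
static long poll_once(struct pollfd *fds, nfds_t nfds, int timeout_ms) {
    assert(fds != NULL);
    int n = poll(fds, nfds, timeout_ms);
    if (n < 0) return errno == EINTR ? 0 : -1;
    long total = 0;
    for (nfds_t i = 0; i < nfds && n > 0; i++) {
        if (fds[i].revents == 0) continue;        /* caller still scans the array */
        n--;
        if (fds[i].revents & POLLIN) {
            long got = drain_fd(fds[i].fd);
            if (got < 0) return -1;
            total += got;
        }
    }
    return total;
}
```

A caller would mark each fd with set_nonblocking, fill a struct pollfd array with events = POLLIN, and call poll_once in a loop. Note that the scan over the array is the caller's job here, which is the per-call O(n) cost described above.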
Why It Matters Here
Servers that must handle many thousands of connections cannot afford a thread per connection. Thread stacks, context-switch overhead, and lock contention all scale badly. The historical progression is:
- Thread-per-connection (blocking): simple, 1994-era web servers. Caps out around a few thousand connections.
- select / poll: one thread for many FDs. poll scales into tens of thousands of FDs (select is capped by FD_SETSIZE, typically 1024), but cost per wait is O(n) in total FDs watched. At 100k FDs, every poll call scans 100k entries.
- epoll / kqueue: kernel maintains a ready list. A wait returns in O(ready) time. Scales to hundreds of thousands of idle connections cheaply.
This concept is also what Module 5 (networking) builds on. On Linux, many high-performance servers, including nginx and Node.js (through libuv), are built on epoll.
Concrete Example
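A minimal epoll echo server (Linux only; error handling abbreviated, and the port number is an arbitrary choice for the example):
    #define _GNU_SOURCE                    // for accept4()
    #include <errno.h>
    #include <netinet/in.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int ep = epoll_create1(0);
        int lfd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(8080),   // arbitrary example port
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        if (ep < 0 || lfd < 0 ||
            bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
            listen(lfd, SOMAXCONN) < 0)
            return 1;
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };  // level-triggered
        if (epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev) < 0) return 1;
        struct epoll_event events[64];
        for (;;) {
            int n = epoll_wait(ep, events, 64, -1);
            if (n < 0 && errno != EINTR) return 1;   // on EINTR the inner loop is skipped
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == lfd) {
                    int c = accept4(lfd, NULL, NULL, SOCK_NONBLOCK);
                    if (c < 0) continue;             // EAGAIN or a transient error
                    ev.events = EPOLLIN | EPOLLET;   // edge-triggered: must drain below
                    ev.data.fd = c;
                    if (epoll_ctl(ep, EPOLL_CTL_ADD, c, &ev) < 0) close(c);
                    continue;
                }
                char buf[4096];
                for (;;) {
                    ssize_t r = read(fd, buf, sizeof buf);
                    if (r > 0) {
                        // Simplification: a short or EAGAIN send drops bytes; a real server
                        // buffers the rest and waits for EPOLLOUT. MSG_NOSIGNAL avoids SIGPIPE.
                        (void)send(fd, buf, (size_t)r, MSG_NOSIGNAL);
                    } else if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                        break;                       // drained; wait for the next edge
                    } else if (r < 0 && errno == EINTR) {
                        continue;
                    } else {
                        close(fd);                   // r == 0 (peer closed) or hard error;
                        break;                       // closing also drops it from epoll
                    }
                }
            }
        }
    }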
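Contrast with select: the fd_set must be rebuilt every iteration, and the kernel walks every watched fd on every call. At 50,000 sockets with 10 ready, a select-style call scans 50,000 bits and returns 10 (and stock select cannot even hold that many, since fds must stay below FD_SETSIZE, so in practice this is poll scanning 50,000 entries); epoll walks a ready list of size 10 and returns 10.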
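For comparison, here is a sketch of the same wait step written with select. The helper names select_readable and demo_select are hypothetical, and the code only covers readability; it is meant to make the two per-call scans visible, not to serve as a production wrapper.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait until at least one of fds[0..count) is readable or timeout_ms elapses.
   Sets ready[i] = 1 for readable fds; returns how many are ready, -1 on error. */
static int select_readable(const int *fds, int count, int *ready, int timeout_ms) {
    fd_set rset;
    int maxfd = -1;

    /* select overwrites the set, so it is rebuilt from scratch on every call. */
    FD_ZERO(&rset);
    for (int i = 0; i < count; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);  /* larger fds overflow fd_set */
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd) maxfd = fds[i];
    }

    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    int n = select(maxfd + 1, &rset, NULL, NULL, &tv);
    if (n < 0) return -1;

    /* The result is a bitmap, so finding the ready fds means testing each one. */
    for (int i = 0; i < count; i++)
        ready[i] = FD_ISSET(fds[i], &rset) ? 1 : 0;
    return n;
}

/* Usage: two pipes, only the second has data. Returns 1 if select reports
   exactly that, 0 otherwise. */
static int demo_select(void) {
    int a[2], b[2], ready[2] = { 0, 0 };
    if (pipe(a) < 0) return 0;
    if (pipe(b) < 0) { close(a[0]); close(a[1]); return 0; }
    int fds[2] = { a[0], b[0] };
    int ok = write(b[1], "x", 1) == 1 &&
             select_readable(fds, 2, ready, 1000) == 1 &&
             ready[0] == 0 && ready[1] == 1;
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return ok;
}
```

Both loops in select_readable run over every watched fd regardless of how many are ready, and the kernel performs a third scan of its own up to maxfd. With epoll, none of the three is repeated per wait.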
Common Confusion / Misconception
"epoll is asynchronous I/O." No. epoll is readiness notification: it tells you an FD is ready for a non-blocking operation. The I/O itself is still synchronous (read, write). True async I/O (where the kernel does the op in background and hands you completion) is aio_* and io_uring (concept 14).
"Level-triggered vs edge-triggered is an obscure detail." It is not. In level-triggered mode (the default), epoll_wait reports an FD as ready for as long as the condition holds (for example, unread bytes exist). In edge-triggered mode (EPOLLET), the kernel reports only transitions; if you don't drain the FD after a notification, you will not be notified again until new data arrives. Edge-triggered can save wakeups, but it requires non-blocking FDs and drain loops. Get this wrong and a connection hangs (edge-triggered without draining) or the loop spins (level-triggered without consuming).
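The difference is easy to observe with a pipe and a deliberately incomplete read. The sketch below is a small experiment, not part of any server: partial_read_demo is a hypothetical helper that writes two bytes, consumes one, and records what two consecutive zero-timeout waits return.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register the read end of a pipe with the given extra flags (0 for
   level-triggered, EPOLLET for edge-triggered), write two bytes, consume only
   one, and report how many events each of two consecutive waits returns.
   Returns 0 on success, -1 on setup failure. */
static int partial_read_demo(unsigned extra_flags, int *first_wait, int *second_wait) {
    assert(first_wait != NULL && second_wait != NULL);
    int p[2];
    if (pipe(p) < 0) return -1;
    int ep = epoll_create1(0);
    if (ep < 0) { close(p[0]); close(p[1]); return -1; }

    struct epoll_event ev = { .events = EPOLLIN | extra_flags, .data.fd = p[0] };
    struct epoll_event out;
    char byte;
    int rc = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "ab", 2) == 2) {
        *first_wait = epoll_wait(ep, &out, 1, 0);     /* data just arrived */
        if (read(p[0], &byte, 1) == 1) {              /* leave one byte unread */
            *second_wait = epoll_wait(ep, &out, 1, 0);
            rc = 0;
        }
    }
    close(ep); close(p[0]); close(p[1]);
    return rc;
}
```

Called with extra_flags = 0, both waits report one event, because a byte is still unread. Called with EPOLLET, the first wait reports one event and the second reports none: the leftover byte stays in the pipe with no further notification, which is exactly the hang that a missing drain loop produces.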
"select is deprecated." It is not. It remains portable across Linux, the BSDs, macOS, and Windows (where Winsock supports it for sockets only). For small FD counts (<100) it is fine. For servers, use epoll on Linux and kqueue on BSD/macOS.
How To Use It
To design a readiness-based server:
- All FDs that participate: O_NONBLOCK.
- Register with epoll (Linux) or kqueue (BSD). Choose level- or edge-triggered.
- On each epoll_wait return, iterate ready FDs. For each, do non-blocking I/O in a loop until EAGAIN.
- For writes that may block, buffer and register EPOLLOUT. Unregister when the buffer drains.
- Handle EPOLLRDHUP (peer closed) and errors cleanly.
The loop structure (while (epoll_wait): dispatch) is the event loop. This is the core shape of every production async server.
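The write-side step in the list above (buffer, register EPOLLOUT, unregister once drained) is the one most often left out of small examples, so here is a hedged sketch of it. struct conn, conn_queue, conn_set_interest, and conn_flush are hypothetical names, the fixed 64 KiB buffer is an arbitrary simplification, and the fd is assumed to be non-blocking and already added to the epoll instance in level-triggered mode.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-connection state: bytes queued by the application that the
   kernel has not accepted yet. */
struct conn {
    int fd;
    size_t len;        /* bytes queued in buf */
    size_t off;        /* bytes of buf already sent */
    int want_out;      /* 1 while EPOLLOUT is registered */
    char buf[1 << 16];
};

/* Append data to the queue; returns 0, or -1 if it does not fit. */
static int conn_queue(struct conn *c, const void *data, size_t n) {
    if (n > sizeof c->buf - c->len) return -1;
    memcpy(c->buf + c->len, data, n);
    c->len += n;
    return 0;
}

/* Update the interest mask of an fd already added with EPOLL_CTL_ADD. */
static int conn_set_interest(int ep, struct conn *c, int want_out) {
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | (want_out ? EPOLLOUT : 0),
                              .data.ptr = c };
    if (epoll_ctl(ep, EPOLL_CTL_MOD, c->fd, &ev) < 0) return -1;
    c->want_out = want_out;
    return 0;
}

/* Send as much queued data as the socket accepts right now.
   Returns bytes still pending (0 means drained), or -1 on a hard error. */
static long conn_flush(int ep, struct conn *c) {
    assert(c->off <= c->len);
    while (c->off < c->len) {
        ssize_t w = send(c->fd, c->buf + c->off, c->len - c->off, MSG_NOSIGNAL);
        if (w > 0) { c->off += (size_t)w; continue; }
        if (w < 0 && errno == EINTR) continue;
        if (w < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) break;  /* kernel buffer full */
        return -1;
    }
    if (c->off == c->len) {
        c->off = c->len = 0;                                   /* drained */
        if (c->want_out && conn_set_interest(ep, c, 0) < 0) return -1;
    } else if (!c->want_out && conn_set_interest(ep, c, 1) < 0) {
        return -1;                              /* ask to be told when writable */
    }
    return (long)(c->len - c->off);
}
```

In an event loop, the application would call conn_queue followed by conn_flush when it has data to send, and call conn_flush again whenever epoll_wait reports EPOLLOUT for that connection. Leaving EPOLLOUT registered after the buffer drains would make a level-triggered loop spin, since an idle socket is almost always writable.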
Check Yourself
- Why does select become O(N) per call where N is the highest fd, not the number of ready fds?
- What does edge-triggered mode require the application to do, and why?
- Why does epoll win only when many FDs are registered but few are ready at a time?
Mini Drill or Application
Implement a minimal epoll echo server that accepts connections and echoes 1 KiB buffers. Load it with 1,000 concurrent idle connections plus 10 actively echoing. Measure CPU usage. Then implement the same server with select. Measure again. Explain the difference.