
Blocking, Non-blocking, and select / poll / epoll

What This Concept Is

An I/O operation on an fd can behave in one of three ways:

  • Blocking: read(fd) sleeps the calling thread until data is available. One thread per connection is the natural shape.
  • Non-blocking: read(fd) on an fd with O_NONBLOCK either returns data immediately or returns EAGAIN / EWOULDBLOCK. The caller asks again later.
  • Readiness-based multiplexing: one thread watches many FDs and is told which are ready to be operated on without blocking.

The three canonical mechanisms on Unix:

select(nfds, readfds, writefds, exceptfds, timeout):
- fd_set bitmap (built with FD_ZERO / FD_SET); the kernel scans all nfds bits on every call
- portable, but limited to fds below FD_SETSIZE (typically 1024), and cost grows with the highest fd watched

poll(pollfd[], nfds, timeout):
- array of { fd, events, revents } entries, O(nfds) scan
- no fixed fd limit, but still O(n) per call

epoll_create / epoll_ctl(ADD/MOD/DEL) / epoll_wait:
- each fd is registered once with epoll_ctl instead of being resubmitted on every call; epoll_wait costs O(ready)
- Linux-only; BSD equivalent is kqueue

The operational pattern is the same: register the FDs you care about; call the blocking wait; iterate only the FDs reported ready; do non-blocking I/O on each; return to the wait.
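
The register / wait / iterate / non-blocking I/O pattern can be sketched with poll and a pipe standing in for a socket. This is a minimal illustration, not a server: set_nonblocking and poll_pipe_demo are hypothetical names introduced here, and the 1000 ms timeout is an arbitrary choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: switch an fd to non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical demo: returns the number of bytes drained from the pipe,
   or -1 on setup failure or unexpected read result. */
long poll_pipe_demo(void) {
    int p[2];
    if (pipe(p) < 0) return -1;
    if (set_nonblocking(p[0]) < 0) return -1;
    if (write(p[1], "hello", 5) != 5) return -1;

    /* 1. Register: one pollfd entry per fd we care about. */
    struct pollfd pfd = { .fd = p[0], .events = POLLIN };

    /* 2. Wait (with a timeout here, so the demo cannot hang). */
    int ready = poll(&pfd, 1, 1000);
    long total = 0;

    /* 3. Act only on entries whose revents reports readiness. */
    if (ready == 1 && (pfd.revents & POLLIN)) {
        char buf[16];
        for (;;) {
            /* 4. Non-blocking read until the kernel reports nothing is left. */
            ssize_t r = read(p[0], buf, sizeof buf);
            if (r > 0) { total += r; continue; }
            if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) break;
            total = -1;   /* EOF or unexpected error */
            break;
        }
    }
    close(p[0]);
    close(p[1]);
    return total;
}
```

Calling poll_pipe_demo() drains the five bytes written to the pipe and stops at EAGAIN instead of sleeping, which is the behavior the non-blocking bullet describes.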

Why It Matters Here

Servers that must handle many thousands of connections cannot afford a thread per connection. Thread stacks, context-switch overhead, and lock contention all scale badly. The historical progression is:

  1. Thread-per-connection (blocking): simple, 1994-era web servers. Caps out around a few thousand connections.
  2. select / poll: one thread for many FDs. poll scales into tens of thousands of FDs (select stops at FD_SETSIZE, typically 1024), but cost per wait is O(n) in total FDs watched. At 100k FDs, every poll call scans 100k entries.
  3. epoll / kqueue: kernel maintains a ready list. A wait returns in O(ready) time. Scales to hundreds of thousands of idle connections cheaply.

This concept is also what Module 5 (networking) builds on. On Linux, high-performance servers such as nginx and Node.js (through libuv) build their event loops on epoll.

Concrete Example

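A minimal epoll echo server (C, Linux only; error handling kept to a minimum, and port 8080 is an arbitrary choice):

    #define _GNU_SOURCE              /* for accept4 */
    #include <errno.h>
    #include <netinet/in.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int ep = epoll_create1(0);
        /* Non-blocking listening socket. */
        int lfd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(8080),
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        if (ep < 0 || lfd < 0 ||
            bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
            listen(lfd, SOMAXCONN) < 0)
            return 1;
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
        epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

        struct epoll_event events[64];
        for (;;) {
            int n = epoll_wait(ep, events, 64, -1);  /* -1 (e.g. EINTR) skips the loop */
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == lfd) {
                    int c = accept4(lfd, NULL, NULL, SOCK_NONBLOCK);
                    if (c < 0) continue;              /* EAGAIN or transient error */
                    ev.events = EPOLLIN | EPOLLET;    /* edge-triggered: must drain */
                    ev.data.fd = c;
                    epoll_ctl(ep, EPOLL_CTL_ADD, c, &ev);
                } else {
                    char buf[4096];
                    for (;;) {
                        ssize_t r = read(fd, buf, sizeof buf);
                        if (r > 0) {
                            /* Simplification: a short or EAGAIN send drops bytes;
                               a real server buffers them and waits for EPOLLOUT.
                               MSG_NOSIGNAL avoids SIGPIPE if the peer is gone. */
                            ssize_t w = send(fd, buf, (size_t)r, MSG_NOSIGNAL);
                            (void)w;
                        }
                        else if (r == 0) { close(fd); break; }    /* peer closed */
                        else if (errno == EINTR) continue;
                        else if (errno == EAGAIN || errno == EWOULDBLOCK) break;  /* drained */
                        else { close(fd); break; }
                    }
                }
            }
        }
    }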

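Contrast with select and poll: select's fd_set must be rebuilt every iteration because the kernel overwrites it, and with both calls the kernel walks every watched fd on every call. At 50,000 sockets with 10 ready, poll scans 50,000 entries and reports 10 (select cannot even watch that many with the default FD_SETSIZE of 1024); epoll walks a ready list of size 10 and returns 10.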
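
The "rebuilt every iteration" point can be observed directly. The sketch below is an illustration only: select_rebuild_demo is a hypothetical name, and a pipe stands in for a socket. It shows that select overwrites the fd_set it is given, so the set from one call cannot be reused for the next.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical demo. Returns a bitmask:
   bit 0 set -> fd was still in the set after a select() that found nothing ready
   bit 1 set -> fd was reported ready after rebuilding the set and writing data
   Returns -1 on setup failure. */
int select_rebuild_demo(void) {
    int p[2];
    if (pipe(p) < 0) return -1;
    int result = 0;

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(p[0], &rfds);
    struct timeval tv = { 0, 0 };            /* zero timeout: just probe */

    /* Nothing has been written yet, so select() returns 0 and clears
       p[0] from rfds: the input set has been overwritten with the result. */
    if (select(p[0] + 1, &rfds, NULL, NULL, &tv) < 0) return -1;
    if (FD_ISSET(p[0], &rfds)) result |= 1;

    if (write(p[1], "x", 1) != 1) return -1;

    /* Rebuild the set (and reset the timeout, which Linux may modify)
       before calling select() again. */
    FD_ZERO(&rfds);
    FD_SET(p[0], &rfds);
    tv.tv_sec = 1;
    tv.tv_usec = 0;
    if (select(p[0] + 1, &rfds, NULL, NULL, &tv) < 0) return -1;
    if (FD_ISSET(p[0], &rfds)) result |= 2;

    close(p[0]);
    close(p[1]);
    return result;
}
```

Bit 0 stays clear because the first call wiped the fd from the set; bit 1 is set only because the set was rebuilt before the second call. An epoll interest list, by contrast, lives in the kernel and persists between waits.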

Common Confusion / Misconception

"epoll is asynchronous I/O." No. epoll is readiness notification: it tells you an FD is ready for a non-blocking operation. The I/O itself is still a synchronous call (read, write) made by your thread. True async I/O, where the kernel performs the operation in the background and hands you a completion, is aio_* and io_uring (concept 14).

"Level-triggered vs edge-triggered is an obscure detail." It is not. In level-triggered mode (the default), epoll_wait reports an FD as ready for as long as the condition holds (for example, unread bytes exist). In edge-triggered mode (EPOLLET), the kernel reports only transitions; if you don't drain the FD after a notification, you won't be told again until new data arrives. Edge-triggered mode can reduce redundant wakeups, but it requires non-blocking FDs and drain loops. Get this wrong and either a connection stalls waiting for a notification that never comes, or the event loop spins on an FD that is always ready.

"select is deprecated." It is not. It remains portable across Linux, BSD, macOS, and Windows (a subset: sockets only). For small FD counts (<100) it is fine. But for servers, use epoll on Linux and kqueue on BSD/macOS.

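The level- versus edge-triggered distinction can be checked with a small experiment. In this sketch (Linux only; count_notifications is a hypothetical helper, and a pipe stands in for a socket), two bytes are written once and epoll_wait is called twice without reading anything in between.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register a pipe's read end with the given extra
   flags (0 for level-triggered, EPOLLET for edge-triggered), write 2 bytes,
   then call epoll_wait twice WITHOUT reading. Returns how many of the two
   waits reported the fd, or -1 on setup failure. */
int count_notifications(unsigned int extra_flags) {
    int p[2];
    if (pipe(p) < 0) return -1;
    int ep = epoll_create1(0);
    if (ep < 0) return -1;

    struct epoll_event ev = { .events = EPOLLIN | extra_flags, .data.fd = p[0] };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) < 0) return -1;
    if (write(p[1], "hi", 2) != 2) return -1;

    struct epoll_event out;
    int reported = 0;
    for (int i = 0; i < 2; i++) {
        /* 100 ms timeout so the edge-triggered case cannot hang. */
        if (epoll_wait(ep, &out, 1, 100) == 1) reported++;
    }
    close(ep);
    close(p[0]);
    close(p[1]);
    return reported;
}
```

With count_notifications(0) both waits report the fd, because unread bytes still exist. With count_notifications(EPOLLET) only the first wait does: the single write was the only transition, which is why edge-triggered handlers must read until EAGAIN.
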
How To Use It

To design a readiness-based server:

  1. All FDs that participate: O_NONBLOCK.
  2. Register with epoll (Linux) or kqueue (BSD). Choose level- or edge-triggered.
  3. On each epoll_wait return, iterate ready FDs. For each, do non-blocking I/O in a loop until EAGAIN.
  4. For writes that would block (a short write or EAGAIN), buffer the unsent bytes and register EPOLLOUT. Unregister EPOLLOUT when the buffer drains.
  5. Handle EPOLLRDHUP (peer shut down its sending side), EPOLLHUP, and EPOLLERR cleanly.

The loop structure (while (epoll_wait): dispatch) is the event loop. This is the core shape of every production async server.

Check Yourself

  1. Why does select become O(N) per call where N is the highest fd, not number of ready fds?
  2. What does edge-triggered mode require the application to do, and why?
  3. Why does epoll win only when many FDs are registered but few are ready at a time?

Mini Drill or Application

Implement a minimal epoll echo server that accepts connections and echoes 1 KiB buffers. Load it with 1,000 concurrent idle connections plus 10 actively echoing. Measure CPU usage. Then implement the same with select. Measure again. Explain the difference.

Read This Only If Stuck