The TCP Handshake and State Machine

What This Concept Is

A TCP connection is a state machine running on both endpoints. The two states you usually hear about -- ESTABLISHED and CLOSED -- are just two of eleven.

The most important transitions are:

  • Three-way handshake (open): SYN -> SYN-ACK -> ACK.
  • Four-way close: each side sends its own FIN and gets it acknowledged. The side that closes first enters TIME_WAIT for 2 * MSL (maximum segment lifetime) before releasing the 4-tuple.
  • Reset: a RST ends the connection abruptly with no graceful close.

The handshake is not just ceremony. It synchronizes initial sequence numbers (so loss detection, retransmission, and reordering work), advertises each side's receive window and options such as the maximum segment size, and confirms that both endpoints are reachable and willing to talk.
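As a minimal sketch of what "synchronizing sequence numbers" means in practice, the arithmetic below computes the acknowledgment number that covers one segment. The helper name expected_ack is made up for illustration. The rule it encodes is the standard one: SYN and FIN each occupy one sequence number, every payload byte occupies one, and the 32-bit space wraps around.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


def expected_ack(seq: int, payload_len: int = 0,
                 syn: bool = False, fin: bool = False) -> int:
    """Return the ACK number that acknowledges everything in one segment.

    SYN and FIN each consume one sequence number, payload bytes consume
    one each, and the result wraps modulo 2**32.
    """
    consumed = payload_len + int(syn) + int(fin)
    return (seq + consumed) % SEQ_SPACE


# A client that picked initial sequence number 1000, a server that picked 4000.
print(expected_ack(1000, syn=True))                  # 1001: ACK for the client's SYN
print(expected_ack(4000, syn=True))                  # 4001: ACK for the server's SYN
print(expected_ack(SEQ_SPACE - 1, payload_len=10))   # 9: wraps past 2**32
```

Because the SYN itself consumes a number, the ACK that comes back is unambiguous: it can only be acknowledging that SYN, not some stale segment.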

Why It Matters Here

The state machine is where the most common production bugs live:

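  • SYN with no SYN-ACK -> usually a firewall silently dropping the SYN, or no route to the server. (A reachable host with a closed port would answer with RST.)
  • SYN-ACK with no final ACK from client -> the SYN-ACK is being lost on the return path: asymmetric routing, or a NAT or stateful firewall dropping it. (Path-MTU problems show up later, once full-size data segments are sent.)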
  • Thousands of sockets stuck in TIME_WAIT -> you are creating too many short-lived outgoing connections.
  • Sockets stuck in CLOSE_WAIT -> your application forgot to close() after the peer sent FIN.

If you cannot name the state, you cannot fix the bug.
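The TIME_WAIT bullet can be made quantitative. Each outgoing connection to the same destination address and port needs its own local port, and that port stays tied up while the socket sits in TIME_WAIT. The sketch below estimates the connection rate one client address can sustain to one destination. The defaults are assumptions based on common Linux settings (ephemeral ports 32768 to 60999, a 60-second TIME_WAIT); check net.ipv4.ip_local_port_range on your own host, and treat the result as a rough ceiling rather than a guarantee.

```python
def max_sustained_rate(port_low: int = 32768, port_high: int = 60999,
                       time_wait_seconds: float = 60.0) -> float:
    """Rough upper bound on new connections per second from ONE source
    address to ONE destination ip:port, when the client closes first.

    Assumes every closed connection holds its local port for the full
    TIME_WAIT period and that no port-reuse option is enabled.
    """
    ports = port_high - port_low + 1
    return ports / time_wait_seconds


rate = max_sustained_rate()
print(f"{rate:.0f} connections/second")  # 471 connections/second with the assumed defaults
```

A client opening short-lived connections faster than this to a single backend will start failing to find a free local port, which is why connection pooling or keep-alive is the usual fix.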

Concrete Example

Client                                                  Server
state=CLOSED                                            state=LISTEN
--- SYN seq=1000 ------------------------------------->
state=SYN_SENT                                          state=SYN_RCVD
<-- SYN seq=4000, ACK=1001 ----------------------------
state=ESTABLISHED
--- ACK=4001 ----------------------------------------->
                                                        state=ESTABLISHED

... application data flows in both directions ...

--- FIN seq=1500, ACK=4500 --------------------------->
state=FIN_WAIT_1                                        state=CLOSE_WAIT
<-- ACK=1501 ------------------------------------------
state=FIN_WAIT_2
                                                        (server finishes its work)
<-- FIN seq=4500, ACK=1501 ----------------------------
                                                        state=LAST_ACK
--- ACK=4501 ----------------------------------------->
state=TIME_WAIT (waits 2*MSL)                           state=CLOSED
state=CLOSED

TIME_WAIT exists so that late-arriving segments from the just-closed connection cannot be misinterpreted by a new connection that happens to reuse the same 4-tuple.
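The trace above can be replayed as a lookup table. This is a sketch covering only the transitions that appear in this example; the event labels are invented for readability, and paths such as simultaneous close (the CLOSING state) or RST handling are deliberately left out.

```python
# (current state, event) -> next state, for the paths in the example trace only.
TRANSITIONS = {
    ("CLOSED", "send SYN"): "SYN_SENT",
    ("LISTEN", "recv SYN, send SYN-ACK"): "SYN_RCVD",
    ("SYN_SENT", "recv SYN-ACK, send ACK"): "ESTABLISHED",
    ("SYN_RCVD", "recv ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close, send FIN"): "FIN_WAIT_1",
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("CLOSE_WAIT", "close, send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
    ("TIME_WAIT", "2*MSL timeout"): "CLOSED",
}


def replay(start: str, events: list[str]) -> list[str]:
    """Return every state visited, starting from `start`.

    Raises KeyError if an event is not valid in the current state.
    """
    states = [start]
    for event in events:
        states.append(TRANSITIONS[(states[-1], event)])
    return states


CLIENT_EVENTS = ["send SYN", "recv SYN-ACK, send ACK", "close, send FIN",
                 "recv ACK", "recv FIN, send ACK", "2*MSL timeout"]
SERVER_EVENTS = ["recv SYN, send SYN-ACK", "recv ACK", "recv FIN, send ACK",
                 "close, send FIN", "recv ACK"]

print(" -> ".join(replay("CLOSED", CLIENT_EVENTS)))
print(" -> ".join(replay("LISTEN", SERVER_EVENTS)))
```

Note that the server's path passes through CLOSE_WAIT and only leaves it on the "close, send FIN" event, which is driven by the application, not the kernel.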

Common Confusion / Misconception

"CLOSE_WAIT and TIME_WAIT are basically the same." They are opposites.

  • CLOSE_WAIT: the peer sent FIN, and you have not called close() yet. A socket that lingers in this state is almost always an application bug.
  • TIME_WAIT: you initiated the close and are waiting for any stray retransmissions to drain. This is normal.
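To make the CLOSE_WAIT case concrete, here is a minimal loopback sketch of the pattern that avoids it: treat an empty read as "the peer sent FIN" and close your own side. The function names handle_connection and demo are illustrative, and the listener binds to port 0 so the OS picks a free port.

```python
import socket
import threading


def handle_connection(conn: socket.socket) -> bytes:
    """Read until the peer's FIN, then close our side as well."""
    received = b""
    with conn:  # leaving this block calls conn.close(), which sends our FIN
        while True:
            chunk = conn.recv(4096)
            if chunk == b"":  # empty read: peer sent FIN, socket is now in CLOSE_WAIT
                break
            received += chunk
    return received


def demo() -> bytes:
    """Run one client and one server over loopback and return what the server read."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        result = {}

        def server() -> None:
            conn, _addr = listener.accept()
            result["data"] = handle_connection(conn)

        worker = threading.Thread(target=server)
        worker.start()
        # Closing the client socket at the end of this block sends the client's FIN.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"hello")
        worker.join(timeout=5)
        return result["data"]


if __name__ == "__main__":
    print(demo())
```

If handle_connection returned without closing conn (for example by keeping the socket in a list and forgetting it), the server side would sit in CLOSE_WAIT until the process exited.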

Another trap: treating a RST as just another way to close. It is the abrupt equivalent of slamming the receiver down: there is no FIN exchange, and data still in flight may be discarded. Applications typically see it as an ECONNRESET error.
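One way to see a RST from the application's side is to provoke one on loopback. The sketch below assumes a POSIX-style stack (Linux or similar), where closing a socket with SO_LINGER enabled and a zero timeout aborts the connection with RST instead of FIN. The function name provoke_reset is illustrative.

```python
import errno
import socket
import struct


def provoke_reset() -> int:
    """Return the errno the server side sees after the client aborts.

    Returns 0 if no reset was observed.
    """
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        client = socket.create_connection(("127.0.0.1", port))
        conn, _addr = listener.accept()
        with conn:
            conn.settimeout(5)  # do not hang forever if the RST never shows up
            # l_onoff=1, l_linger=0: close() discards unsent data and sends RST.
            client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                              struct.pack("ii", 1, 0))
            client.close()
            try:
                conn.recv(1)
            except ConnectionResetError as exc:
                return exc.errno
    return 0


if __name__ == "__main__":
    code = provoke_reset()
    print(errno.errorcode.get(code, code))
```

Compare this with a graceful close, where the same recv() call would return an empty bytes object rather than raising.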

How To Use It

For any stuck or odd-looking connection:

  1. Find the sockets: ss -tan | grep :443.
  2. Read the state column.
  3. Map it against the state machine: which side initiated the close? what is being waited for?
  4. Correlate with tcpdump to see the actual segments that did or did not arrive.
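Steps 1 and 2 can be summarized with a small tally of sockets per state. The sketch below parses text in the shape that ss -tan prints; the SAMPLE text and its addresses are made up for illustration. Note that ss abbreviates and hyphenates state names (ESTAB, CLOSE-WAIT, TIME-WAIT) rather than using the underscore spelling.

```python
from collections import Counter

# Hand-written sample in the shape of `ss -tan` output (addresses are illustrative).
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
LISTEN     0      128    0.0.0.0:443          0.0.0.0:*
ESTAB      0      0      192.0.2.10:443       198.51.100.7:51514
CLOSE-WAIT 1      0      192.0.2.10:443       198.51.100.8:40222
CLOSE-WAIT 1      0      192.0.2.10:443       198.51.100.9:40310
TIME-WAIT  0      0      192.0.2.10:443       198.51.100.7:51490
"""


def count_states(ss_output: str) -> Counter:
    """Count sockets per TCP state; the state is the first column of each row."""
    counts: Counter = Counter()
    for line in ss_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


for state, n in count_states(SAMPLE).most_common():
    print(f"{state:<12}{n}")
```

On a live host the same function could be fed the real output, for example subprocess.run(["ss", "-tan"], capture_output=True, text=True).stdout. A CLOSE-WAIT count that keeps growing points at the application; a large but stable TIME-WAIT count points at connection churn.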

Check Yourself

  1. Why must the SYN and SYN-ACK each consume one sequence number even though they carry no payload?
  2. Why does the initiator of a close enter TIME_WAIT and not the responder?
  3. What does a RST in the middle of an ESTABLISHED connection usually mean?

Mini Drill or Application

  1. In one terminal, run sudo tcpdump -n -S -i lo port 9000. Start the capture first so it sees the handshake.
  2. Run nc -l 9000 in a second terminal and nc localhost 9000 in a third.
  3. Type one line into the client, press Enter, then press Ctrl-D in the client.
  4. From the capture, identify the three handshake packets, the data segment, and the close segments (four, or three if the server's ACK and FIN share one segment). Label them against the state-machine diagram above.
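When labeling the capture, it helps to decode tcpdump's flag shorthand, where "." stands for ACK. The sketch below is a small decoder; the helper names and the sample line are hand-written for illustration, not taken from a real capture.

```python
import re

# tcpdump's single-character TCP flag codes.
FLAG_NAMES = {"S": "SYN", "F": "FIN", "P": "PSH", "R": "RST", ".": "ACK"}


def describe_flags(flags: str) -> str:
    """Expand a tcpdump flag string such as 'S.' into 'SYN+ACK'."""
    return "+".join(FLAG_NAMES.get(ch, ch) for ch in flags)


def flags_in_line(line: str) -> str:
    """Pull the 'Flags [..]' field out of one tcpdump line and expand it.

    Returns an empty string if the line has no Flags field.
    """
    match = re.search(r"Flags \[([^\]]+)\]", line)
    return describe_flags(match.group(1)) if match else ""


# Illustrative line in tcpdump's usual layout (values are made up).
line = "IP 127.0.0.1.51514 > 127.0.0.1.9000: Flags [S.], seq 4000, ack 1001, length 0"
print(flags_in_line(line))  # SYN+ACK
```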

Read This Only If Stuck