
The Declarative Reconciliation Loop

What This Concept Is

Kubernetes is not an imperative system that runs your commands. It is a set of controllers, each running a reconciliation loop that follows the same three steps:

  • Observe. Watch the current state of the world through the api-server.
  • Diff. Compute the difference between the desired state (spec) and the actual state (status).
  • Act. Take one small step to close the gap, then go back to observing.

Every resource has this split:

  • metadata -- who and what it is (name, namespace, labels, owner references)
  • spec -- what the user wants
  • status -- what the controller observes and reports

The user writes only metadata and spec. The controller writes only status. The api-server is the authoritative record of both.
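The split can be sketched in a few lines of Python. This is an illustration only, not the Kubernetes object model: the Resource class and the two helper functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    metadata: dict                               # name, namespace, labels, owner references
    spec: dict                                   # desired state, written by the user
    status: dict = field(default_factory=dict)   # observed state, written by the controller


def user_apply(obj: Resource, new_spec: dict) -> None:
    """A user (or kubectl) changes only the desired state."""
    obj.spec = dict(new_spec)


def controller_report(obj: Resource, observed: dict) -> None:
    """A controller records only what it observed."""
    obj.status = dict(observed)


web = Resource(metadata={"name": "web", "namespace": "default"}, spec={"replicas": 3})
user_apply(web, {"replicas": 5})
controller_report(web, {"replicas": 3, "readyReplicas": 3})

# The gap between desired and observed is what drives the next action.
gap = web.spec["replicas"] - web.status["replicas"]
```

The user never touches status and the controller never touches spec; the gap between them is the only input the loop needs.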

Why It Matters Here

This loop is the single mental model that unifies everything: Deployments, ReplicaSets, Services, Ingress, HPA, Node lifecycle, and every operator you will ever write. If you think of kubectl apply as "run this instruction," you will be confused when:

  • a pod you deleted comes back (a controller observed the gap and acted)
  • a change you made via kubectl edit gets reverted by another controller (two loops are fighting over the same field)
  • your Deployment has been updated but no rollout happened (the change did not touch spec.template, so the controller's diff against the current ReplicaSet was empty)
  • a rollout stalls forever (the controller cannot close the gap because pods are not becoming Ready)

Concrete Example

The ReplicaSet controller's loop, in pseudocode:

loop forever:
    rs := get ReplicaSet
    pods := list Pods where ownerRef.uid == rs.uid
    live := count(pods where phase in {Pending, Running})

    if live < rs.spec.replicas:
        create (rs.spec.replicas - live) Pods from rs.spec.template
    if live > rs.spec.replicas:
        delete (live - rs.spec.replicas) Pods
    update rs.status to match observed

That is the entire logic. It is idempotent: running it twice in a row produces the same result as running it once. That is why Kubernetes survives controller restarts, network partitions, and concurrent edits.
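To see the idempotency claim hold, here is a runnable Python sketch of the same loop against in-memory dictionaries. It is a toy model, not the real controller: the dictionary shapes, the reconcile function, and the pod-naming counter are all assumptions made for the example.

```python
import itertools

_pod_ids = itertools.count(1)  # hypothetical name generator for this sketch


def is_live(pod):
    return pod["phase"] in ("Pending", "Running")


def reconcile(rs, pods):
    """One pass: observe owned Pods, diff against spec, act, report status."""
    owned_live = [p for p in pods if p["owner"] == rs["uid"] and is_live(p)]
    want = rs["spec"]["replicas"]

    if len(owned_live) < want:
        for _ in range(want - len(owned_live)):
            pods.append({"name": f"{rs['name']}-{next(_pod_ids)}",
                         "owner": rs["uid"], "phase": "Pending"})
    elif len(owned_live) > want:
        surplus = {p["name"] for p in owned_live[: len(owned_live) - want]}
        pods[:] = [p for p in pods if p["name"] not in surplus]

    rs["status"] = {"replicas": sum(
        1 for p in pods if p["owner"] == rs["uid"] and is_live(p))}


rs = {"name": "web", "uid": "u1", "spec": {"replicas": 3}, "status": {}}
pods = []

reconcile(rs, pods)
after_once = [p["name"] for p in pods]
reconcile(rs, pods)                      # second pass finds no gap and does nothing
after_twice = [p["name"] for p in pods]

del pods[0]                              # simulate "kubectl delete pod"
reconcile(rs, pods)                      # the gap is observed and closed
healed = len(pods)
```

Running the pass a second time changes nothing, and deleting a Pod simply creates a gap that the next pass closes.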

Concretely, when you run:

kubectl scale deployment web --replicas=5

the api-server records deployment.spec.replicas = 5. The Deployment controller observes the change and sets the current ReplicaSet's spec.replicas to 5. The ReplicaSet controller observes that it has 3 Pods but wants 5, and creates two more. The scheduler observes unscheduled Pods and assigns them to nodes. The kubelet observes Pods bound to its node and starts their containers. No one "executed" the scaling command; four independent loops each closed one small gap in a chain.
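The chain can be simulated with four tiny step functions that share one state dictionary. This is a sketch under stated assumptions: the state layout and function names are invented here, each step closes at most one gap per call, and the round-robin order stands in for components that really run concurrently and independently.

```python
def deployment_controller(state):
    # Copy the Deployment's desired replica count onto its current ReplicaSet.
    if state["replicaset"]["replicas"] != state["deployment"]["replicas"]:
        state["replicaset"]["replicas"] = state["deployment"]["replicas"]
        return True
    return False


def replicaset_controller(state):
    # Create one missing Pod per pass.
    if len(state["pods"]) < state["replicaset"]["replicas"]:
        name = f"web-{len(state['pods'])}"
        state["pods"].append({"name": name, "node": None, "running": False})
        return True
    return False


def scheduler(state):
    # Bind one unscheduled Pod to a (hypothetical) node.
    for pod in state["pods"]:
        if pod["node"] is None:
            pod["node"] = "node-1"
            return True
    return False


def kubelet(state):
    # Start one Pod that is bound but not yet running.
    for pod in state["pods"]:
        if pod["node"] is not None and not pod["running"]:
            pod["running"] = True
            return True
    return False


state = {
    "deployment": {"replicas": 5},       # result of "kubectl scale --replicas=5"
    "replicaset": {"replicas": 3},
    "pods": [{"name": f"web-{i}", "node": "node-1", "running": True} for i in range(3)],
}
components = [("deployment-controller", deployment_controller),
              ("replicaset-controller", replicaset_controller),
              ("scheduler", scheduler),
              ("kubelet", kubelet)]

log = []
progress = True
while progress:                          # stop when no component sees a gap
    progress = False
    for name, step in components:
        if step(state):
            log.append(name)
            progress = True
```

The log shows one action by the Deployment controller and two each by the ReplicaSet controller, scheduler, and kubelet. None of them knows about the others; each only reads and writes shared state.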

Common Confusion / Misconception

"Declarative means I describe what I want and Kubernetes works out the steps."

Half true. Declarative means the resource file names the desired state. But Kubernetes does not plan a sequence; it loops, closing one gap at a time. There is no global plan, no ordering guarantee between different loops. This is why eventual consistency is fundamental to Kubernetes: you cannot assume "if I applied A and B in order, A happened first everywhere."

A second confusion: "spec and status are both the desired state." They are not. spec is desired. status is observed. If you write to status, a controller will overwrite you.

A third confusion: "If I delete a pod created by a Deployment, I break the Deployment." You do not. The controller observes the gap and creates a replacement. That is the entire point.

How To Use It

For every new resource kind, ask three questions:

  1. Who is the controller?
  2. What does spec let me set, and what does status tell me?
  3. What is the smallest step that controller takes per loop?

When debugging, the right mental question is "what does the controller see that makes it believe the gap is already closed?" Often the answer is a stale status, a selector mismatch, or a second controller writing to the same field.
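A selector mismatch is the easiest of these to check by hand. The sketch below models only equality-based matching (the matchLabels form); set-based matchExpressions are left out, and the function name is hypothetical.

```python
def selector_matches(match_labels: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


selector = {"app": "web"}

good_pod = {"app": "web", "tier": "frontend"}   # extra labels are fine
typo_pod = {"app": "wbe", "tier": "frontend"}   # one typo and the controller never sees it

sees_good = selector_matches(selector, good_pod)
sees_typo = selector_matches(selector, typo_pod)
```

From the controller's point of view the second Pod does not exist, so it reports a gap (or no gap) that does not match what you see with kubectl get pods.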

Check Yourself

  1. What is the difference between spec and status?
  2. Why does kubectl apply not need a sequence of steps?
  3. If you write to status.replicas, what happens, and why?
  4. Name one symptom that points at two controllers owning the same field.
  5. Why is idempotency of the reconciler essential for cluster self-healing?

Two Controllers Fighting Over the Same Field

A subtle, common production bug: two controllers write to the same field and the field appears to "flap."

Examples:

  • HPA scales a Deployment to 6 replicas; a GitOps tool reconciles the Deployment back to replicas: 3 because that is what the Git manifest says; the HPA observes usage again and scales back up. This loop churns until someone adds spec.replicas to the GitOps tool's ignore list, or removes the replicas field from the Git manifest so that the HPA (which targets the Deployment through scaleTargetRef) is the only writer.
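  • A mutating webhook rewrites every Pod on admission. The ReplicaSet controller counts Pods by label selector and owner reference; it does not diff them against the template, so adding a sidecar alone is harmless. But if the webhook changes a label the selector depends on, the new Pod no longer matches, the controller still sees a gap, and it keeps creating Pods. Fix: the mutation must leave selector labels untouched, or the change belongs in the ReplicaSet's spec.template instead.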

The rule: every field must have exactly one owner. When a bug looks like "it keeps changing back," ask "which two controllers think they own this?"
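The HPA versus GitOps fight is easy to reproduce in a toy model. Everything here is hypothetical: hpa and gitops are stand-ins for the real controllers, and the ignore parameter imitates a GitOps tool's ignore list.

```python
def hpa(deploy, desired=6):
    """Stand-in for the HPA: wants 6 replicas because load is high."""
    if deploy["replicas"] != desired:
        deploy["replicas"] = desired


def gitops(deploy, manifest, ignore=()):
    """Stand-in for a GitOps tool: force live fields back to the Git manifest."""
    for key, value in manifest.items():
        if key in ignore:
            continue
        if deploy.get(key) != value:
            deploy[key] = value


def run(rounds, ignore=()):
    deploy = {"replicas": 3, "image": "web:1"}
    manifest = {"replicas": 3, "image": "web:1"}
    history = []
    for _ in range(rounds):
        hpa(deploy)
        history.append(deploy["replicas"])
        gitops(deploy, manifest, ignore)
        history.append(deploy["replicas"])
    return history


flapping = run(3)                        # two owners: the field oscillates
stable = run(3, ignore=("replicas",))    # one owner: the field settles
```

With two owners the history reads 6, 3, 6, 3, 6, 3. With replicas ignored by the GitOps stand-in it settles at 6 and stays there.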

Mini Drill or Application

Create a Deployment with replicas: 2. Run kubectl delete pod on one of its Pods. Watch kubectl get pods -w. Then scale to replicas: 0. Watch again. For each event you see, label whose loop produced it (Deployment, ReplicaSet, scheduler, kubelet). Write a paragraph: "The resource did not move; the loops did."

Now take a resource whose controller you have not met before: a CronJob. Apply one with a schedule like */1 * * * *. Watch the Jobs it creates. Diagram its reconcile loop in observe -> diff -> act terms. Notice that the controller is also idempotent: if you run the controller twice in a minute you still get at most one Job for that minute.
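One way a controller can get the "at most one Job per minute" property is to derive the Job name from the scheduled time, so that a second attempt collides with the first. The sketch below assumes that approach; the naming scheme and both functions are illustrative and are not taken from the CronJob controller's source.

```python
from datetime import datetime, timezone


def job_name(cronjob: str, scheduled: datetime) -> str:
    """Deterministic name: same CronJob + same scheduled minute = same name."""
    minutes_since_epoch = int(scheduled.timestamp()) // 60
    return f"{cronjob}-{minutes_since_epoch}"


def reconcile_cronjob(cronjob: str, now: datetime, jobs: dict) -> bool:
    """Create the Job for the current minute unless it already exists."""
    scheduled = now.replace(second=0, microsecond=0)
    name = job_name(cronjob, scheduled)
    if name in jobs:
        return False                     # already created: nothing to do
    jobs[name] = {"scheduled": scheduled}
    return True


jobs = {}
first = reconcile_cronjob("report", datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc), jobs)
again = reconcile_cronjob("report", datetime(2024, 1, 1, 12, 0, 40, tzinfo=timezone.utc), jobs)
later = reconcile_cronjob("report", datetime(2024, 1, 1, 12, 1, 2, tzinfo=timezone.utc), jobs)
```

Two passes inside the same minute produce one Job; the pass in the next minute produces the second.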

Consequences of the Loop

Things that are true because every controller is a loop, not a script:

  • Ordering is not guaranteed across resources. A Deployment and its Service may become Ready in either order; client code must tolerate connection refused during startup.
  • Work is retried. If the api-server blinks while a controller is acting, the controller requeues the item and tries again on a later pass.
  • There is no rollback unless a controller implements one. Deployments track ReplicaSet history and implement kubectl rollout undo. Generic resources do not.
  • Drift detection is built in. A user who kubectl edits spec.replicas on a ReplicaSet owned by a Deployment will see the Deployment controller set it back on the next loop.
  • Operators extend the system without changing its shape. A Custom Resource Definition plus a controller that watches it gives you a new resource type that behaves exactly like Deployments or Services from a user's point of view.

Watch, List, and the Cache

Controllers do not poll the api-server. They open a watch -- a long-lived HTTP stream -- and receive delta events (added, modified, deleted). On startup, and again after every restart, a controller does a full list to rebuild its local cache, then switches to watch from the resourceVersion returned by the list so that no event between the list and the watch is missed.

This is why:

  • Controllers scale: ten thousand Pods do not become ten thousand per-loop HTTP calls.
  • Events can be missed if the watch is dropped or falls too far behind; controllers must be able to re-list and be idempotent enough that a re-list is safe.
  • kubectl get -w uses the same mechanism; it is a thin CLI wrapper around watch.
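The list-then-watch handoff can be modelled with a fake server that numbers every write. This is a sketch, not the Kubernetes API: FakeApiServer, Cache, and their methods are hypothetical, and the integer counter stands in for resourceVersion, which clients should otherwise treat as opaque.

```python
class FakeApiServer:
    def __init__(self):
        self.rv = 0                      # stand-in for resourceVersion
        self.objects = {}
        self.events = []                 # (rv, type, name, value)

    def write(self, name, value):
        self.rv += 1
        kind = "MODIFIED" if name in self.objects else "ADDED"
        self.objects[name] = value
        self.events.append((self.rv, kind, name, value))

    def delete(self, name):
        self.rv += 1
        value = self.objects.pop(name)
        self.events.append((self.rv, "DELETED", name, value))

    def list_objects(self):
        """Full snapshot plus the version it was taken at."""
        return dict(self.objects), self.rv

    def watch(self, since):
        """Only the events that happened after the given version."""
        return [event for event in self.events if event[0] > since]


class Cache:
    def __init__(self, server):
        self.server = server
        self.store, self.rv = server.list_objects()   # one full list at startup
        self.events_seen = 0

    def sync(self):
        for rv, kind, name, value in self.server.watch(self.rv):
            if kind == "DELETED":
                self.store.pop(name, None)
            else:
                self.store[name] = value
            self.rv = rv
            self.events_seen += 1


server = FakeApiServer()
server.write("pod-a", {"phase": "Running"})
server.write("pod-b", {"phase": "Pending"})

cache = Cache(server)                    # list: sees pod-a and pod-b at version 2

server.write("pod-c", {"phase": "Pending"})
server.delete("pod-a")
server.write("pod-b", {"phase": "Running"})
cache.sync()                             # watch: applies only the three new deltas
```

After the sync the cache equals the server's state, and it got there by processing three deltas rather than fetching every object again.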

The client-go informer framework (used by almost every controller in Kubernetes) provides a local indexed cache, a work queue, and retry with exponential backoff. When you write an operator, you are filling in one function -- the reconcile step -- and the framework handles the rest.
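The retry behaviour can be sketched without any framework. The code below is not client-go; it is a minimal Python model of a queue that re-adds failed keys with a doubling delay. The base delay, cap, attempt limit, and function names are assumptions chosen for illustration, and the delays are recorded rather than slept.

```python
from collections import deque


def backoff_delay(failures, base=0.005, cap=1000.0):
    """Delay in seconds before the next retry: doubles per failure, up to a cap."""
    return min(cap, base * (2 ** failures))


def process(keys, reconcile, max_attempts=5):
    """Run reconcile for each key, requeueing failures with backoff."""
    queue = deque((key, 0) for key in keys)
    done, scheduled_retries = [], []
    while queue:
        key, failures = queue.popleft()
        try:
            reconcile(key)
            done.append(key)
        except Exception:
            if failures + 1 < max_attempts:
                scheduled_retries.append((key, backoff_delay(failures)))
                queue.append((key, failures + 1))
    return done, scheduled_retries


attempts = {}


def flaky_reconcile(key):
    """Your one function: here it fails twice for one key, then succeeds."""
    attempts[key] = attempts.get(key, 0) + 1
    if key == "default/web" and attempts[key] < 3:
        raise RuntimeError("api-server unavailable")


done, retries = process(["default/web", "default/db"], flaky_reconcile)
```

The failing key is retried after 5 ms and then 10 ms while the healthy key is processed in between. Because the reconcile step is idempotent, running it a third time is safe.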

Server-Side Apply and Field Ownership

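kubectl apply started as a client-side diff: the client computed what to send, and that is still kubectl's default. With server-side apply (kubectl apply --server-side, generally available since Kubernetes 1.22), the api-server computes the merge and tracks per-field ownership via the managedFields metadata.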

  • Each field you set is stamped with a manager (e.g. kubectl, kube-controller-manager, your operator).
  • Another manager's apply that sets a different value for a field you own is rejected as a conflict unless it passes --force-conflicts.
  • If you stop setting a field you previously owned, the api-server drops it from your ownership and falls back to whatever default or other-manager value exists.

This makes "why did my edit stick until a controller reverted it" tractable: read the object's managedFields and see who owns the field you set. Server-side apply is how the reconciliation loop extends from "who runs the loop" to "who is authoritative per field."
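The ownership rules can be modelled in a few lines. This is a deliberately simplified sketch: it tracks one owner per top-level field in a flat dictionary, whereas real managedFields are nested and allow shared ownership. The function, the Conflict exception, and the manager name my-operator are hypothetical.

```python
class Conflict(Exception):
    pass


def server_side_apply(obj, manager, fields, force=False):
    owners = obj["managedFields"]

    # A conflict: someone else owns the field and we want a different value.
    conflicts = [k for k, v in fields.items()
                 if k in owners and owners[k] != manager and obj["data"].get(k) != v]
    if conflicts and not force:
        raise Conflict(f"{manager} conflicts on {conflicts}")

    # Fields this manager used to own but no longer sets are released.
    for k in [k for k, m in owners.items() if m == manager and k not in fields]:
        del owners[k]
        del obj["data"][k]

    for k, v in fields.items():
        if k in owners and owners[k] != manager and obj["data"][k] == v:
            continue   # same value: the real api-server would record shared ownership
        obj["data"][k] = v
        owners[k] = manager


obj = {"data": {}, "managedFields": {}}
server_side_apply(obj, "kubectl", {"replicas": 3, "image": "web:1"})

try:
    server_side_apply(obj, "my-operator", {"replicas": 6})
    conflicted = False
except Conflict:
    conflicted = True                    # kubectl owns replicas with a different value

server_side_apply(obj, "my-operator", {"replicas": 6}, force=True)   # take ownership
server_side_apply(obj, "kubectl", {"image": "web:2"})                # no longer sets replicas
```

After the forced apply, my-operator owns replicas and kubectl owns only image, which is what you would look for when reading managedFields on a live object.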

Transfer: You Have Seen This Pattern Before

The reconcile loop is the same structural idea as several things you already know:

  • Git's index vs working tree vs HEAD. spec is "what you want," status is "what is currently committed," and the diff drives what happens next. git status and kubectl describe are almost the same question.
  • Terraform plan/apply. A declarative description is compared to observed state; the delta is executed. Kubernetes differs in that there is no apply step the user invokes -- a controller closes the gap continuously.
  • Self-healing supervisors (systemd, Erlang/OTP). A supervisor watches children and restarts them to a desired set. Kubernetes generalizes this across the whole cluster and exposes the desired set as API objects.
  • CRDT-like convergence. Idempotent, commutative reconciliation means the order of controller runs does not change the final state, as long as each step moves monotonically toward spec.

If you hit a controller you have never met, your question is always the same: what is its spec, what does it observe, and what is the single smallest step it takes per loop?

Read This Only If Stuck