Feature Flags and Dark Launches

What This Concept Is

A feature flag (or toggle) is a runtime switch that decides whether a code path is active. The flag is checked in code; the answer comes from configuration (env var, config file, or a feature-flag service).

if (flags.isEnabled("new-checkout-v2", { user })) {
  return renderNewCheckout(user);
}
return renderLegacyCheckout(user);
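The snippet above assumes a `flags` object already exists. A minimal sketch of what could sit behind it, assuming an in-memory rule table: `FlagClient`, `FlagRule`, `AppUser`, `allowUserIds`, and `checkoutVariant` are hypothetical names for illustration, not any vendor's API.

```typescript
// Hypothetical in-memory flag client; every name here is illustrative.
interface AppUser {
  id: string;
}

interface FlagRule {
  enabled: boolean;        // master switch for the flag
  allowUserIds?: string[]; // optional allow-list; when present, only these users get the flag
}

class FlagClient {
  private rules: { [flag: string]: FlagRule };

  constructor(rules: { [flag: string]: FlagRule }) {
    this.rules = rules;
  }

  // Unknown flags evaluate to false, so a missing rule never activates new code.
  isEnabled(flagName: string, ctx: { user: AppUser }): boolean {
    const rule = this.rules[flagName];
    if (!rule || !rule.enabled) return false;
    if (rule.allowUserIds) {
      return rule.allowUserIds.some((id) => id === ctx.user.id);
    }
    return true;
  }
}

// In a real setup the rules would come from configuration, not a literal.
const flags = new FlagClient({
  "new-checkout-v2": { enabled: true, allowUserIds: ["u-1"] },
});

function checkoutVariant(user: AppUser): "new" | "legacy" {
  return flags.isEnabled("new-checkout-v2", { user }) ? "new" : "legacy";
}
```

Defaulting unknown flags to off is a design choice in this sketch: a typo in a flag name then fails safe onto the legacy path.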

A dark launch is a specific use: deploy new code to production but leave it off, or route real production traffic into it without affecting user-visible behavior. You observe it under real load before exposing it.

Flags decouple deploy (code reaches production) from release (users see behavior change). That decoupling is the whole point. Pete Hodgson's Feature Toggles article gives the best taxonomy and is worth reading once; the concept has not meaningfully changed since.

Why It Matters Here

Feature flags are what make trunk-based development safe at scale:

work-in-progress can merge to trunk behind an off flag, so branches stay short
the release is just a flag flip -- no redeploy, no rollback of artifacts
rollback of a feature is a flag flip too -- it takes effect within seconds, as soon as running instances pick up the change
gradual rollout (1% -> 5% -> 100%) is a flag configuration, not a deploy strategy
experimentation (A/B tests) uses the same substrate
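A gradual rollout is usually implemented by hashing each user into a stable bucket and comparing the bucket to the configured percentage. The sketch below is one way to do that, assuming a simple 31-multiplier string hash; `stableHash`, `rolloutBucket`, and `inRollout` are hypothetical names, and a production service may use a different hash.

```typescript
// Deterministic 32-bit string hash (the classic hash * 31 + char scheme).
// Deterministic means a user lands in the same bucket on every request.
function stableHash(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash << 5) - hash + input.charCodeAt(i)) | 0; // hash * 31 + char, wrapped to int32
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Maps (flag, user) to a bucket in [0, 100). Including the flag name in the
// hashed string keeps the same users from being first in every rollout.
function rolloutBucket(flagName: string, userId: string): number {
  return stableHash(flagName + ":" + userId) % 100;
}

// A user is in the rollout when their bucket is below the configured percentage.
function inRollout(flagName: string, userId: string, percent: number): boolean {
  return rolloutBucket(flagName, userId) < percent;
}
```

Because the bucket is fixed per user, raising the percentage only ever adds users: anyone included at 5% is still included at 25% and 100%, so nobody flips back and forth during a ramp.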

They also solve specific operational problems:

kill switches (ops toggles) -- quickly turn off an expensive or broken feature under incident pressure
permission toggles -- long-lived flags gating access by user tier or role
release toggles -- short-lived, removed once the feature is fully rolled out
experiment toggles -- randomized per-user to measure A/B behavior

Different categories have different lifespans, ownership models, and audit requirements. Mixing them in one unlabeled pile is the root cause of most "we have 300 flags and nobody knows what's safe to remove" stories.
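One way to keep the pile labeled is to make the category part of the flag's definition and check it when the flag is registered. This is a sketch under assumptions: the `FlagDefinition` shape and the `validateFlag` rules are hypothetical, chosen to mirror the four categories above.

```typescript
type FlagCategory = "release" | "experiment" | "ops" | "permission";

// Hypothetical metadata record kept next to each flag.
interface FlagDefinition {
  flagName: string;
  category: FlagCategory;
  owner: string;     // who answers for this flag
  removeBy?: string; // ISO date; expected for short-lived release flags
}

// Returns a list of problems; an empty list means the definition is acceptable.
function validateFlag(def: FlagDefinition): string[] {
  const problems: string[] = [];
  if (def.owner.trim() === "") {
    problems.push(def.flagName + ": missing owner");
  }
  // Release toggles are short-lived, so they must say when they go away.
  if (def.category === "release" && !def.removeBy) {
    problems.push(def.flagName + ": release flag needs a removal date");
  }
  return problems;
}
```

A check like this could run in CI so that an unlabeled or ownerless flag never reaches trunk.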

Concrete Example

Deploy-then-release flow:

Monday: merge the new checkout code behind flag new-checkout-v2, default off. Deploy.
Tuesday: enable at 1% of internal users. Watch error rate, latency, conversion.
Thursday: ramp to 5% of all users. No regressions.
Friday: 25%, 50%, 100%. Users see the new checkout.
Following sprint: remove the flag and the legacy code, tracked by a cleanup ticket in the backlog.

If at any point a problem appears, the rollback is:

flags:
  new-checkout-v2:
    enabled: false

That is roughly 30 seconds from detection to mitigation, with no rebuild or redeploy.

The Dark Launch Pattern

A dark launch tests a code path under real load without user-visible effects. Example: a new rewritten search service.

Deploy the new service.
Modify the request handler to also call the new service, discard the result, and log a comparison with the old one.
Observe performance, error rate, and correctness diffs under production load.
Once metrics match, flip the flag to use the new service's result.

This separates "does it run" (deploy) from "does it work" (correctness) from "can users see it" (release). Each step is independently verifiable. Facebook described this pattern publicly for the launch of Facebook Chat in 2008, and the term "dark launch" is commonly traced back to that post. GitHub's "scientist" library formalized the control-versus-candidate comparison in code.
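The shadow-call step can be sketched as a small wrapper: run the old path, run the new path, record the comparison, and always return the old result. This is an illustration in the spirit of the pattern, not the API of any library; `darkLaunch` and `ShadowReport` are hypothetical names, and the sketch is synchronous for brevity where real service calls would be asynchronous.

```typescript
interface ShadowReport<T> {
  matched: boolean;
  control: T;
  candidate?: T;
  candidateError?: string;
  controlMs: number;
  candidateMs: number;
}

function darkLaunch<T>(
  control: () => T,                        // existing code path; its result is what users get
  candidate: () => T,                      // new code path; its result is only compared
  report: (r: ShadowReport<T>) => void,    // e.g. write to logs or metrics
  isEqual: (a: T, b: T) => boolean = (a, b) => JSON.stringify(a) === JSON.stringify(b)
): T {
  const start = Date.now();
  const controlResult = control(); // control errors propagate exactly as before
  const mid = Date.now();

  let outcome: ShadowReport<T>;
  try {
    const candidateResult = candidate();
    outcome = {
      matched: isEqual(controlResult, candidateResult),
      control: controlResult,
      candidate: candidateResult,
      controlMs: mid - start,
      candidateMs: Date.now() - mid,
    };
  } catch (err) {
    // A failing candidate must never affect the user-visible response.
    outcome = {
      matched: false,
      control: controlResult,
      candidateError: String(err),
      controlMs: mid - start,
      candidateMs: Date.now() - mid,
    };
  }
  report(outcome);
  return controlResult;
}
```

Note that this version adds the candidate's latency to the request. In practice the shadow call is often run off the request path or on a sampled fraction of traffic.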

Flag Service Architecture -- Why Env Vars Are Not Enough

A real flag service has three non-obvious properties that distinguish it from "we check an env var":

Low-latency local evaluation. The flag decision must be made in-process, on cached rules, in microseconds. A per-request remote lookup is a latency and availability regression.
Streaming updates. When a flag changes, every running instance learns within seconds -- via server-sent events, a websocket, or short-poll. Instances started yesterday must still see today's flag changes.
Auditable changes. Who flipped which flag, when, for whom. Flag changes are production changes; they deserve the same trail (concept 14) as deploys.
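The three properties can be illustrated with a small in-process store. This is a sketch under assumptions: `LocalFlagStore` and `FlagChange` are hypothetical names, the transport that delivers updates is left out, and a real service would keep the audit trail on the server rather than in each instance.

```typescript
interface FlagChange {
  flag: string;
  enabled: boolean;
  actor: string; // who made the change
  at: string;    // ISO timestamp of the change
}

class LocalFlagStore {
  private rules: { [flag: string]: boolean } = {};
  private auditLog: FlagChange[] = [];

  // Called by whatever transport delivers updates (server-sent events,
  // a websocket, or polling). Long-running instances stay current this way.
  applyUpdate(change: FlagChange): void {
    this.rules[change.flag] = change.enabled;
    this.auditLog.push(change);
  }

  // In-process lookup on cached rules: no network call on the request path.
  isEnabled(flag: string): boolean {
    return this.rules[flag] === true;
  }

  // Answers "who flipped this flag, and when".
  changesFor(flag: string): FlagChange[] {
    return this.auditLog.filter((c) => c.flag === flag);
  }
}
```

The request path only ever calls `isEnabled`, so an outage of the flag service leaves instances running on the last rules they received instead of failing requests.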

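OpenFeature is a vendor-neutral API and SDK standard: application code calls one evaluation API, and a provider plugs in the backend (LaunchDarkly, Unleash, Flipt, ConfigCat, or a self-hosted service). Swapping providers then means changing the provider registration, not rewriting every flag check.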

Common Confusion / Misconception

"Flags are just if-statements, we already do that." The mechanism is simple; the discipline is not. Flag debt accumulates fast. A production codebase with 300 old, undocumented flags has the same maintenance cost as 300 dead config knobs: a combinatorial test surface, and no one who knows which flags are safe to remove.

"Every feature needs a flag." No. Flags cost complexity: dual code paths, test combinations, runtime lookups. Use them for risky, large, or long-rollout changes. A typo fix does not need a flag.

"Remove the flag right after 100% rollout." Do remove it, but schedule the removal for the next sprint and track it. The hardest part of flags is getting teams to clean up. A useful rule: every new flag ships with a planned removal date and an owner.

"A flag service is overkill -- we'll use env vars." Changing an env var means restarting or redeploying the process, which defeats the "decouple deploy from release" property. For short-lived release flags and experimentation, a real flag service, or at least a config file that is reloaded at runtime, is essential.

"Flags replace canary deployments." They complement them. Canary controls which traffic reaches the new deploy; flags control which code paths inside the deploy are active. Most mature teams use both.

"Per-user flags are expensive and rare." They are the default for anything user-facing. A flag evaluated per request with a cached rule set is a hash-and-compare -- cheaper than most serialization your handler already does.

How To Use It

Treat flags as first-class code:

Categorize each flag: release, experiment, ops/kill-switch, permission. Different categories have different lifespans and owners.
Every release flag has an expected removal date and a ticket to remove it.
Audit the flag inventory quarterly. Presume any old flag is "on, safe to remove" unless its owner can justify keeping it.
Test both states in CI -- at minimum, all-on and all-off suites. For high-risk combos, explicit pairwise tests.
Keep the lookup fast and cached; a flag evaluated per-request must not be a remote call per request.
Treat flag flips as deployment events: emit an observability marker (concept 15) with flag name, change, actor, timestamp -- so an incident investigation can find "what changed" in seconds.
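The all-on and all-off suites can be sketched as a helper that runs the same check under both flag states. The names here (`forEachFlagState`, `FlagState`, `checkoutTotal`) are hypothetical; the point is only that both code paths must satisfy the same assertion.

```typescript
type FlagState = { [flag: string]: boolean };

// Runs the same check twice: once with every listed flag off, once with every flag on.
function forEachFlagState(
  flagNames: string[],
  run: (state: FlagState, label: string) => void
): void {
  const allOff: FlagState = {};
  const allOn: FlagState = {};
  for (const flagName of flagNames) {
    allOff[flagName] = false;
    allOn[flagName] = true;
  }
  run(allOff, "all-off");
  run(allOn, "all-on");
}

// Hypothetical code under test: two implementations behind one flag.
function checkoutTotal(state: FlagState, cents: number[]): number {
  if (state["new-checkout-v2"]) {
    return cents.reduce((sum, c) => sum + c, 0); // new path
  }
  let total = 0; // legacy path
  for (const c of cents) total += c;
  return total;
}

// Both states must produce the same total for the same cart.
forEachFlagState(["new-checkout-v2"], (state, label) => {
  if (checkoutTotal(state, [100, 250]) !== 350) {
    throw new Error("wrong total with flags " + label);
  }
});
```

Two runs cover the common cases cheaply; the explicit pairwise tests mentioned above are still needed where two specific flags are known to interact.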

Check Yourself

What is the difference between a release toggle and an ops (kill-switch) toggle?
Why does "use env vars for flags" break one of the main benefits of flags?
What maintenance problem do feature flags create, and what practice prevents it?
Explain dark launch in one sentence. How does it differ from a canary?
What three properties distinguish a flag service from "we check an env var"?

Mini Drill or Application

Inventory the flags in a codebase you work on (grep for your flag API). For each flag:

category (release, experiment, ops, permission)
current default
date introduced
owner (a human name, not a team)
removal plan

Most teams find 20-40% of their flags are abandoned. Those are the ones to remove first.
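Once the inventory exists as data, finding removal candidates can be automated. The sketch below assumes one working definition of "abandoned": a release flag older than a chosen age with no removal plan. The `InventoryRow` shape, the `removalCandidates` function, and that definition are all assumptions for illustration.

```typescript
interface InventoryRow {
  flagName: string;
  category: "release" | "experiment" | "ops" | "permission";
  introduced: string;  // ISO date the flag was added, e.g. "2024-01-15"
  owner: string;
  removalPlan: string; // empty string when there is no plan
}

// Returns the names of release flags older than maxAgeDays that have no removal plan.
function removalCandidates(rows: InventoryRow[], today: Date, maxAgeDays: number): string[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  return rows
    .filter((row) => {
      const ageDays = (today.getTime() - new Date(row.introduced).getTime()) / msPerDay;
      return row.category === "release" && ageDays > maxAgeDays && row.removalPlan === "";
    })
    .map((row) => row.flagName);
}
```

Long-lived categories such as ops and permission flags are deliberately excluded here, since age alone says nothing about whether they are still needed.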

Source Backbone

CI/CD behavior must be checked against official tool docs, but these books provide the durable release-engineering backbone.

Pro Git - branching, tags, signing, and release history.
GitHub Actions in Action - workflow and automation support.
Software Engineering at Google - engineering-process and reliability context.
