Skip to main content

Translating Informal Problems into Formal Specifications

What This Concept Is

A formal specification is a precise description of what a solver must produce, written tightly enough that two competent engineers would implement behaviorally identical solutions from it. It has a fixed shape:

  • inputs: types, ranges, pre-conditions
  • outputs: types, post-conditions
  • invariants: things that must stay true across the computation or across the system state
  • constraints: time, memory, throughput, correctness tolerances
  • failure modes: what happens on bad input, missing data, overflow, ties, duplicates
  • determinism policy: is the answer unique, or are multiple answers acceptable?

An informal problem is what you are actually handed: a sentence from a product manager, a chat message from a teammate, a contest prompt in English, a screenshot. Informal problems are almost always under-specified, sometimes over-specified, and often internally inconsistent.

Translation is the disciplined work of turning the informal artifact into the formal one. It is problem-solving in its own right -- a direct extension of Polya's understanding phase into the code world.

The distinction worth fixing early: a specification answers "what would be correct?", a plan answers "how will we produce something correct?", and an implementation answers "what did we actually produce?". Many projects confuse these. A missing or bad spec makes the plan and implementation ungrounded; all disputes become opinion.

Why It Matters Here

Module 5 earlier treats "understand the problem" as a mathematical habit (see concept 1). Once you leave textbooks, the understanding phase no longer happens against a carefully-written problem statement. It happens against ambiguity.

Without a spec:

  • you build the wrong thing, correctly
  • tests pass but reviewers reject the behaviour
  • the next person cannot tell whether an edge case is a bug or a feature
  • you cannot verify your own work, because there is nothing to verify against

The spec is the artifact everything else verifies against. Proofs verify against a theorem statement; code verifies against a spec. Skipping the spec is skipping the only ground truth you have.

Forward pipeline: Semester 2 (algorithm-design chapters open with formal I/O statements), Semester 3 (refactoring preserves a specified contract, not just observed behaviour), Semester 4 (systems programming forces API contracts to be machine-checkable), Semester 5 (distributed-systems protocols are formal specs), Semester 7 (ADR context + decision is a lightweight formal spec at architecture scale), Semester 10 (the capstone hinges on a stable problem statement through scope changes).

Concrete Examples

Example 1 -- "sort fraud transactions"

Informal prompt from a product manager:

Sort the transactions from the last 24 hours so the fraud team sees the suspicious ones first.

This looks like a one-line task. Translate it.

Inputs.

  • a list of Transaction records, each with at least: id: str, timestamp: datetime, amount: Decimal, merchant_id: str, risk_score: float in [0, 1], optional flagged_reason: str?
  • the time window is "last 24 hours" -- but relative to when? Resolve: relative to the system clock at call time, in UTC.

Output. a list of Transaction records, same length as the input, ordered so that "more suspicious" transactions appear first.

What does "suspicious" mean? The informal prompt does not say. Extract it by writing down the smallest decision rule you can defend:

  1. primary key: risk_score descending
  2. tie-break 1: transactions with a non-null flagged_reason come before those without
  3. tie-break 2: larger amount first
  4. tie-break 3: more recent timestamp first
  5. tie-break 4: lexicographic on id (guarantees determinism)

Invariants and constraints.

  • the output is a permutation of the input: no transaction is dropped or duplicated
  • sort must be deterministic: same input always produces the same output (important for replayable audit trails)
  • the function is pure: it does not mutate the input list
  • target latency: under 200 ms for $n \le 100{,}000$ rows

Failure modes.

  • risk_score outside $[0, 1]$: raise InvalidTransaction with the offending id
  • timestamp in the future: include but tag; do not silently drop (this is a business choice, and it must be made)
  • empty input: return empty list, not error

Now "sort transactions for fraud review" is a formal problem. Two engineers will produce behaviorally identical functions. The fraud team can verify it. You can test it.

Example 2 -- "deduplicate the customer list"

Informal prompt. "Deduplicate the customer list before we email them."

First question: what is "duplicate"? Four plausible definitions exist; each yields a different system.

  1. Same customer_id. Trivial; probably not what a human asking this question means.
  2. Same normalised email. Lowercase, strip whitespace, canonicalise +-tags on Gmail? Canonicalise international domain names?
  3. Same name + same address. Requires name canonicalisation (case, punctuation, nicknames) and address normalisation (USPS CASS? a third-party API?).
  4. Same person, fuzzy match. "J. Smith, 123 Main St" vs "John Smith, 123 Main Street"; requires a similarity threshold and a merging policy for conflicting fields.

Each choice produces a very different spec:

DimensionStrict (1)Normalised email (2)Fuzzy (4)
Deterministicyesyesno without tie-break
Performance$O(n)$$O(n)$$O(n^2)$ or $O(n \log n)$ with blocking
False-merge risknonelownon-trivial
Data-quality dependencynoneemail normalisation rulesfull name/address pipeline

The formal spec names the definition ("we will use normalised email, RFC 5321 local-part lowercased, Gmail +-tags stripped"), states the determinism policy ("keep the record with the earliest created_at"), and declares the failure modes ("reject records with empty email"). Without these, the feature ships with silent differences between staging and production.

Common Confusion / Misconceptions

Over-specification. Writing a spec so tight that it constrains the implementation instead of the behaviour. "Use a min-heap" is an implementation choice, not a spec. The spec says what, not how.

Under-specification. Leaving tie-breaking, null handling, or determinism unstated. Every under-specified corner is a future bug argument. Resolve it in the spec or explicitly mark it as "caller's choice."

Accidental non-determinism. Silently accepting Python's dictionary iteration order, floating-point summation order, or wall-clock-dependent inputs without flagging them. Non-determinism is a spec choice; it must be declared, not inherited.

Spec drift. The code evolves; the spec does not. After two features, the spec no longer describes the code. Either the spec is a living document or it is worthless.

Specs for proofs vs. specs for code. A proof spec (theorem statement) must be true; it does not care how you got there. A code spec must additionally be checkable at runtime (types), testable (you can write cases for each post-condition), and performant (constraints declare the cost envelope).

How To Use It

Translation protocol:

  1. Read the informal prompt once, without thinking about solutions.
  2. List every noun. For each, ask: is this a type I can name? What are its fields, ranges, null behaviour?
  3. List every verb. For each, ask: what is the pre-condition, what is the post-condition, what is left unchanged?
  4. Name three concrete tiny examples you expect to work. Write input -> output.
  5. Name three adversarial tiny examples: ties, empty input, single element, duplicate values, out-of-range, nulls.
  6. For every adversarial case, write the intended output. If you cannot, the spec is missing a rule. Add the rule.
  7. State one performance constraint and one determinism constraint explicitly.
  8. Re-read the spec. If a reviewer could implement a behaviorally different but "spec-conforming" solution, tighten it.

This is concept 1's understanding phase executed with an engineering vocabulary.

Transfer / Where This Shows Up Later

  • Semester 2 (algorithms). Every algorithm is introduced with a formal I/O contract. Without it, correctness claims are meaningless.
  • Semester 3 (refactoring & design patterns). Refactoring preserves a spec; design patterns are specs for recurring structural problems.
  • Semester 4 (systems programming). API contracts (POSIX, system calls) are specifications with machine-checkable pre- and post-conditions.
  • Semester 5 (networks & distributed systems). Protocols -- TCP, HTTP, gRPC -- are prose specifications governing millions of implementations.
  • Semester 7 (architecture). ADRs pair a context (the spec's pre-conditions) with a decision (the behavioural commitment). Fitness functions turn the spec's non-functional clauses into automated checks.
  • Semester 10 (capstone). A stable problem statement is the capstone's single most important artifact; stakeholders mutate prompts weekly, and the spec is your only defence against scope drift.

Check Yourself

  1. Why is "sort the transactions" under-specified, and what is the minimum additional information required to make it specified?
  2. What is the difference between over-specification and under-specification, and why are both harmful?
  3. Why is a spec for code typically more detailed than a spec for a proof? Give two concrete things a code spec must declare that a proof spec may leave implicit.
  4. Give an example of an "accidental non-determinism" bug and describe how the spec should prevent it.
  5. What counts as evidence that a spec is complete? (Hint: think about peer verification.)

Mini Drill or Application

Drill A. Take one of these informal prompts and write a full spec following the eight-step protocol above:

  1. "Show the user their most important notifications at the top."
  2. "Find the shortest delivery route for a fleet of trucks today."
  3. "Produce a daily report of active users."

The spec you write will be longer than the prompt. That is the point. When you have finished, swap with a peer: can they implement it without asking you a single question? If yes, the spec holds.

Drill B. Re-read the fraud-sorting spec in Example 1 and add a new constraint: "the fraud team wants to be able to filter to only the top 100 suspicious transactions". Update the spec. What changes in the I/O? What new failure modes emerge?

Drill C (unseen). You are handed: "detect bot traffic on the signup page." List at least six ambiguities the prompt contains, and for each state the minimal resolution you would propose. Draft the I/O section of the spec with types.

Read This Only If Stuck