Name: The Spec Is the Program

Research spine: this chapter is grounded in the NIST AI Risk Management Framework and the NIST Secure Software Development Framework, and applies that evidence to the operating judgment in the book.

Most arguments about requirements disappear when people write examples.

A product manager says, "Admins should see all reports." Security says, "Only reports for their department." Customer success says, "Enterprise admins expect cross-department visibility." Engineering says, "The current role model cannot express that." Legal says, "EU tenant data cannot be visible to US-based support admins." Everyone thought they agreed until an example forced the hidden conflict into daylight.

Examples are where intent becomes testable. They are also where AI-assisted development becomes safer. A model can generate an implementation from prose, but examples give it anchors. Reviewers can debate examples more productively than abstractions. Test suites can preserve examples after the meeting ends.

Specification by Example is not new. Gojko Adzic's work popularized the practice of using concrete examples to build shared understanding and drive acceptance tests. The reason it belongs in this book is that AI makes examples more valuable. A model can transform examples into tests, fixtures, UI states, API calls, and documentation. The better the examples, the safer the generation.

A good example has five parts:

context;
action;
expected behavior;
negative or edge condition;
why the example matters.
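
One way to keep the five parts from collapsing back into prose is to give each a named field. The sketch below is illustrative only: `SpecExample`, its field names, and `missing_parts` are assumptions, not a format this chapter prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecExample:
    """One example with the five parts listed above (all names are hypothetical)."""
    context: str       # the state the system starts in
    action: str        # what the actor does
    expected: str      # the behavior that must follow
    counter_case: str  # the negative or edge condition
    rationale: str     # why the example matters

    def missing_parts(self) -> list[str]:
        """Return the names of any parts that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]


example = SpecExample(
    context='Priya is an admin for the "Finance" department',
    action="Priya opens the reports page",
    expected="She sees Finance report R1",
    counter_case="She does not see Sales report R2",
    rationale="",  # deliberately blank to show the check
)
print(example.missing_parts())  # ['rationale']
```

A check like this can flag an incomplete example before review; it cannot tell whether the example is the right one.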

For the admin reports conflict, examples reveal the domain:

Feature: Report visibility for tenant admins

  Scenario: Department admin sees only department reports
    Given user Priya is an admin for the "Finance" department
    And report R1 belongs to "Finance"
    And report R2 belongs to "Sales"
    When Priya opens the reports page
    Then she sees R1
    And she does not see R2

  Scenario: Enterprise admin sees all departments in same tenant
    Given user Mateo is an enterprise admin for tenant T1
    And report R1 belongs to Finance in T1
    And report R2 belongs to Sales in T1
    When Mateo opens the reports page
    Then he sees R1 and R2

  Scenario: Support admin cannot cross data residency boundary
    Given support admin Lena is based in the US
    And tenant T2 is EU-resident
    When Lena searches for reports in T2
    Then no report content is displayed
    And an access-denied audit event is recorded

These examples do more than illustrate. They constrain. They show that "admin" is not one role. They expose tenant, department, and data residency. They create fixtures. They can become automated tests. They tell an AI coding agent which conditions matter.
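
To make "they constrain" concrete, here is a minimal sketch of a visibility rule that the first two scenarios pin down. The role names, the `User` and `Report` shapes, and placing Priya in tenant T1 are assumptions made for illustration; the residency scenario is left out because it needs location and audit data that the scenarios do not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    id: str
    tenant: str
    department: str


@dataclass(frozen=True)
class User:
    name: str
    role: str             # hypothetical roles: "department_admin", "enterprise_admin"
    tenant: str
    department: str = ""  # only meaningful for department admins


def visible_reports(user: User, reports: list[Report]) -> list[str]:
    """Return IDs of reports the user may see; unknown roles see nothing."""
    in_tenant = [r for r in reports if r.tenant == user.tenant]
    if user.role == "enterprise_admin":
        return [r.id for r in in_tenant]
    if user.role == "department_admin":
        return [r.id for r in in_tenant if r.department == user.department]
    return []  # default deny


reports = [Report("R1", "T1", "Finance"), Report("R2", "T1", "Sales")]
priya = User("Priya", "department_admin", "T1", "Finance")
mateo = User("Mateo", "enterprise_admin", "T1")

assert visible_reports(priya, reports) == ["R1"]        # department admin scenario
assert visible_reports(mateo, reports) == ["R1", "R2"]  # enterprise admin scenario
```

The default-deny branch is a design choice the scenarios imply but do not state; a counterexample for an unknown role would make it explicit.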

Figure: Examples Are the Executable Core. Examples turn vague stakeholder agreement into concrete cases that can be tested and reviewed.

Counterexamples are equally important. They define what the system must not do. AI-generated code often implements the happy path. Counterexamples protect the boundary.

For a coupon system:

| Example type | Case | Expected result |
| --- | --- | --- |
| Happy path | Valid coupon on eligible plan | Discount applied |
| Counterexample | Expired coupon | Rejected with reason |
| Counterexample | Coupon for annual plan used on monthly plan | Rejected |
| Edge case | Coupon applied at renewal boundary | Applied according to invoice date rule |
| Abuse case | Same coupon attempted across tenants | Rejected and audited |
| Compatibility case | Legacy coupon without campaign ID | Supported until migration date |

The counterexamples are not pessimism. They are the behavioral perimeter.
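
The perimeter can be written down as cases that must be rejected. The following is a sketch under stated assumptions: `Coupon`, `check_coupon`, the reason strings, and the dates are all invented for illustration, and the renewal-boundary and legacy rows are omitted because their rules are not specified here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Coupon:
    code: str
    tenant: str
    plan: str      # plan the coupon is valid for, e.g. "annual"
    expires: date


def check_coupon(coupon: Coupon, tenant: str, plan: str, today: date) -> tuple[bool, str]:
    """Return (accepted, reason). The rules are illustrative, not a real policy."""
    if coupon.tenant != tenant:
        return False, "wrong tenant"  # abuse case; a real system would also audit
    if today > coupon.expires:
        return False, "expired"
    if coupon.plan != plan:
        return False, "plan not eligible"
    return True, "ok"


today = date(2025, 1, 15)
annual = Coupon("SAVE10", "T1", "annual", date(2025, 12, 31))
expired = Coupon("OLD10", "T1", "annual", date(2024, 12, 31))

cases = [
    ("happy path", annual, "T1", "annual", (True, "ok")),
    ("expired coupon", expired, "T1", "annual", (False, "expired")),
    ("annual coupon on monthly plan", annual, "T1", "monthly", (False, "plan not eligible")),
    ("same coupon across tenants", annual, "T2", "annual", (False, "wrong tenant")),
]
for label, coupon, tenant, plan, expected in cases:
    assert check_coupon(coupon, tenant, plan, today) == expected, label
```

Three of the four cases are rejections, which is roughly the proportion the table suggests.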

Examples also support AI review. A reviewer can ask: which example does this generated code satisfy? Which counterexample does it violate? If the diff does not map to examples, the reviewer is forced back into subjective reading. Example traceability reduces review burden because it lets the reviewer test intent rather than infer it.

A traceability matrix:

| Outcome | Example | Code area | Test | Owner |
| --- | --- | --- | --- | --- |
| Department admin sees only own reports | Department admin scenario | `ReportPolicy.visible_reports` | `test_department_admin_scope` | Security + Product |
| Enterprise admin sees tenant-wide reports | Enterprise admin scenario | `ReportPolicy.visible_reports` | `test_enterprise_admin_scope` | Product |
| Residency boundary holds | Support admin EU scenario | `SupportAccessService` | `test_support_residency_denied` | Legal + Security |

This matrix looks simple. It is powerful because it gives the model, reviewer, and future maintainer a map from intent to implementation.
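
A matrix like this can also be checked mechanically. The sketch below assumes the matrix is kept as data and that the names of collected tests are available as a set; `untraced_outcomes` and the `collected` set are hypothetical.

```python
MATRIX = [
    {"outcome": "Department admin sees only own reports", "test": "test_department_admin_scope"},
    {"outcome": "Enterprise admin sees tenant-wide reports", "test": "test_enterprise_admin_scope"},
    {"outcome": "Residency boundary holds", "test": "test_support_residency_denied"},
]


def untraced_outcomes(matrix: list[dict], known_tests: set[str]) -> list[str]:
    """Return outcomes whose named test does not exist in the suite."""
    return [row["outcome"] for row in matrix if row["test"] not in known_tests]


# Hypothetical result of collecting test names from the suite.
collected = {"test_department_admin_scope", "test_enterprise_admin_scope"}
print(untraced_outcomes(MATRIX, collected))  # ['Residency boundary holds']
```

An outcome with no matching test is a gap in the map, whichever side is out of date.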

Examples should include data, not only prose. A fixture can be more precise than a paragraph. For a tax calculation, a table of input invoices and expected outputs may define behavior better than any description. For a data pipeline, a before/after dataset can lock transformation semantics. For a recommender, a set of queries and expected item rankings can define retrieval quality. For an AI support assistant, examples can include prompt, retrieved evidence, acceptable answer, unacceptable answer, and policy citation.

A compact fixture:

{
  "account": {"tenant": "T1", "currency": "USD"},
  "invoice": {"subtotal_cents": 10000, "tax_region": "CA"},
  "coupon": {"type": "percent", "value": 10, "applies_to": "subtotal"},
  "expected": {
    "discount_cents": 1000,
    "taxable_subtotal_cents": 9000,
    "explanation": "Coupon applies before tax"
  }
}

A model can generate code from this. A test can verify code against it. A reviewer can discuss whether the business rule is right. The fixture becomes a shared artifact.
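
As a sketch of how a test might consume the fixture, the block below embeds the same data (without the explanation field) and checks a candidate implementation against it. `apply_coupon` is a hypothetical function, and rounding down to whole cents is an assumption the fixture itself does not settle.

```python
import json

FIXTURE = json.loads("""
{
  "account": {"tenant": "T1", "currency": "USD"},
  "invoice": {"subtotal_cents": 10000, "tax_region": "CA"},
  "coupon": {"type": "percent", "value": 10, "applies_to": "subtotal"},
  "expected": {"discount_cents": 1000, "taxable_subtotal_cents": 9000}
}
""")


def apply_coupon(invoice: dict, coupon: dict) -> dict:
    """Apply a percent coupon to the pre-tax subtotal, in integer cents."""
    if coupon["type"] != "percent" or coupon["applies_to"] != "subtotal":
        raise ValueError("only percent-of-subtotal coupons are sketched here")
    discount = invoice["subtotal_cents"] * coupon["value"] // 100  # rounds down
    return {
        "discount_cents": discount,
        "taxable_subtotal_cents": invoice["subtotal_cents"] - discount,
    }


result = apply_coupon(FIXTURE["invoice"], FIXTURE["coupon"])
assert result == FIXTURE["expected"]
```

Because the expected values live in the fixture, changing the business rule means changing the data first and watching the test fail.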

Examples should be maintained. Production incidents often reveal missing examples. If a customer finds a bug around annual contracts, do not merely patch the code. Add the example. If a reviewer catches a generated implementation that leaks cross-tenant data, add the counterexample. If a support team discovers that a workflow behaves differently for suspended accounts, add the state. The example library is the team's operational memory.

There is a risk: examples can overfit. A system can pass the listed examples and fail the underlying rule. That is why examples must be paired with constraints and properties. But examples remain the best starting point because they make the rule concrete. They are not the whole spec. They are the executable core.
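
A property states the rule that the listed examples only sample. The following sketch checks two properties of a percent discount over seeded random inputs; `percent_discount_cents` and both properties are assumptions chosen for illustration, and a property-testing library such as Hypothesis would explore the input space more thoroughly.

```python
import random


def percent_discount_cents(subtotal_cents: int, percent: int) -> int:
    """Hypothetical rule: percent coupon on the pre-tax subtotal, rounded down."""
    return subtotal_cents * percent // 100


def check_discount_properties(trials: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(trials):
        subtotal = rng.randint(0, 10_000_000)
        percent = rng.randint(0, 100)
        discount = percent_discount_cents(subtotal, percent)
        # Property 1: a discount is never negative and never exceeds the subtotal.
        assert 0 <= discount <= subtotal
        # Property 2: a larger percentage never yields a smaller discount.
        if percent < 100:
            assert percent_discount_cents(subtotal, percent + 1) >= discount


check_discount_properties()
```

The example says what happens for one invoice; the property says what may never happen for any invoice.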

Key Takeaways

Most arguments about requirements disappear when people write examples.

The practical test is whether a team can name the evidence, owner, and failure mode before it changes behavior.

Read this alongside the adjacent chapters of The Spec Is the Program when you need the wider AI SDLC and specs frame.

Example quality

A weak example is vague in miniature. "Given a customer, when they cancel, then cancellation works" does not help. It hides role, contract type, invoice state, payment provider, and timing. A strong example names the state that changes behavior. It includes values. It can fail.

Examples also need ownership. Product owns customer behavior. Security owns abuse cases. Platform owns performance fixtures. Support owns real-world exception cases. The spec owner coordinates, but the examples should reflect cross-functional knowledge. That is how the spec captures reality the model cannot infer.

The chapter's rule: every consequential spec should include examples before implementation begins. If the team cannot write examples, it has not yet agreed on behavior. Do not ask the model to decide for you.

Tables are often better than prose

For business rules, tables can be the cleanest example format. A pricing feature may have dozens of combinations: plan type, region, coupon, renewal status, tax treatment, customer segment. Prose becomes unreadable. A table makes coverage visible.

| Plan | Coupon | Renewal state | Expected behavior |
| --- | --- | --- | --- |
| Monthly self-serve | Percent coupon | New purchase | Apply before tax |
| Monthly self-serve | Expired coupon | New purchase | Reject with reason |
| Annual enterprise | Any self-serve coupon | Renewal | Reject and route to account team |
| Monthly self-serve | Fixed credit | Mid-cycle upgrade | Apply to prorated subtotal only |
| Suspended account | Any coupon | Reactivation | Block until billing issue resolved |

The table is not a spreadsheet for its own sake. It is a compact set of examples that can become tests. A model can generate parameterized tests from it. Reviewers can see missing cases. Product can approve behavior without reading code.
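
As a sketch of how the rows above might become a table-driven test, the block below pairs them with a candidate rule function. `decide`, its rule precedence, and `failing_rows` are assumptions; a team using pytest would more likely feed the same rows to `pytest.mark.parametrize`.

```python
ROWS = [
    ("Monthly self-serve", "Percent coupon", "New purchase", "Apply before tax"),
    ("Monthly self-serve", "Expired coupon", "New purchase", "Reject with reason"),
    ("Annual enterprise", "Any self-serve coupon", "Renewal", "Reject and route to account team"),
    ("Monthly self-serve", "Fixed credit", "Mid-cycle upgrade", "Apply to prorated subtotal only"),
    ("Suspended account", "Any coupon", "Reactivation", "Block until billing issue resolved"),
]


def decide(plan: str, coupon: str, state: str) -> str:
    """Candidate implementation; the order of these checks is an assumption."""
    if plan == "Suspended account":
        return "Block until billing issue resolved"
    if plan == "Annual enterprise":
        return "Reject and route to account team"
    if coupon == "Expired coupon":
        return "Reject with reason"
    if coupon == "Fixed credit" and state == "Mid-cycle upgrade":
        return "Apply to prorated subtotal only"
    if coupon == "Percent coupon":
        return "Apply before tax"
    return "Reject with reason"  # anything unlisted is rejected


def failing_rows(rows: list[tuple[str, str, str, str]]) -> list[tuple[str, str, str, str]]:
    """Return every row whose expected behavior the implementation does not produce."""
    return [row for row in rows if decide(*row[:3]) != row[3]]


assert failing_rows(ROWS) == []
```

Adding a row is a one-line change, which is what keeps coverage visible as the combinations grow.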

Real examples beat invented examples

Synthetic examples are useful, but production examples are better. Support tickets, incident records, customer escalations, bug reports, failed sales promises, and reviewer disagreements should feed the example library. The best examples often come from the cases the team wishes were rare. They are the boundary where system behavior matters most.

A team can maintain a "golden examples" folder:

/specs/billing/examples/
 cancellation-happy-path.json
 cancellation-unpaid-invoice.json
 cancellation-enterprise-contract.json
 cancellation-provider-timeout.json
 cancellation-legal-hold.json

Each file becomes test data, documentation, and model context. It also becomes a negotiation artifact. If product wants to change enterprise cancellation behavior, it changes the example first, then the implementation follows.
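
A minimal sketch of loading such a folder follows. It assumes each file is a JSON object and that the team requires `input`, `expected`, and `reason` keys; that schema, the function names, and the file contents written in the demonstration are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_examples(folder: Path) -> dict[str, dict]:
    """Load every *.json file in the folder, keyed by file stem, in sorted order."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(folder.glob("*.json"))
    }


def missing_fields(examples: dict[str, dict], required: tuple[str, ...]) -> dict[str, list[str]]:
    """Map each example name to the required top-level keys it lacks."""
    gaps = {}
    for name, body in examples.items():
        absent = [key for key in required if key not in body]
        if absent:
            gaps[name] = absent
    return gaps


# Demonstration against a throwaway folder rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "cancellation-happy-path.json").write_text(
        '{"input": {}, "expected": {}, "reason": "normal case"}', encoding="utf-8")
    (folder / "cancellation-legal-hold.json").write_text(
        '{"input": {}, "expected": {}}', encoding="utf-8")
    examples = load_examples(folder)
    gaps = missing_fields(examples, ("input", "expected", "reason"))

print(gaps)  # {'cancellation-legal-hold': ['reason']}
```

Run as a pre-merge check, a loader like this keeps the folder from filling with files that tests cannot use.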

Example review as cross-functional alignment

Examples should be reviewed by the people who own the consequence. Security reviews abuse cases. Support reviews customer states. Finance reviews billing examples. Product reviews user outcomes. Engineering reviews whether examples are implementable and testable. This does not require a meeting for every small change, but consequential features deserve example review before code exists.

The model can generate proposed examples, but humans must approve them. Otherwise the model may define correctness for the organization.

The Example Library

A mature AI-native team accumulates an example library the way a conventional team accumulates unit tests. The library is not a random collection of sample inputs. It is a curated institutional memory of how the product is supposed to behave. It contains happy paths, edge cases, historical regressions, abuse cases, localization cases, migration cases, and customer-specific variants. It is one of the most valuable artifacts in the repository because it turns product judgment into executable evidence.

The first category is the canonical happy path. These examples teach the model and the human reviewer what normal looks like. They should be boring and representative. The second category is the boundary path: empty carts, expired subscriptions, missing fields, large uploads, old browsers, disabled accounts, canceled invoices. Boundary examples are where generic generated code most often fails because the common path looked complete.

The third category is the regression path. Every production incident that involved behavior mismatch should leave behind at least one example. If a generated change once allowed a user to see an unauthorized document, that exact shape of request belongs in the library. If a refactor once dropped a tax field from an invoice export, that export becomes a fixture. This is how the system remembers pain.

The fourth category is the abuse path. AI-generated systems often implement the cooperative version of a feature and omit the adversarial one. The example library should include users trying to exceed limits, change another tenant's data, bypass approvals, inject instructions into free-text fields, trigger double execution, and exploit race conditions. These examples are especially important because much of the material models learn from consists of tutorials and clean examples, not hostile production traffic.

The fifth category is the business-rule path. These are examples that make sense only inside the company's domain. A healthcare workflow might include state-specific consent rules. A fintech workflow might include transaction holds and reporting thresholds. A marketplace might include seller suspension states and regional tax rules. These examples are where the company's judgment becomes hard to copy.

Examples should be written in formats that tools can consume: JSON fixtures, Gherkin scenarios, snapshot tests, structured YAML cases, or compact Markdown tables that can be converted into tests. The format matters less than the discipline: examples are versioned, reviewed, owned, and expanded after incidents. A spec without examples is negotiable. A spec with examples begins to execute.
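
As one illustration of converting a table into tests, the sketch below turns a simple pipe-delimited Markdown table into one dictionary per row. `parse_markdown_table` is a hypothetical helper that handles only plain tables without escaped pipes or alignment markers beyond the separator row.

```python
def _cells(line: str) -> list[str]:
    """Split one pipe-delimited row into trimmed cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a simple pipe table into one dict per data row, keyed by header."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = _cells(lines[0])
    # lines[1] is the |---|---| separator row and carries no data.
    return [dict(zip(header, _cells(line))) for line in lines[2:]]


TABLE = """
| Case | Expected result |
| --- | --- |
| Expired coupon | Rejected with reason |
| Same coupon attempted across tenants | Rejected and audited |
"""

cases = parse_markdown_table(TABLE)
print(cases[0])  # {'Case': 'Expired coupon', 'Expected result': 'Rejected with reason'}
```

Each dictionary can then drive a table-driven test, so the table a product owner approves is the same data the test suite runs.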

From Examples to Generation Context

The example library should not wait passively for tests to run. It should be part of generation context. When asking a model to implement behavior, provide the relevant examples in compact form: representative inputs, expected outputs, and the reason each example exists. The reason field matters because it teaches intent. "Regression from incident INC-1842" carries different weight than "normal case." "Abuse case: cross-tenant access attempt" tells the model not merely what output to produce, but what boundary the output protects.
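
A sketch of the compact form might look like the following. `render_context` and the example inputs and outputs are hypothetical; only the two reason strings come from the text above.

```python
import json


def render_context(examples: list[dict]) -> str:
    """Render each example as reason, input, and expected output, one block each."""
    blocks = []
    for ex in examples:
        blocks.append(
            f"- reason: {ex['reason']}\n"
            f"  input: {json.dumps(ex['input'], sort_keys=True)}\n"
            f"  expected: {json.dumps(ex['expected'], sort_keys=True)}"
        )
    return "\n".join(blocks)


examples = [
    {
        "reason": "Regression from incident INC-1842",
        "input": {"plan": "annual", "action": "cancel"},  # hypothetical input
        "expected": {"status": "cancelled_at_term_end"},  # hypothetical output
    },
    {
        "reason": "Abuse case: cross-tenant access attempt",
        "input": {"actor_tenant": "T1", "target_tenant": "T2"},
        "expected": {"status": "denied", "audited": True},
    },
]
print(render_context(examples))
```

Putting the reason first is a deliberate choice in this sketch: the model reads why the case exists before it reads the values.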

Teams should resist dumping the whole example library into every request. Context stuffing creates noise and cost, and it can cause the model to imitate irrelevant behavior. Instead, examples should be retrieved by domain, risk, and artifact type. A billing change gets invoice, tax, refund, and subscription examples. A permission change gets authorization examples and abuse cases. A UI copy change does not need database migration fixtures.
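
Retrieval by domain and risk can be as simple as filtering a tagged index. The sketch below is one possible shape; `IndexedExample`, `select_examples`, the tag vocabulary, and the example names are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexedExample:
    name: str
    domain: str    # e.g. "billing", "permissions"
    risk: str      # e.g. "normal", "abuse", "regression"
    artifact: str  # e.g. "fixture", "scenario", "migration"


def select_examples(
    index: list[IndexedExample],
    domains: set[str],
    risks: Optional[set[str]] = None,
    limit: int = 5,
) -> list[IndexedExample]:
    """Return up to `limit` examples in the given domains, optionally filtered by risk."""
    chosen = [
        ex for ex in index
        if ex.domain in domains and (risks is None or ex.risk in risks)
    ]
    return chosen[:limit]


INDEX = [
    IndexedExample("invoice-tax-ca", "billing", "normal", "fixture"),
    IndexedExample("refund-after-chargeback", "billing", "regression", "fixture"),
    IndexedExample("cross-tenant-report-read", "permissions", "abuse", "scenario"),
    IndexedExample("orders-table-backfill", "storage", "normal", "migration"),
]

picked = select_examples(INDEX, domains={"permissions"}, risks={"abuse", "regression"})
print([ex.name for ex in picked])  # ['cross-tenant-report-read']
```

The `limit` parameter is the guard against context stuffing: the caller has to decide how many examples a request deserves.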

The library also supports review. A reviewer can ask: which examples prove this change behaves as intended? Which examples were added because of this change? Which historical examples still pass? A generated diff without a changed example is not always wrong, but a generated behavior change without example evidence should make the reviewer uneasy.
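
The last question can be turned into a prompt for the reviewer rather than a gate. This sketch assumes source code lives under `src/` and examples under a path containing `/examples/`; both conventions and the function name are hypothetical.

```python
def needs_example_evidence(changed_paths: list[str]) -> bool:
    """True when source files changed but no file under an examples folder did."""
    touches_source = any(path.startswith("src/") for path in changed_paths)
    touches_examples = any("/examples/" in path for path in changed_paths)
    return touches_source and not touches_examples


diff_a = ["src/billing/cancel.py"]
diff_b = ["src/billing/cancel.py", "specs/billing/examples/cancellation-legal-hold.json"]

print(needs_example_evidence(diff_a))  # True
print(needs_example_evidence(diff_b))  # False
```

A `True` result does not mean the change is wrong, since a pure refactor may touch no examples; it means the reviewer should ask which existing examples cover the new behavior.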

Over time, example coverage becomes a product asset. Competitors can copy architecture diagrams and use the same coding tools. They cannot easily copy the accumulated boundary cases produced by your customers, your incidents, your regulations, and your operational history. The example library is therefore not only a testing artifact. It is a machine-readable form of company knowledge.

Examples Are the Executable Core