Alpesh Nakrani
Chapter 3 / The AI-Native Canon

SPEC-Lock

SPEC-Lock is the working method of this book. It is intentionally small enough to remember and strict enough to change behavior.

Research spine: this chapter stays grounded in the NIST AI Risk Management Framework and the NIST Secure Software Development Framework, then applies that evidence to the operating judgment in the book.

S - State the outcome
P - Pin the boundaries
E - Encode examples
C - Constrain the system
Lock - Lock acceptance

The point is not to fill out a template mechanically. The point is to prevent machine implementation from outrunning human intent. SPEC-Lock turns a request into a behavioral contract that a model can implement, a reviewer can inspect, and a test suite can enforce.

Start with outcome. The outcome is not the artifact. "Add a dashboard" is not an outcome. "A finance lead can identify revenue anomalies by region before close" is closer. "Refactor the payment service" is not an outcome. "Payment-provider failures are isolated behind a retryable adapter without changing public API behavior" is closer. The outcome should name the user, system, or business result that should exist after the change.

Pin boundaries next. Boundaries define actors, states, inputs, outputs, systems touched, systems not touched, permissions, failure modes, compatibility expectations, and non-goals. Boundaries are where vague requests become safe. A model may produce plausible code for many adjacent paths; boundaries tell it which paths not to take. Human reviewers also need boundaries because they decide whether a generated diff is overreaching.

Encode examples. Examples are the core of executable intent. They turn abstractions into cases. Good examples include happy paths, edge cases, counterexamples, and fixtures. They should be concrete enough to become tests. Specification by Example is powerful because it makes disagreement visible before code exists. If product, engineering, support, and legal disagree about an example, they would have disagreed in production too.

Constrain the system. Constraints are non-negotiable properties: security, privacy, performance, accessibility, API compatibility, data residency, auditability, design-system consistency, architecture direction, cost budget, and operational ownership. Constraints tell the model that a feature is not correct merely because it works once. It must work without breaking the surrounding system.

Lock acceptance. Acceptance defines how the team knows the implementation is good enough. Tests, evals, code review rubrics, visual checks, performance thresholds, migration rehearsals, rollout gates, and sign-offs all belong here. Acceptance is the lock because it prevents the spec from becoming prose that everyone admires and nobody enforces.

[Figure: SPEC-Lock turns machine-authored implementation into a reviewable path from outcome to acceptance.]

Apply SPEC-Lock to a small API change. Suppose the request is: "Expose customer credit balance through the public API."

A weak prompt:

Add an endpoint to return customer credit balance.

A SPEC-Lock version:

# Spec: Customer Credit Balance API

## S - Outcome
A customer billing integration can retrieve the currently available credit balance for one customer account through a stable authenticated API.

## P - Boundaries
In scope:
- Read-only endpoint for existing customer accounts.
- Balance returned in account currency.
- Service accounts with `billing.read` permission.

Out of scope:
- Mutating credit balance.
- Historical credit ledger.
- Enterprise parent-account aggregation.
- UI changes.

Failure states:
- 401 when unauthenticated.
- 403 when authenticated but lacking `billing.read`.
- 404 when customer does not exist in tenant.
- 409 when account currency is not configured.

## E - Examples
Given a tenant account with USD currency and $120.50 credit,
when an authorized service account requests the balance,
then the API returns `available_credit: 12050` and `currency: "USD"`.

Given a service account without `billing.read`,
when it requests the balance for any customer ID,
then the API returns 403 and does not reveal whether the customer exists.

## C - Constraints
- Do not expose cross-tenant customer existence.
- Amounts are integer minor units.
- Endpoint p95 latency under 150 ms.
- Audit all successful and forbidden requests.
- OpenAPI contract must be updated.

## Lock - Acceptance
- OpenAPI diff reviewed.
- Contract tests for 200, 401, 403, 404, 409.
- Tenant isolation integration test.
- Audit-event assertion.
- Load test of 10k sequential reads with p95 latency under 150 ms.

This spec is still short. But the generated implementation has a target. The reviewer has a checklist. The test suite can lock behavior. The API consumer can understand the contract.
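As a rough illustration of how the examples and failure states above could become contract tests, here is a minimal Python sketch. Everything in it is hypothetical: the `Principal` and `Account` types, the in-memory `ACCOUNTS` store, and the `get_credit_balance` handler are stand-ins invented for this example, and the conversion assumes a two-decimal currency such as USD. The order of checks is one reasonable reading of the spec, not the only one.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


# Hypothetical stand-ins for illustration; none of these names come from the spec.
@dataclass(frozen=True)
class Principal:
    tenant_id: str
    permissions: frozenset


@dataclass(frozen=True)
class Account:
    tenant_id: str
    currency: Optional[str]
    credit: Decimal  # major units, e.g. Decimal("120.50")


ACCOUNTS = {
    "cus_1": Account("tenant_a", "USD", Decimal("120.50")),
    "cus_2": Account("tenant_a", None, Decimal("0")),
    "cus_3": Account("tenant_b", "USD", Decimal("5.00")),
}


def to_minor_units(amount: Decimal) -> int:
    """Convert an amount to integer minor units (assumes a two-decimal currency)."""
    return int(amount * 100)


def get_credit_balance(principal: Optional[Principal], customer_id: str):
    """Return (status, body), applying the spec's failure states in order."""
    if principal is None:
        return 401, {}
    # Permission is checked before any lookup, so a 403 never reveals existence.
    if "billing.read" not in principal.permissions:
        return 403, {}
    account = ACCOUNTS.get(customer_id)
    # A customer in another tenant is treated exactly like a missing customer.
    if account is None or account.tenant_id != principal.tenant_id:
        return 404, {}
    if account.currency is None:
        return 409, {}
    return 200, {
        "customer_id": customer_id,
        "available_credit": to_minor_units(account.credit),
        "currency": account.currency,
    }


reader = Principal("tenant_a", frozenset({"billing.read"}))
no_perm = Principal("tenant_a", frozenset())

# Contract checks for 200, 401, 403, 404, 409.
assert get_credit_balance(reader, "cus_1") == (
    200,
    {"customer_id": "cus_1", "available_credit": 12050, "currency": "USD"},
)
assert get_credit_balance(None, "cus_1")[0] == 401
assert get_credit_balance(no_perm, "cus_1")[0] == 403
assert get_credit_balance(reader, "missing")[0] == 404
assert get_credit_balance(reader, "cus_2")[0] == 409
# Tenant isolation: another tenant's customer is indistinguishable from a missing one.
assert get_credit_balance(reader, "cus_3") == get_credit_balance(reader, "missing")
# The 403 response is identical whether or not the customer exists.
assert get_credit_balance(no_perm, "cus_1") == get_credit_balance(no_perm, "missing")
```

The last two assertions are the ones a generated implementation is most likely to get wrong, which is why the spec states them as examples and constraints rather than leaving them implied.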

OpenAPI is useful here because it gives the spec a machine-readable surface. The OpenAPI Specification defines a language-agnostic interface for HTTP APIs that humans and computers can use to understand a service without inspecting implementation (https://swagger.io/specification/). In AI-assisted development, that property matters even more: the model can generate server code, client code, contract tests, and docs from the same artifact, but only if the artifact expresses the behavior.

A minimal OpenAPI fragment:

paths:
  /v1/customers/{customer_id}/credit-balance:
    get:
      summary: Retrieve available customer credit balance
      security:
        - serviceAccount: [billing.read]
      parameters:
        - name: customer_id
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Available credit balance
          content:
            application/json:
              schema:
                type: object
                required: [customer_id, available_credit, currency]
                properties:
                  customer_id:
                    type: string
                  available_credit:
                    type: integer
                    description: Amount in minor currency units
                  currency:
                    type: string
                    minLength: 3
                    maxLength: 3
        "403":
          description: Authenticated principal lacks billing.read
        "409":
          description: Account currency is not configured

The spec becomes a program-like artifact because tools can act on it. But it is still human intent. Tools do not decide whether 403 should hide customer existence. The team decides. The spec preserves that decision.
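To make "tools can act on it" concrete, the sketch below hand-codes the 200 response schema from the fragment as a small Python check. It is an assumption-laden illustration: `validate_balance_response` is a hypothetical helper written for this example, and a real project would more likely feed the OpenAPI document to an existing schema validator than restate the rules by hand. The integer check here is also stricter than some JSON Schema dialects, which accept `12050.0` as an integer.

```python
from typing import List


def validate_balance_response(body: dict) -> List[str]:
    """Check a 200 response body against the schema in the fragment.

    Hypothetical helper for illustration; returns a list of violations,
    empty when the body conforms.
    """
    errors = []
    for name in ("customer_id", "available_credit", "currency"):
        if name not in body:
            errors.append(f"missing required field: {name}")

    if "customer_id" in body and not isinstance(body["customer_id"], str):
        errors.append("customer_id must be a string")

    if "available_credit" in body:
        credit = body["available_credit"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(credit, bool) or not isinstance(credit, int):
            errors.append("available_credit must be an integer")

    if "currency" in body:
        currency = body["currency"]
        if not (isinstance(currency, str) and len(currency) == 3):
            errors.append("currency must be a 3-character string")

    return errors


ok = {"customer_id": "cus_1", "available_credit": 12050, "currency": "USD"}
drifted = {"customer_id": "cus_1", "available_credit": 120.5, "currency": "USD"}

assert validate_balance_response(ok) == []
assert validate_balance_response(drifted) == ["available_credit must be an integer"]
```

The `drifted` case is the kind of regression a generated implementation can introduce without anyone noticing: the endpoint still works, but the amount is no longer in integer minor units.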

SPEC-Lock also works outside APIs. For a UI change, examples may be user stories and screenshots. For a data pipeline, examples may be input/output fixtures. For a machine-learning or AI feature, examples may include eval cases and red-team scenarios. For a pricing workflow, examples may include customer segments and edge-case contracts. The framework adapts because it asks what must be true, not which tool must be used.

SPEC-Lock should be scaled by risk. A low-risk internal UI copy change might need one paragraph and a screenshot. A data-deletion workflow needs rigorous boundaries, constraints, and acceptance. A public API needs a machine-readable contract. A billing or permission change needs tests for abuse cases. The goal is not process uniformity. The goal is consequence-appropriate precision.

The method also changes code review. Review begins before implementation. A reviewer can reject a spec because the outcome is unclear, boundaries are missing, examples do not cover the risky states, constraints conflict, or acceptance cannot be verified. That is cheaper than rejecting a thousand-line generated diff after the model has filled in ambiguity with code.
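Some of these rejection reasons can be checked mechanically before a human reads the spec. The following Python sketch assumes a hypothetical `SpecLock` structure invented for this example, in which each acceptance check names the constraints it verifies. It cannot detect conflicting constraints, which still need human judgment; it only flags the gaps that are visible from structure alone.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


# Hypothetical structures for illustration; the book does not prescribe a schema.
@dataclass
class Example:
    name: str
    kind: str  # "happy", "edge", or "counterexample"


@dataclass
class SpecLock:
    outcome: str = ""
    boundaries: List[str] = field(default_factory=list)
    examples: List[Example] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    # Maps each acceptance check to the constraints it verifies.
    acceptance: Dict[str, Set[str]] = field(default_factory=dict)


def review_spec(spec: SpecLock) -> List[str]:
    """Return reasons to reject the spec before any code is generated."""
    reasons = []
    if not spec.outcome.strip():
        reasons.append("outcome is unclear")
    if not spec.boundaries:
        reasons.append("boundaries are missing")
    if not any(e.kind != "happy" for e in spec.examples):
        reasons.append("examples do not cover risky states")
    # A constraint that no acceptance check verifies cannot be enforced.
    covered = set().union(*spec.acceptance.values())
    for constraint in spec.constraints:
        if constraint not in covered:
            reasons.append(f"no acceptance check verifies: {constraint}")
    return reasons


draft = SpecLock(
    outcome="Billing integrations can read available credit",
    boundaries=["read-only endpoint"],
    examples=[Example("USD balance", "happy")],
    constraints=["tenant isolation", "p95 under 150 ms"],
    acceptance={"tenant isolation integration test": {"tenant isolation"}},
)

assert review_spec(draft) == [
    "examples do not cover risky states",
    "no acceptance check verifies: p95 under 150 ms",
]
```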

SPEC-Lock is especially important when multiple models or agents are involved. One agent may write frontend code, another backend, another tests, another docs. Without a shared spec, each agent optimizes local plausibility. With a shared spec, the team has a single intent artifact that all generated work must satisfy.

SPEC-Lock as a conversation discipline

The framework is also a meeting tool. Instead of letting a planning meeting wander through opinions, the lead can ask five questions in order. What outcome are we creating? What boundaries matter? Which examples prove we agree? Which constraints are non-negotiable? What locks acceptance? The conversation becomes practical quickly because vague agreement cannot survive the examples and constraints sections.

This also helps product and engineering collaborate. Product is not asked to write implementation. Engineering is not asked to guess business intent. Security and legal are not asked to review a finished diff after the path is chosen. Each function contributes to the spec layer where its judgment is cheapest to apply. The model then implements inside that shared artifact.

The final discipline is to keep SPEC-Lock alive after release. Production will reveal missing examples. Customers will use features differently. Incidents will expose constraints no one wrote down. A spec that stops at launch becomes stale documentation. A living SPEC-Lock artifact becomes system memory.

The chapter's takeaway: SPEC-Lock is not a template for writing more documents. It is a control system for turning intent into implementation without letting ambiguity multiply.

The ambiguity budget

SPEC-Lock can be paired with an ambiguity budget. Every unresolved question has a cost. Some questions can remain open because they are low consequence or easy to reverse. Others must be resolved before implementation. The budget makes that explicit.

| Ambiguity | Can defer? | Reason |
| --- | --- | --- |
| Button label exact wording | Yes | Easy to change; low risk |
| Which role can cancel subscription | No | Permission and revenue risk |
| Whether large exports are async | No | Performance and reliability risk |
| Email copy for confirmation | Maybe | Can ship with approved default |
| Enterprise contract behavior | No | Commercial/legal risk |

This prevents over-specification. The goal is not to eliminate all uncertainty; it is to eliminate expensive uncertainty before machine generation.
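One way to make the budget operational is to treat it as data and compute which questions block generation. The Python sketch below is an illustration under stated assumptions: the `Ambiguity` type and the `approved_default` flag are hypothetical, and reading "Maybe" as "deferrable only once a default is approved" is an interpretation of the table, not a rule from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Defer(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"


# Hypothetical record type for one row of the ambiguity budget.
@dataclass(frozen=True)
class Ambiguity:
    question: str
    can_defer: Defer
    reason: str
    approved_default: bool = False  # assumption: only meaningful for MAYBE rows


BUDGET = [
    Ambiguity("Button label exact wording", Defer.YES, "Easy to change; low risk"),
    Ambiguity("Which role can cancel subscription", Defer.NO, "Permission and revenue risk"),
    Ambiguity("Whether large exports are async", Defer.NO, "Performance and reliability risk"),
    Ambiguity("Email copy for confirmation", Defer.MAYBE, "Can ship with approved default"),
    Ambiguity("Enterprise contract behavior", Defer.NO, "Commercial/legal risk"),
]


def blocking(budget: List[Ambiguity]) -> List[str]:
    """Questions that must be resolved before machine generation starts."""
    return [
        a.question
        for a in budget
        if a.can_defer is Defer.NO
        or (a.can_defer is Defer.MAYBE and not a.approved_default)
    ]


# With no approved default for the email copy, four questions block generation.
assert len(blocking(BUDGET)) == 4
assert "Button label exact wording" not in blocking(BUDGET)
```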

SPEC-Lock and autonomy level

The more autonomy a coding agent has, the stronger SPEC-Lock must be. If the model is only producing a small snippet for a human to paste, a lightweight spec may suffice. If the agent can modify files, run tests, commit changes, or open pull requests, the spec must be stronger. If the agent can deploy or trigger migrations, the spec must include rollback, observability, and owner sign-off.
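The escalation described here can be sketched as a gate that refuses to start an agent run when the spec is too thin for the agent's permissions. In this Python sketch the three levels follow the paragraph, and the deploy-level additions (rollback, observability, owner sign-off) come from the text. The section lists for the two lower levels are assumptions made for the example, since the text only says "lightweight" and "stronger".

```python
from enum import IntEnum
from typing import Iterable, Set


class Autonomy(IntEnum):
    SNIPPET = 1     # output is pasted by a human
    REPO_WRITE = 2  # agent modifies files, runs tests, commits, opens pull requests
    DEPLOY = 3      # agent deploys or triggers migrations


# Sections added at each level. Lower-level lists are illustrative assumptions.
ADDED_AT_LEVEL = {
    Autonomy.SNIPPET: {"outcome", "examples"},
    Autonomy.REPO_WRITE: {"boundaries", "constraints", "acceptance"},
    Autonomy.DEPLOY: {"rollback", "observability", "owner_sign_off"},
}


def required_sections(level: Autonomy) -> Set[str]:
    """Requirements are cumulative: each level includes everything below it."""
    required: Set[str] = set()
    for tier, sections in ADDED_AT_LEVEL.items():
        if tier <= level:
            required |= sections
    return required


def missing_sections(level: Autonomy, provided: Iterable[str]) -> Set[str]:
    """Sections the spec still needs before an agent at this level may run."""
    return required_sections(level) - set(provided)


full_spec = {"outcome", "boundaries", "examples", "constraints", "acceptance"}

assert missing_sections(Autonomy.REPO_WRITE, full_spec) == set()
assert missing_sections(Autonomy.DEPLOY, full_spec) == {
    "rollback",
    "observability",
    "owner_sign_off",
}
```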

Autonomy without specification is not empowerment. It is permission for ambiguity to act.

SPEC-Lock failure modes

SPEC-Lock itself can be misused. Teams can fill the headings with vague prose. They can write examples that only cover happy paths. They can list constraints but not enforce them. They can lock acceptance with weak tests. The framework works only when each section changes review. A good SPEC-Lock artifact should make at least one implementation path clearly unacceptable. If it does not constrain anything, it is only documentation.

The strongest sign of a useful spec is that it gives the model less freedom in the right places and more clarity everywhere else.

SPEC-Lock during refactors

Refactors are where the SPEC-Lock framework earns its keep. A refactor claims to change structure without changing behavior. Machine-authored refactors are especially tempting because they can be large, confident, and visually clean. They rename modules, reorganize files, simplify conditionals, extract helpers, and remove "dead" branches faster than a human reviewer can comfortably inspect. The danger is not that the model writes ugly code. The danger is that the model writes plausible code that preserves the common path while erasing a rare but important behavior.

For refactors, the SPEC-Lock emphasis changes. The outcome and the endpoint may stay constant. The load-bearing pieces are examples, constraints, contracts, and known failure cases. The team should require behavior snapshots before and after the refactor. For API code, that means contract tests. For data transformations, it means golden input/output fixtures. For UI workflows, it means interaction traces. For state machines, it means transition tables. For security-sensitive code, it means abuse cases.
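A behavior snapshot can be as simple as running the old and new code over the same fixtures and listing the inputs where they disagree. The Python sketch below is a hypothetical illustration: `legacy_status` and `refactored_status` are invented functions, and the refactored version deliberately drops a rare branch to show what the comparison catches.

```python
from typing import Callable, List, Tuple

Fixture = Tuple[int, int]  # (provider status code, retries so far)


def legacy_status(code: int, retries: int) -> dict:
    """Hypothetical pre-refactor behavior, including a rare branch."""
    if code == 429 and retries >= 3:
        return {"error": "rate_limited", "retryable": False}
    if code in (429, 503):
        return {"error": "unavailable", "retryable": True}
    if code >= 500:
        return {"error": "provider_error", "retryable": False}
    return {"error": None, "retryable": False}


def refactored_status(code: int, retries: int) -> dict:
    """Tidier table-driven version that silently drops the retries >= 3 branch."""
    table = {429: ("unavailable", True), 503: ("unavailable", True)}
    if code in table:
        error, retryable = table[code]
        return {"error": error, "retryable": retryable}
    if code >= 500:
        return {"error": "provider_error", "retryable": False}
    return {"error": None, "retryable": False}


# Golden fixtures: the common path plus the known failure cases.
FIXTURES: List[Fixture] = [(200, 0), (429, 0), (429, 3), (503, 1), (500, 0)]


def diff_behavior(
    old: Callable[[int, int], dict],
    new: Callable[[int, int], dict],
    fixtures: List[Fixture],
) -> List[Fixture]:
    """Return the fixtures on which old and new behavior differ."""
    return [fx for fx in fixtures if old(*fx) != new(*fx)]


# The refactor preserves the common path but changes the exhausted-retry case.
assert diff_behavior(legacy_status, refactored_status, FIXTURES) == [(429, 3)]
```

The comparison is only as good as the fixture list. If `(429, 3)` had not been recorded as a known failure case, the refactor would have passed.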

A useful refactor spec begins with a negative statement: "This change must not alter externally observable behavior." Then it names the boundaries of "externally observable." The response schema must remain identical. The error codes must remain identical. Permission checks must execute before database writes. Audit events must retain their names and fields. Retry behavior must remain idempotent. These are not implementation details. They are the behavior the refactor must preserve.

The model should be instructed to produce a change plan before code. The plan should list the files affected, the transformations intended, and the invariants the code must preserve. Then the model should produce tests or identify existing tests that protect those invariants. Only after that should it produce the refactor. The order matters. If the model writes the refactor first, the tests may end up validating the new behavior rather than the intended old behavior, because they are derived from code that has already changed.

SPEC-Lock also gives reviewers a way to say no without becoming anti-AI. The objection is not "the diff is too big" or "I do not trust generated code." The objection is "the diff changes a contract the spec says must not change," or "the diff removes a failure case the examples say must be preserved." That difference matters culturally. It lets teams reject machine output on evidence rather than taste.

