AN Alpesh Nakrani
BlogBooksPraiseAbout Work with me →
Book overview
Chapter 2 / The AI-Native Canon

A Prompt Is Not a Spec

A prompt is a request. A spec is a contract.

Research spine: this chapter stays grounded in NIST AI Risk Management Framework and NIST Secure Software Development Framework, then applies that evidence to the operating judgment in the book. A prompt is a request. A spec is a contract.

The difference seems obvious until a team starts using AI coding tools every day. A product manager writes, "Build a dashboard that shows revenue by region." An engineer writes, "Refactor this service to make it cleaner." A founder writes, "Add onboarding with Stripe and email verification." The model responds with files, routes, components, migrations, tests, and confidence. The prompt feels like it became software. But the prompt did not define the behavior. It merely asked for an implementation-shaped guess.

A prompt disappears into a conversation. A spec survives the conversation. It can be reviewed before code exists. It can be versioned. It can be diffed. It can be tested. It can be attached to a pull request. It can be cited during an incident. It can be updated when production teaches the team that intent was incomplete. A prompt is often private context. A spec is shared memory.

This matters because AI makes it easy to confuse fluency with agreement. If a model gives a confident implementation, the team may feel that the requirement was understood. But understanding cannot be inferred from generated volume. The only reliable evidence is whether the implementation satisfies explicit acceptance criteria.

A proper spec contains at least seven layers of intent:

  1. Outcome - what result should exist?
  2. Actors - who uses or is affected by it?
  3. State - what conditions change behavior?
  4. Boundaries - what is in and out of scope?
  5. Examples - what concrete cases define correctness?
  6. Constraints - what must remain true across implementations?
  7. Acceptance - what evidence decides whether work is done?

A prompt can include these layers, but most prompts do not. More importantly, a prompt embedded in a chat is not enough unless the layers become a maintained artifact.

Consider the instruction: "Add a cancel subscription button." A model can implement a button quickly. But the real spec asks: who can cancel, admin or billing owner; can cancellation be immediate or end-of-period; what happens to annual contracts; what happens to outstanding invoices; what if the account is under legal hold; what email is sent; what analytics event is tracked; what retention offer is shown; what audit event is created; what happens if payment provider cancellation succeeds but local database update fails; what support queue sees the event; what translations are required. The prompt asks for a button. The spec defines a business process.

This is why "prompt engineering" is not a substitute for requirements engineering. Prompting can help elicit, draft, and transform specs. It can generate candidate examples, identify missing edge cases, and turn acceptance criteria into tests. But the maintained artifact should not be a clever incantation. It should be the behavioral definition of the system.

Requirements engineering has always dealt with ambiguity, conflict, incompleteness, and change. Karl Wiegers' software requirements work, Gojko Adzic's Specification by Example, and domain-driven design all emphasize that shared language and examples reduce misunderstandings before implementation. AI adds a new reason: the implementation can now be produced too quickly for ambiguity to be noticed through manual effort. Specification becomes the friction that protects correctness.

The distinction can be operationalized:

Prompt behaviorSpec behavior
Requests an outputDefines acceptable behavior
Often conversational and ephemeralVersioned and maintained
Can be satisfied plausiblyMust be tested or reviewed explicitly
May hide assumptionsMakes assumptions inspectable
Optimizes generationOptimizes acceptance
Owned by the person promptingOwned by the team/system
Useful for explorationRequired for durable implementation
Whiteboard-style technical sketch infographic for A Prompt Is Not a Spec.
A prompt asks for output; a spec defines outcomes, boundaries, examples, constraints, and acceptance evidence.

A spec can be lightweight. The opposite of vague is not necessarily long. A good spec is precise where consequence lives. A one-paragraph change to an internal color token may be enough. A public API change needs examples, compatibility expectations, error states, and versioning. A billing workflow needs permission, audit, and financial constraints. A medical triage tool needs clinical boundaries, escalation, and risk controls. The spec's weight should follow consequence and ambiguity, not management habit.

The spec should also state non-goals. AI coding tools are prone to helpful expansion. If asked to implement onboarding, the model may add password reset, profile setup, invite flows, marketing preferences, or analytics dashboards because those patterns are nearby. Helpful expansion is dangerous when scope matters. Non-goals constrain the model and the humans reviewing it. They say: do not solve adjacent problems in this change.

A useful spec skeleton:

# Spec: Subscription Cancellation

> **Key Takeaways**
> - A prompt is a request. A spec is a contract.
> - The practical test is whether a team can name the evidence, owner, and failure mode before it changes behavior.
> - Read this with The Spec Is the Program and the adjacent chapters when you need the wider AI SDLC and Specs frame.

## Outcome
A billing owner can schedule subscription cancellation at period end without contacting support.

## In scope
- Self-serve cancellation for monthly self-serve plans.
- Cancellation reason capture.
- Audit event and confirmation email.

## Out of scope
- Enterprise contracts.
- Immediate refunds.
- Accounts under legal hold.
- Custom annual invoices.

## Actors and permissions
Only users with `billing_owner` role may cancel.

## Examples
Given a monthly account in good standing...
When the billing owner confirms cancellation...
Then the subscription is marked `cancel_at_period_end=true`.

## Constraints
- No cancellation if unpaid invoice exists.
- Payment provider and local state must remain consistent.
- All actions must create audit events.
- Support dashboard must show scheduled cancellation.

## Acceptance
Integration tests cover successful cancellation, unpaid invoice, wrong role, payment-provider failure, and audit event.

This is not a prompt. It is a target.

The spec can then be used as context for the model. The difference is that the model is no longer being asked to infer the system from a wish. It is being asked to implement against a contract. If the generated code fails, review can point to a spec line. If the spec is wrong, the team can update the spec and record why. If production reveals a missing case, the case becomes a spec drift entry.

A prompt-first team has conversations like: "The model didn't understand what I meant." A spec-first team has conversations like: "The acceptance criteria did not cover payment-provider partial failure." The second conversation is better because it is actionable.

The chapter's practice: before asking a model to implement, ask it to interrogate the prompt.

Example:

You are not implementing yet. Read this proposed feature request.
List missing requirements, permission questions, failure states, data constraints,
compatibility risks, tests needed, and non-goals. Do not write code.

This flips the model from premature builder to specification assistant. The human still owns intent, but the machine helps expose ambiguity. That is a better use of intelligence.

The spec survives the model

A second reason a prompt is not a spec is model turnover. The team may use one coding assistant this month and another next quarter. Models will change behavior. Tools will change context-window policies, agent loops, and repository indexing. If the only durable artifact is a prompt tuned to one tool, the team has tool-specific folklore rather than system knowledge. A spec should outlive the tool that implements it.

This is also why specs should not be written as model instructions only. "Be careful with permissions" is a weak instruction. "Only billing_owner can cancel monthly self-serve plans; enterprise contracts are out of scope; unauthorized users receive 403; every attempt creates an audit event" is a spec. One is advice. The other is behavior.

The final point is cultural. Teams must stop praising the person who turns vague prompts into large diffs and start praising the person who turns vague intent into narrow, testable specs. The first looks faster this week. The second makes every future generation safer.

Prompt quality is not spec quality

A highly detailed prompt can still fail as a spec if it is not maintained, reviewed, and connected to acceptance. Many teams respond to bad AI output by writing longer prompts. Longer prompts can help generation, but they do not solve artifact governance. If the prompt sits in one engineer's chat history, future reviewers cannot inspect it. If it is not versioned, the team cannot know which intent produced which code. If it is not linked to tests, it cannot enforce acceptance.

A good prompt may be derived from a spec, but it should not replace the spec. The spec is the source; the prompt is an execution instruction. The source should be stable enough that different models, tools, or humans can implement it consistently. The prompt can vary by tool.

The acceptance problem

Prompts are also weak because they rarely say how success will be judged. "Make this component accessible" sounds reasonable. A spec names keyboard navigation, focus order, ARIA labels, contrast, screen-reader behavior, and test evidence. "Improve performance" is a wish. A spec says p95 latency under 200 ms for 50 concurrent users and no additional database queries on the hot path. "Handle errors gracefully" is taste. A spec says timeout shows retry state, logs correlation ID, preserves user input, and does not duplicate payment attempts.

Acceptance criteria turn desire into enforceable behavior. They should be written before implementation because they define the target. When acceptance criteria are written after generated code, they often rationalize what the code already does.

Prompts are useful upstream

The correct lesson is not "stop prompting." Use prompts to improve the spec. Ask the model to identify ambiguity, generate example matrices, compare a proposed spec to existing architecture, draft OpenAPI, propose property tests, and challenge missing constraints. This is high-use because the model can explore possibilities quickly while humans retain ownership of intent.

A productive prompt:

Read this draft spec. Do not implement it. Identify missing actors, states,
permissions, failure modes, data constraints, compatibility risks, and acceptance
tests. Return questions grouped by risk severity.

That prompt makes the machine a requirements critic. It prevents wishful thinking from becoming code.

Spec Ownership and Review

The specification needs an owner for the same reason code needs an owner: an artifact without ownership decays. In many early AI-assisted teams, prompts live in chat history, issue comments, or individual notebooks. They are treated as temporary scaffolding used to produce a change. Once the change exists, the prompt disappears. That is acceptable for experiments and dangerous for systems.

A maintained spec has a lifecycle. It is proposed, reviewed, versioned, implemented, tested, and revised. The owner is not always the product manager. Product owns user intent and tradeoffs; engineering owns feasibility and invariants; design owns interaction semantics; security owns trust boundaries; support owns operational reality. The specification is where these concerns meet. A good team therefore treats spec review as multidisciplinary review, not as a writing exercise.

The key rule is that specification review should happen at the level of the decision, not the sentence. Reviewers should not spend their attention polishing phrasing unless phrasing affects behavior. They should ask whether the desired behavior is complete, whether constraints conflict, whether edge cases are named, whether the implementation boundary is clear, and whether the evidence required for acceptance is stated. A spec can be grammatically elegant and operationally useless. A rough spec with clear acceptance examples, failure cases, and non-goals is usually better.

This changes how teams use AI coding tools. The model should not be asked to infer missing policy from vibes. It should be handed a spec that names authority. For example: "Refunds over $500 require supervisor approval" is a rule. "Be careful with large refunds" is an aspiration. "Orders in fraud_review cannot be auto-refunded" is an invariant. "Avoid risky refunds" is not. The model can help expand a rule into code, tests, and documentation, but it cannot safely invent the rule on behalf of the business.

Spec ownership also protects the team from model churn. If the model that writes the implementation changes next quarter, the maintained spec remains. If the framework changes, the spec remains. If a human rewrites the generated code by hand, the spec remains. In an AI-native environment, implementation is increasingly disposable. Intent is not.

Chapter 3: SPEC-Lock

Share