Alpesh Nakrani
Chapter 2 / The AI-Native Canon

Specs, Prompts, and Provenance as Lifecycle Artifacts

Lifecycle artifacts are the records that let an AI-native team replay why a generated change existed: the spec, prompt, context bundle, diff, eval evidence, owner, and release decision.

The engineer deleted the prompt after the code worked. Six weeks later, nobody could explain why the service accepted a fallback value that contradicted the product spec. The code still existed. The tests still existed. The reasoning that caused the code to exist had evaporated.

In AI-native development, the prompt is not always the source of truth. But the prompt, spec, context bundle, and generated diff together become a lifecycle artifact.

This chapter argues that the maintained artifact moves upstream. If the model writes implementation, the human-maintained specification becomes more important, not less. A prompt without a spec is disposable instruction. A spec without provenance is incomplete history. A code change without a link to intent is difficult to review and dangerous to evolve.

Key Takeaways

  • The prompt can guide generation, but the maintained spec should anchor intent.
  • Context selection is part of provenance because different context produces different code.
  • SLSA-style artifact thinking applies to generated code when failure would matter.
  • A missing prompt or context record breaks the incident replay chain.

Research spine

This chapter uses: NIST SP 800-218, Secure Software Development Framework; SLSA: Supply-chain Levels for Software Artifacts; DORA, State of AI-assisted Software Development 2025; OpenAI Codex documentation; Anthropic Claude Code Security.

Intent is not a prompt

A prompt asks a model to do something. A spec says what the system should mean. The two overlap, but they should not be confused. A prompt can include a spec, compress a spec, translate a spec, or reference a spec. It should not silently replace the spec because prompts are often private, ad hoc, and hard to review.

A durable AI-native SDLC stores the spec where the team can version, review, and link it. The prompt may be stored with the PR as provenance. The spec should outlive the generation session.
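
One way to store the prompt with the PR while keeping the spec as the anchor is to attach a few trailer lines to the commit or PR description. The following is a minimal sketch, not a prescribed format: the trailer names (`Spec-Id`, `Spec-Revision`, `Prompt-SHA256`), the function name, and the revision value are hypothetical, and the prompt text itself is assumed to be stored elsewhere under its digest.

```python
import hashlib


def provenance_trailers(spec_id: str, spec_revision: str, prompt_text: str) -> list[str]:
    """Build trailer lines that link a change to its spec and to the prompt used.

    The trailer names are illustrative assumptions, not an established standard.
    """
    # Hash the prompt so the PR can reference it without embedding the full text.
    prompt_digest = hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()
    return [
        f"Spec-Id: {spec_id}",
        f"Spec-Revision: {spec_revision}",
        f"Prompt-SHA256: {prompt_digest}",
    ]


# Hypothetical usage: "a1b2c3d" stands in for the spec's version-control revision.
trailers = provenance_trailers(
    "SPEC-221", "a1b2c3d", "Implement refund retry behavior as described in SPEC-221."
)
print("\n".join(trailers))
```

Because the spec is referenced by ID and revision, it outlives the generation session; the prompt digest only records which instruction produced this particular change.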

Context bundles

Generated work depends heavily on context selection. The same request given with different files, examples, database schemas, policy notes, or design docs can produce a different solution. Context is therefore part of provenance. A team should record the important context files or retrieval set used to generate consequential changes.

This is not about preserving every token forever. It is about preserving enough to replay risk. If an incident occurs, can the team determine whether the model saw the policy it should have followed? If not, the lifecycle is missing evidence.
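
The replay question above ("did the model see the policy it should have followed?") can be answered from a small manifest that records context by reference rather than by content. This is a sketch under stated assumptions: the `ContextRef` type, both function names, and the file `src/payments/retry.py` are hypothetical, and file contents are passed in as bytes so the example stays self-contained.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextRef:
    """A reference to one context file: its path and a digest of what the model saw."""
    path: str
    sha256: str


def build_manifest(files: dict[str, bytes]) -> list[ContextRef]:
    """Record each context file by path and content hash, not by full content."""
    return [
        ContextRef(path, hashlib.sha256(data).hexdigest())
        for path, data in sorted(files.items())
    ]


def model_saw(manifest: list[ContextRef], path: str,
              current_content: Optional[bytes] = None) -> bool:
    """Return True if `path` was in the context bundle.

    If `current_content` is given, also require that the recorded digest
    matches it, i.e. the model saw this exact version of the file.
    """
    for ref in manifest:
        if ref.path == path:
            if current_content is None:
                return True
            return ref.sha256 == hashlib.sha256(current_content).hexdigest()
    return False


manifest = build_manifest({
    "docs/payment-state-machine.md": b"state machine notes",
    "src/payments/retry.py": b"def retry(): ...",  # hypothetical file
})
print(model_saw(manifest, "docs/refund-policy.md"))  # False: the policy was never in context
```

Storing digests rather than full text keeps the record small while still letting an incident reviewer tell "the model never saw the policy" apart from "the model saw an older version of it".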

Supply-chain thinking for generated code

SLSA and secure development frameworks focus on artifact integrity, provenance, and tamper resistance. AI-native teams should borrow that mindset. Generated code is not exempt from supply-chain questions. Which tool created it? Which model? Which dependencies were introduced? Which external snippets or suggestions influenced it? Which human accepted it? How was the artifact built and signed?

The answer does not need to be heavyweight for every low-risk change. It must be available for changes whose failure would matter.
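
The idea that provenance can be light for low-risk changes and stricter where failure would matter can be sketched as a risk-tiered check. Everything here is an illustrative assumption rather than a SLSA or NIST requirement: the field names, the `REQUIRED_BY_RISK` policy, and the tool and reviewer values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical policy: which provenance fields must be filled in per risk class.
REQUIRED_BY_RISK = {
    "low": ("accepted_by",),
    "medium": ("tool", "model", "accepted_by"),
    "high": ("tool", "model", "accepted_by", "build_attestation"),
}


@dataclass
class GeneratedChange:
    """Answers to the supply-chain questions for one generated change."""
    risk_class: str
    tool: Optional[str] = None               # which tool created it
    model: Optional[str] = None              # which model
    accepted_by: Optional[str] = None        # which human accepted it
    build_attestation: Optional[str] = None  # how the artifact was built and signed
    new_dependencies: list[str] = field(default_factory=list)


def missing_provenance(change: GeneratedChange) -> list[str]:
    """List the required provenance fields that are empty for this risk class."""
    required = REQUIRED_BY_RISK[change.risk_class]
    return [name for name in required if not getattr(change, name)]


change = GeneratedChange(
    risk_class="medium",
    tool="example-assistant",           # hypothetical tool name
    accepted_by="reviewer@example.com",
)
print(missing_provenance(change))  # ['model']
```

A low-risk change with only a named acceptor passes, while the same record marked medium or high is blocked until the missing answers are supplied.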

Operating table

Artifact       | Purpose                                   | Retention rule                        | Review owner
Spec           | Defines intended behavior and constraints | Long-lived, versioned                 | Product + engineering owner
Prompt         | Captures generation instruction           | Stored with change when consequential | Submitting engineer
Context bundle | Shows what the model was allowed to see   | Stored by reference                   | Tooling / platform
Diff           | Implementation artifact                   | Normal source-control retention       | Engineering
Eval evidence  | Proves behavior at gate                   | Retain with release                   | Quality owner

Artifact example: a spec header designed for AI-native traceability

---
intent_id: SPEC-221
owner: payments-platform
risk_class: medium
allowed_context:
 - docs/payment-state-machine.md
 - docs/refund-policy.md
 - src/payments/
generated_change_policy:
 store_prompt: true
 store_context_references: true
 require_behavior_tests: true
---

# Refund Retry Behavior

When a refund provider times out, the system should retry only if the provider has not
acknowledged the refund. A retry must be idempotent and must not create a second refund.

Figure: Artifact chain from spec to prompt, context bundle, generated diff, eval evidence, and release, with a broken lost-prompt link. Specs, prompts, context, generated diffs, eval evidence, and release records form the replayable chain; a lost prompt breaks it.
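
The spec header in the artifact example can also be checked mechanically. The sketch below assumes the header has already been parsed into a dictionary by whatever YAML tooling the team uses; the shape of the `change` record, the function names, and the paths `src/payments/refund.py`, `docs/pricing.md`, and `tests/test_refund_retry.py` are hypothetical.

```python
# The spec header from the example, written out as an already-parsed dict.
spec_header = {
    "intent_id": "SPEC-221",
    "allowed_context": [
        "docs/payment-state-machine.md",
        "docs/refund-policy.md",
        "src/payments/",
    ],
    "generated_change_policy": {
        "store_prompt": True,
        "store_context_references": True,
        "require_behavior_tests": True,
    },
}


def is_allowed(path: str, allowed: list[str]) -> bool:
    """Entries ending in '/' match as directory prefixes; others match exactly.

    Paths are assumed to be normalized already (no '..' segments).
    """
    for entry in allowed:
        if entry.endswith("/"):
            if path.startswith(entry):
                return True
        elif path == entry:
            return True
    return False


def policy_violations(header: dict, change: dict) -> list[str]:
    """Compare a generated change's records against the spec's policy."""
    policy = header["generated_change_policy"]
    violations = []
    if policy.get("store_prompt") and not change.get("prompt_ref"):
        violations.append("prompt not stored")
    if policy.get("store_context_references") and not change.get("context_paths"):
        violations.append("context references not stored")
    if policy.get("require_behavior_tests") and not change.get("behavior_tests"):
        violations.append("behavior tests missing")
    for path in change.get("context_paths", []):
        if not is_allowed(path, header["allowed_context"]):
            violations.append(f"context outside allowed_context: {path}")
    return violations


# Hypothetical change record: the prompt was lost and one file was out of bounds.
change = {
    "prompt_ref": None,
    "context_paths": ["src/payments/refund.py", "docs/pricing.md"],
    "behavior_tests": ["tests/test_refund_retry.py"],
}
print(policy_violations(spec_header, change))
# ['prompt not stored', 'context outside allowed_context: docs/pricing.md']
```

Run as a pre-merge check, this would surface the lost prompt at review time rather than during an incident.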

Checklist

  • Keep specs in version control.
  • Store prompts for consequential generated changes.
  • Record context references when generation depends on local files or policies.
  • Link eval evidence to release decisions.
  • Adopt supply-chain provenance thinking for AI-assisted code.

Takeaway

When the machine writes more implementation, the human-maintained intent record becomes the anchor of the lifecycle.

Operational note: Prompts are ephemeral unless governed

Many prompts are written like scratch notes but used like engineering instructions. The practical danger is not that the team lacks effort; it is that the effort is aimed at the wrong scarce resource. In AI-native software delivery, the old visible unit of work is no longer the safest unit of management. A team can produce more drafts, more code, more messages, more analysis, or more tickets while becoming less reliable at the point where the business needs a decision. The fix is to move the management surface away from raw output and toward evidence: what was decided, by whom, from which inputs, against which criteria, and with what rollback path.

A mature implementation treats this as an operating-system concern rather than a personal-performance concern. The artifact should make the judgment visible: the rubric, acceptance gate, cost line, risk boundary, owner, and expiry date. When those fields are missing, the model's speed hides organizational ambiguity. When they are present, AI acceleration becomes tractable because the team can see which decisions deserve automation, which deserve human review, and which deserve rejection before execution begins.

The useful test is whether a new teammate can replay the decision two weeks later without interviewing the original author. If replay requires folklore, the process is still human-memory-bound. If replay can be done from the artifact, the team has converted judgment into infrastructure. That conversion is the recurring discipline throughout this book: not replacing human judgment, but making human judgment explicit enough that machines can safely do more of the surrounding work.

Field expansion: Context selection is a design decision

The model's answer reflects what it was allowed to see and what it was not allowed to see.

Design consequence: Provenance is cheaper before the incident

Trying to reconstruct an AI-assisted change after failure is far more expensive than recording the path during creation.
