Specs, Prompts, and Provenance as Lifecycle Artifacts
Lifecycle artifacts are the records that let an AI-native team replay why a generated change existed: the spec, prompt, context bundle, diff, eval evidence, owner, and release decision.
The engineer deleted the prompt after the code worked. Six weeks later, nobody could explain why the service accepted a fallback value that contradicted the product spec. The code still existed. The tests still existed. The reasoning that caused the code to exist had evaporated.
In AI-native development, the prompt is not always the source of truth. But the prompt, spec, context bundle, and generated diff together become a lifecycle artifact.
This chapter argues that the maintained artifact moves upstream. If the model writes implementation, the human-maintained specification becomes more important, not less. A prompt without a spec is disposable instruction. A spec without provenance is incomplete history. A code change without a link to intent is difficult to review and dangerous to evolve.
Key Takeaways
- The prompt can guide generation, but the maintained spec should anchor intent.
- Context selection is part of provenance because different context produces different code.
- SLSA-style artifact thinking applies to generated code when failure would matter.
- A missing prompt or context record breaks the incident replay chain.
Research spine
This chapter uses: NIST SP 800-218, Secure Software Development Framework; SLSA: Supply-chain Levels for Software Artifacts; DORA, State of AI-assisted Software Development 2025; OpenAI Codex documentation; Anthropic Claude Code Security.
Intent is not a prompt
A prompt asks a model to do something. A spec says what the system should mean. The two overlap, but they should not be confused. A prompt can include a spec, compress a spec, translate a spec, or reference a spec. It should not silently replace the spec because prompts are often private, ad hoc, and hard to review.
A durable AI-native SDLC stores the spec where the team can version, review, and link it. The prompt may be stored with the PR as provenance. The spec should outlive the generation session.
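The link between a change and its spec is easiest to keep when CI checks for it. Below is a minimal sketch. It assumes specs carry IDs like the `SPEC-221` used in this chapter's spec header example, and that the set of known IDs has already been collected from the spec directory. `check_intent_link` and the ID pattern are hypothetical and do not come from any named tool.

```python
import re

# Hypothetical ID convention: "SPEC-" followed by digits, e.g. SPEC-221.
SPEC_ID = re.compile(r"\bSPEC-\d+\b")


def check_intent_link(pr_body: str, known_specs: set[str]) -> list[str]:
    """Return problems with the PR's link to intent; an empty list means it is sound."""
    referenced = set(SPEC_ID.findall(pr_body))
    problems = []
    if not referenced & known_specs:
        problems.append("no reference to a known spec")
    for spec_id in sorted(referenced - known_specs):
        problems.append(f"unknown spec reference: {spec_id}")
    return problems


print(check_intent_link("Implements retry per SPEC-221.", {"SPEC-221"}))  # []
```

A check like this only proves that a link exists. Whether the diff actually honors the spec is still a review and eval question.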
Context bundles
Generated work depends heavily on context selection. The same request given with different files, examples, database schemas, policy notes, or design docs can produce a different solution. Context is therefore part of provenance. A team should record the important context files or retrieval set used to generate consequential changes.
This is not about preserving every token forever. It is about preserving enough to replay risk. If an incident occurs, can the team determine whether the model saw the policy it should have followed? If not, the lifecycle is missing evidence.
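Preserving enough to replay risk does not require storing file contents. The sketch below assumes the context is a set of local files. It records each path together with a SHA-256 digest, so a reviewer can later tell whether the policy the model saw is the same policy the repository holds today. The manifest shape and function names are hypothetical. A retrieval set drawn from a search index would need its own stable identifiers.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes, so any later edit to the file is detectable."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_context_manifest(paths: list[str]) -> dict:
    """Record which files were in the generation context, by path and digest."""
    entries = [{"path": p, "sha256": file_digest(Path(p))} for p in sorted(paths)]
    return {"context": entries}


def changed_since(manifest: dict) -> list[str]:
    """Paths whose current contents differ from what the model saw, or that are gone."""
    changed = []
    for entry in manifest["context"]:
        path = Path(entry["path"])
        if not path.exists() or file_digest(path) != entry["sha256"]:
            changed.append(entry["path"])
    return changed
```

The manifest is a plain dictionary. It can be serialized with `json.dumps` and stored alongside the pull request, which matches the "stored by reference" rule for context bundles.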
Supply-chain thinking for generated code
SLSA and secure development frameworks focus on artifact integrity, provenance, and tamper resistance. AI-native teams should borrow that mindset. Generated code is not exempt from supply-chain questions. Which tool created it? Which model? Which dependencies were introduced? Which external snippets or suggestions influenced it? Which human accepted it? How was the artifact built and signed?
The answer does not need to be heavyweight for every low-risk change. It must be available for changes whose failure would matter.
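The supply-chain questions above can become a record with one field per question. The rule that answers must be "available for changes whose failure would matter" can become a per-risk-class list of required fields. This is a sketch only. The field names and the `REQUIRED_BY_RISK` policy are assumptions. The risk classes reuse the `risk_class` values from this chapter's spec header example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GeneratedChangeProvenance:
    """Hypothetical record: one field per supply-chain question in the text."""
    tool: Optional[str] = None           # Which tool created it?
    model: Optional[str] = None          # Which model?
    accepted_by: Optional[str] = None    # Which human accepted it?
    build_id: Optional[str] = None       # How was the artifact built?
    signature_ref: Optional[str] = None  # Where is its signature or attestation?
    # Dependencies and external snippets: an empty list is a valid answer ("none"),
    # so they are recorded but not treated as required fields below.
    new_dependencies: list[str] = field(default_factory=list)
    external_snippets: list[str] = field(default_factory=list)


# Assumed policy: the bar rises with risk, so low-risk changes stay lightweight.
REQUIRED_BY_RISK = {
    "low": {"tool", "accepted_by"},
    "medium": {"tool", "model", "accepted_by", "build_id"},
    "high": {"tool", "model", "accepted_by", "build_id", "signature_ref"},
}


def missing_provenance(record: GeneratedChangeProvenance, risk_class: str) -> list[str]:
    """Required fields that are still empty for the given risk class, sorted."""
    return sorted(name for name in REQUIRED_BY_RISK[risk_class] if not getattr(record, name))


record = GeneratedChangeProvenance(tool="example-assistant", accepted_by="alice")
print(missing_provenance(record, "medium"))  # ['build_id', 'model']
```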
Operating table
| Artifact | Purpose | Retention rule | Review owner |
|---|---|---|---|
| Spec | Defines intended behavior and constraints | Long-lived, versioned | Product + engineering owner |
| Prompt | Captures generation instruction | Stored with change when consequential | Submitting engineer |
| Context bundle | Shows what the model was allowed to see | Stored by reference | Tooling / platform |
| Diff | Implementation artifact | Normal source-control retention | Engineering |
| Eval evidence | Demonstrates behavior at the release gate | Retained with the release | Quality owner |
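Tooling can enforce a table like this only if it can read it. The sketch below encodes the rows as data and answers one question: which artifacts must a given change carry? The `consequential_only` flags are an interpretation and are not part of the table itself. The prompt row states the condition directly. The context bundle is treated the same way because this chapter asks teams to record context for consequential changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionRule:
    artifact: str
    review_owner: str
    consequential_only: bool  # Assumption: True means "keep only for consequential changes"


# Rows of the operating table; the flags are an interpretation (see above).
RULES = (
    RetentionRule("spec", "product + engineering owner", False),
    RetentionRule("prompt", "submitting engineer", True),
    RetentionRule("context_bundle", "tooling / platform", True),
    RetentionRule("diff", "engineering", False),
    RetentionRule("eval_evidence", "quality owner", False),
)


def required_artifacts(consequential: bool) -> dict[str, str]:
    """Map each artifact this change must carry to its review owner."""
    return {
        rule.artifact: rule.review_owner
        for rule in RULES
        if consequential or not rule.consequential_only
    }


def missing_artifacts(present: set[str], consequential: bool) -> list[str]:
    """Required artifacts not yet attached to the change, sorted by name."""
    return sorted(set(required_artifacts(consequential)) - present)
```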
Artifact example: a spec header designed for AI-native traceability
```markdown
---
intent_id: SPEC-221
owner: payments-platform
risk_class: medium
allowed_context:
  - docs/payment-state-machine.md
  - docs/refund-policy.md
  - src/payments/
generated_change_policy:
  store_prompt: true
  store_context_references: true
  require_behavior_tests: true
---
# Refund Retry Behavior
When a refund provider times out, the system should retry only if the provider has not
acknowledged the refund. A retry must be idempotent and must not create a second refund.
```
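An `allowed_context` list is only useful if something checks it. Below is a minimal sketch. It assumes the context references recorded for a change are repository-relative POSIX paths, and that entries ending in `/` denote directories. The function names are hypothetical. To keep the example within the standard library, the list is copied from the SPEC-221 header rather than parsed from the front matter.

```python
from pathlib import PurePosixPath

# Copied from the SPEC-221 header; a real check would read the spec's front matter.
ALLOWED_CONTEXT = [
    "docs/payment-state-machine.md",
    "docs/refund-policy.md",
    "src/payments/",
]


def is_allowed(path: str, allowed: list[str]) -> bool:
    """True if path is an allowed file or lies under an allowed directory."""
    candidate = PurePosixPath(path)
    if ".." in candidate.parts:  # refuse references that climb out of a directory
        return False
    for entry in allowed:
        base = PurePosixPath(entry)
        if entry.endswith("/"):  # directory entry: match by path components
            if base in candidate.parents:
                return True
        elif candidate == base:  # file entry: exact match
            return True
    return False


def out_of_scope(context_refs: list[str], allowed: list[str]) -> list[str]:
    """Context references that the spec did not authorize."""
    return [ref for ref in context_refs if not is_allowed(ref, allowed)]


print(out_of_scope(["docs/refund-policy.md", "src/billing/ledger.py"], ALLOWED_CONTEXT))
# ['src/billing/ledger.py']
```

The check compares path components rather than string prefixes, so `src/payments_old/` does not match `src/payments/`. Rejecting `..` segments stops a reference from climbing out of an allowed directory.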
Checklist
- Keep specs in version control.
- Store prompts for consequential generated changes.
- Record context references when generation depends on local files or policies.
- Link eval evidence to release decisions.
- Adopt supply-chain provenance thinking for AI-assisted code.
Takeaway
When the machine writes more implementation, the human-maintained intent record becomes the anchor of the lifecycle.
Operational note: Prompts are ephemeral unless governed
Many prompts are written like scratch notes but used like engineering instructions. The practical danger is not that the team lacks effort; it is that effort is aimed at the wrong scarce resource. When a model can generate implementation quickly, the old visible unit of work is no longer the safest unit of management. A team can produce more drafts, code, messages, analysis, or tickets while becoming less reliable at the point where the business needs a decision. The fix is to move the management surface away from raw output and toward evidence: what was decided, by whom, from which inputs, against which criteria, and with what rollback path.
A mature implementation treats this as an operating-system concern rather than a personal-performance concern. The artifact should make the judgment visible: the rubric, acceptance gate, cost line, risk boundary, owner, and expiry date. When those fields are missing, the model's speed hides organizational ambiguity. When they are present, AI acceleration becomes tractable because the team can see which decisions deserve automation, which deserve human review, and which deserve rejection before execution begins.
The useful test is whether a new teammate can replay the decision two weeks later without interviewing the original author. If replay requires folklore, the process is still human-memory-bound. If replay can be done from the artifact, the team has converted judgment into infrastructure. That conversion is the recurring discipline throughout this book: not replacing human judgment, but making human judgment explicit enough that machines can safely do more of the surrounding work.
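Part of the replay test can be automated. A script can check a change record for each link named in this chapter's definition of lifecycle artifacts. The sketch below assumes a change record is a flat dictionary of references. The key names and example values are hypothetical. The example record mirrors the opening incident, where everything survived except the prompt.

```python
# The links named in this chapter's definition of lifecycle artifacts, in order.
REPLAY_CHAIN = (
    "spec",
    "prompt",
    "context_bundle",
    "diff",
    "eval_evidence",
    "owner",
    "release_decision",
)


def replay_gaps(change_record: dict[str, str]) -> list[str]:
    """Links a new teammate could not recover from the record alone."""
    return [link for link in REPLAY_CHAIN if not change_record.get(link)]


# Hypothetical record shaped like the opening incident: the prompt was deleted.
incident_record = {
    "spec": "product-spec-ref",
    "context_bundle": "context-manifest-ref",
    "diff": "commit-ref",
    "eval_evidence": "eval-run-ref",
    "owner": "service-owner",
    "release_decision": "approved",
}

print(replay_gaps(incident_record))  # ['prompt']
```

A non-empty result does not mean the change was wrong. It means a reviewer would have to fall back on folklore to explain it.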
Field expansion: Context selection is a design decision
The model's answer reflects what it was allowed to see and what it was not allowed to see. Choosing which files, schemas, examples, and policy notes enter the context is therefore a design decision, not a convenience: a model that never saw the refund policy cannot be expected to follow it. That choice should be made deliberately, recorded with the change, and open to review afterward, which is what an explicit allowed_context list in a spec header is for.
Design consequence: Provenance is cheaper before the incident
Trying to reconstruct an AI-assisted change after a failure is far more expensive than recording the path during creation. At creation time, the prompt, the context references, the model, and the accepting reviewer are all at hand. After an incident, some of them may be gone, as the deleted prompt in this chapter's opening example shows. Recording them while they are still available is cheap. Reconstructing them afterward is slow and may not succeed.
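One low-cost place to record provenance at creation time is the commit message itself. Git trailers are `Key: value` lines in the final paragraph of a message, the same shape as `Signed-off-by:`. Git's own `git interpret-trailers` command can add and parse trailers natively. The Python below is a simplified illustration of the shape only. The trailer keys are hypothetical conventions; git does not define them.

```python
# Hypothetical trailer keys; git only defines the "Key: value" shape, not these names.
PROVENANCE_KEYS = ("Spec", "Prompt-Ref", "Context-Manifest", "Accepted-By")


def add_provenance_trailers(message: str, values: dict[str, str]) -> str:
    """Append known provenance keys as a final trailer paragraph."""
    lines = [f"{key}: {values[key]}" for key in PROVENANCE_KEYS if key in values]
    if not lines:
        return message
    return message.rstrip("\n") + "\n\n" + "\n".join(lines) + "\n"


def read_provenance_trailers(message: str) -> dict[str, str]:
    """Read provenance keys from the last paragraph (a simplified trailer parser)."""
    last_paragraph = message.strip().split("\n\n")[-1]
    found = {}
    for line in last_paragraph.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in PROVENANCE_KEYS:
            found[key] = value.strip()
    return found


msg = add_provenance_trailers(
    "Retry refunds only when the provider has not acknowledged them",
    {"Spec": "SPEC-221", "Accepted-By": "reviewer@example.com"},
)
print(read_provenance_trailers(msg))  # {'Spec': 'SPEC-221', 'Accepted-By': 'reviewer@example.com'}
```

Trailers travel with the diff through history, so the link to intent survives even if the pull request system is migrated.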
