Incidents, Trust Evidence, and Pruning
The company fixed the incident and kept the feature. Then the same class of failure happened in a different workflow. The postmortem had been written, but the system had not changed. A durable AI organization does not only recover from incidents. It extracts trust evidence and prunes weak surfaces.
Shipping includes stopping.
This chapter connects incident response, customer trust, and product pruning. AI incidents should update evals, scope, runbooks, data contracts, prompts, permissions, and rollout rules. Trust evidence should be collected continuously. Features that cannot be hardened, measured, owned, or justified should be removed or narrowed.
Research spine
This chapter uses: Google SRE Book; NIST AI Risk Management Framework; OWASP Top 10 for Large Language Model Applications; Mitchell et al., Model Cards for Model Reporting; March, Exploration and Exploitation in Organizational Learning.
AI incident reviews
An AI incident review should ask more than what broke. It should ask which assumption failed: scope, data, model, retrieval, prompt, permission, eval, UX, cost, latency, or ownership. It should identify which artifact changes: eval case, prompt version, spec, data contract, runbook, permission rule, dashboard, customer communication, or rollout policy.
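The two review questions can be made hard to skip by giving them typed fields. The sketch below is one possible shape, not a prescribed schema: the class and field names (`FailedAssumption`, `Artifact`, `IncidentReview`, `open_questions`) are assumptions made for illustration, and the enum members simply mirror the lists in the paragraph above.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailedAssumption(Enum):
    """The assumption categories named in the text (hypothetical identifiers)."""
    SCOPE = "scope"
    DATA = "data"
    MODEL = "model"
    RETRIEVAL = "retrieval"
    PROMPT = "prompt"
    PERMISSION = "permission"
    EVAL = "eval"
    UX = "ux"
    COST = "cost"
    LATENCY = "latency"
    OWNERSHIP = "ownership"


class Artifact(Enum):
    """The artifacts a review may change (hypothetical identifiers)."""
    EVAL_CASE = "eval_case"
    PROMPT_VERSION = "prompt_version"
    SPEC = "spec"
    DATA_CONTRACT = "data_contract"
    RUNBOOK = "runbook"
    PERMISSION_RULE = "permission_rule"
    DASHBOARD = "dashboard"
    CUSTOMER_COMMUNICATION = "customer_communication"
    ROLLOUT_POLICY = "rollout_policy"


@dataclass
class IncidentReview:
    incident_id: str
    what_broke: str
    failed_assumptions: list[FailedAssumption] = field(default_factory=list)
    # Maps each artifact to a short description of the planned change.
    artifact_changes: dict[Artifact, str] = field(default_factory=dict)

    def open_questions(self) -> list[str]:
        """Return the review questions that are still unanswered."""
        gaps = []
        if not self.failed_assumptions:
            gaps.append("which assumption failed?")
        if not self.artifact_changes:
            gaps.append("which artifact changes?")
        return gaps
```

A review that records only what broke returns both questions from `open_questions()`. Under this sketch, a team would treat the review as closed only when that list is empty.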
Trust evidence packs
Trust evidence is the material a customer, auditor, executive, or support leader needs to believe the system is managed. It can include model cards or system cards, eval summaries, data source lists, permission architecture, incident history, monitoring snapshots, human review process, retention policy, and rollback evidence. The pack should be truthful, not marketing varnish.
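One way to keep a pack truthful is to check it for missing and stale items on a schedule. The following is a minimal sketch under stated assumptions: the item names are hypothetical identifiers for the pack contents listed above, and the 90-day freshness window is an arbitrary placeholder that each team would set for itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical identifiers for the pack contents listed in the text.
EXPECTED_ITEMS = (
    "model_or_system_card",
    "eval_summary",
    "data_source_list",
    "permission_architecture",
    "incident_history",
    "monitoring_snapshot",
    "human_review_process",
    "retention_policy",
    "rollback_evidence",
)


@dataclass(frozen=True)
class EvidenceItem:
    name: str
    generated_on: date


def pack_gaps(items: list[EvidenceItem], today: date,
              max_age_days: int = 90) -> dict[str, str]:
    """Return {item_name: problem} for every expected item that is missing or stale."""
    by_name = {item.name: item for item in items}
    gaps = {}
    for name in EXPECTED_ITEMS:
        item = by_name.get(name)
        if item is None:
            gaps[name] = "missing"
        elif today - item.generated_on > timedelta(days=max_age_days):
            gaps[name] = "stale"
    return gaps
```

An empty result does not prove the evidence is accurate. It only shows that each item exists and was refreshed recently, which is the part a script can check.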
Pruning as discipline
AI makes it cheap to add product surfaces. That makes pruning essential. Prune workflows with low usage, high cost, poor quality, unclear owner, weak evidence, repeated incidents, or strategic distraction. Pruning is not failure; it is how durable systems protect focus and trust.
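Several of these pruning criteria can be screened mechanically before the quarterly review. The sketch below assumes a hypothetical `WorkflowStats` record and placeholder thresholds; none of the numbers come from the text. Weak evidence and strategic distraction are deliberately left out because they need human judgment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowStats:
    name: str
    weekly_active_users: int
    monthly_cost_usd: float
    eval_pass_rate: float  # fraction between 0.0 and 1.0
    owner: Optional[str]
    incidents_last_quarter: int


def pruning_reasons(w: WorkflowStats,
                    min_users: int = 50,
                    max_cost_usd: float = 5000.0,
                    min_pass_rate: float = 0.9,
                    max_incidents: int = 1) -> list[str]:
    """List the pruning criteria this workflow trips. Thresholds are placeholders."""
    reasons = []
    if w.weekly_active_users < min_users:
        reasons.append("low usage")
    if w.monthly_cost_usd > max_cost_usd:
        reasons.append("high cost")
    if w.eval_pass_rate < min_pass_rate:
        reasons.append("poor quality")
    if not w.owner:
        reasons.append("unclear owner")
    if w.incidents_last_quarter > max_incidents:
        reasons.append("repeated incidents")
    return reasons
```

The function returns reasons rather than a verdict. A non-empty list nominates a workflow for review; deciding whether to retire, narrow, or harden it stays with the owner.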
Operating table
| Incident finding | Artifact update | Pruning question |
|---|---|---|
| Out-of-scope answer | Scope/refusal rule | Should this topic be removed? |
| Stale source | Data freshness contract | Should source be excluded until reliable? |
| Prompt injection | Security test and permission gate | Should tool action be narrowed? |
| High cost | Routing/caching/budget rule | Should workflow move async or premium? |
| Repeated low quality | Eval and model/prompt change | Should feature be retired? |
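The operating table can also live next to the incident tooling as data, so that triage starts from the same rows. This is a sketch only: the snake_case keys are assumed identifiers for the table's findings, and `next_steps` is a hypothetical helper.

```python
# Each key is an assumed identifier for a row of the operating table.
# Each value is (artifact update, pruning question), copied from the table.
OPERATING_TABLE = {
    "out_of_scope_answer": ("Scope/refusal rule", "Should this topic be removed?"),
    "stale_source": ("Data freshness contract", "Should source be excluded until reliable?"),
    "prompt_injection": ("Security test and permission gate", "Should tool action be narrowed?"),
    "high_cost": ("Routing/caching/budget rule", "Should workflow move async or premium?"),
    "repeated_low_quality": ("Eval and model/prompt change", "Should feature be retired?"),
}


def next_steps(failure_class: str) -> dict:
    """Look up the artifact update and pruning question for a failure class."""
    if failure_class not in OPERATING_TABLE:
        # An unlisted class is itself a finding: the table needs a new row.
        return {"artifact_update": None, "pruning_question": None, "needs_new_row": True}
    artifact_update, pruning_question = OPERATING_TABLE[failure_class]
    return {
        "artifact_update": artifact_update,
        "pruning_question": pruning_question,
        "needs_new_row": False,
    }
```

Returning `needs_new_row` instead of raising an error is a design choice in this sketch: a failure class the table does not cover is treated as a prompt to extend the table.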
Artifact example: an incident-to-learning artifact
ai_incident_to_learning:
  incident_id: "AI-INC-2026-017"
  failure_class: "out_of_scope_answer"
  customer_impact: "one incorrect policy recommendation"
  root_assumption_failed: "scope boundary did not include contractor policy exclusion"
  required_updates:
    eval_cases:
      - "contractor_policy_question"
    scope_contract:
      - "exclude contractor HR topics"
    refusal_copy:
      - "route contractors to HR operations"
    rollout_policy:
      - "pause contractor cohort"
  prune_review_required: true
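An artifact like this is only useful if its required updates become tracked work. The sketch below assumes the record has already been loaded into a Python dictionary (the parsing step is omitted) and uses a hypothetical `work_items` helper to turn each required update into one task. It rejects a record that updates nothing.

```python
# The artifact above, represented as a plain dictionary for illustration.
incident = {
    "incident_id": "AI-INC-2026-017",
    "failure_class": "out_of_scope_answer",
    "required_updates": {
        "eval_cases": ["contractor_policy_question"],
        "scope_contract": ["exclude contractor HR topics"],
        "refusal_copy": ["route contractors to HR operations"],
        "rollout_policy": ["pause contractor cohort"],
    },
    "prune_review_required": True,
}


def work_items(record: dict) -> list[str]:
    """Turn each required update into one task line; fail if there are none."""
    incident_id = record["incident_id"]
    updates = record.get("required_updates") or {}
    items = [
        f"{incident_id}: update {artifact}: {change}"
        for artifact, changes in updates.items()
        for change in changes
    ]
    if not items:
        # An incident that changes no artifact has not been learned from.
        raise ValueError(f"{incident_id} updates no artifact")
    if record.get("prune_review_required"):
        items.append(f"{incident_id}: schedule prune review")
    return items


for line in work_items(incident):
    print(line)
```

Raising an error on an empty update list is one way to enforce the rule that every incident changes at least one artifact. A team could equally enforce it in review tooling.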
Checklist
- Every AI incident should update at least one artifact.
- Maintain trust evidence continuously.
- Review pruning candidates quarterly.
- Do not let impressive but unowned features survive.
- Treat narrowing as a valid product improvement.
Takeaway
Durable AI products earn trust by changing after incidents and pruning what cannot be responsibly operated.
Operational note: A postmortem is not the artifact
The postmortem should cause changes in tests, evals, scopes, dashboards, policies, or ownership. The practical danger is not that the team lacks effort; it is that effort is aimed at the wrong scarce resource. The old visible unit of work is no longer the safest unit of management: a team can produce more drafts, more code, more messages, more analysis, or more tickets while becoming less reliable at the point where the business needs a decision. The fix is to move the management surface away from raw output and toward evidence: what was decided, by whom, from which inputs, against which criteria, with what rollback path.
A mature implementation treats this as an operating-system concern rather than a personal-performance concern. The artifact should make the judgment visible: the rubric, acceptance gate, cost line, risk boundary, owner, and expiry date. When those fields are missing, the model's speed hides organizational ambiguity. When they are present, AI acceleration becomes tractable because the team can see which decisions deserve automation, which deserve human review, and which deserve rejection before execution begins.
The useful test is whether a new teammate can replay the decision two weeks later without interviewing the original author. If replay requires folklore, the process is still human-memory-bound. If replay can be done from the artifact, the team has converted judgment into infrastructure. That conversion is the recurring discipline throughout this book: not replacing human judgment, but making human judgment explicit enough that machines can safely do more of the surrounding work.
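The replay test can be approximated by checking that a decision record carries the fields named above. The sketch below is illustrative: `DecisionRecord` and `replay_blockers` are hypothetical names, and a passing check shows only that the fields are filled in, not that their contents are good enough to replay from.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class DecisionRecord:
    decision: str
    rubric: Optional[str] = None
    acceptance_gate: Optional[str] = None
    cost_line: Optional[str] = None
    risk_boundary: Optional[str] = None
    owner: Optional[str] = None
    expiry_date: Optional[date] = None


def replay_blockers(record: DecisionRecord, today: date) -> list[str]:
    """List what would force a new teammate to interview the original author."""
    blockers = [
        f"missing {f.name}"
        for f in fields(record)
        if getattr(record, f.name) in (None, "")
    ]
    if record.expiry_date is not None and record.expiry_date < today:
        blockers.append("record has expired")
    return blockers
```

A record with an empty blocker list is a candidate for automation or delegated review. A record with blockers still depends on someone's memory.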
Field expansion: Trust evidence must be operationally true
Customers can sense the difference between proof generated from the system and a slide made for procurement. Trust evidence is operationally true when it comes from the evals, monitors, and incident records the team actually runs on, so that it shows what was decided, by whom, from which inputs, against which criteria, and with what rollback path.
Design consequence: Pruning keeps the product honest
Features that cannot meet the operating standard should not remain because they demo well. If a feature cannot be hardened, measured, owned, or justified, narrowing or removing it is the honest outcome.
