Alpesh Nakrani
Chapter 7 / The AI-Native Canon

Incidents, Trust Evidence, and Pruning

The company fixed the incident and kept the feature. Then the same class of failure happened in a different workflow. The postmortem had been written, but the system had not changed. A durable AI organization does not only recover from incidents. It extracts trust evidence and prunes weak surfaces.

Shipping includes stopping.

This chapter connects incident response, customer trust, and product pruning. AI incidents should update evals, scope, runbooks, data contracts, prompts, permissions, and rollout rules. Trust evidence should be collected continuously. Features that cannot be hardened, measured, owned, or justified should be removed or narrowed.

Research spine

This chapter uses: Google SRE Book; NIST AI Risk Management Framework; OWASP Top 10 for Large Language Model Applications; Mitchell et al., Model Cards for Model Reporting; March, Exploration and Exploitation in Organizational Learning.

AI incident reviews

An AI incident review should ask more than what broke. It should ask which assumption failed: scope, data, model, retrieval, prompt, permission, eval, UX, cost, latency, or ownership. It should identify which artifact changes: eval case, prompt version, spec, data contract, runbook, permission rule, dashboard, customer communication, or rollout policy.

Trust evidence packs

Trust evidence is the material a customer, auditor, executive, or support leader needs to believe the system is managed. It can include model cards or system cards, eval summaries, data source lists, permission architecture, incident history, monitoring snapshots, human review process, retention policy, and rollback evidence. The pack should be truthful, not marketing varnish.
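
One way to keep a pack truthful is to check it mechanically for missing or stale items. The sketch below is illustrative only: the item keys are hypothetical names for the evidence types listed above, treating all of them as required is an assumption (the text says a pack "can include" them), and the 90-day freshness window is an arbitrary placeholder.

```python
from datetime import date, timedelta

# Hypothetical keys for the evidence types named in the text.
# Treating every one as required is an assumption of this sketch.
REQUIRED_ITEMS = [
    "model_or_system_card",
    "eval_summary",
    "data_source_list",
    "permission_architecture",
    "incident_history",
    "monitoring_snapshot",
    "human_review_process",
    "retention_policy",
    "rollback_evidence",
]

# Assumed freshness window; pick a value that matches your release cadence.
MAX_AGE = timedelta(days=90)


def pack_gaps(pack: dict, today: date) -> dict:
    """Map each missing or stale evidence item to the reason it fails.

    `pack` maps an item key to the date that item was last generated.
    """
    gaps = {}
    for item in REQUIRED_ITEMS:
        if item not in pack:
            gaps[item] = "missing"
        elif today - pack[item] > MAX_AGE:
            gaps[item] = "stale"
    return gaps


# Usage: everything refreshed last month, except an old monitoring
# snapshot and rollback evidence that was never produced.
today = date(2026, 3, 1)
pack = {item: date(2026, 2, 1) for item in REQUIRED_ITEMS}
pack["monitoring_snapshot"] = date(2025, 10, 1)
del pack["rollback_evidence"]
print(pack_gaps(pack, today))
```

A check like this does not prove the evidence is accurate; it only makes gaps visible before a customer or auditor finds them.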

Pruning as discipline

AI makes it cheap to add product surfaces. That makes pruning essential. Prune workflows with low usage, high cost, poor quality, unclear owner, weak evidence, repeated incidents, or strategic distraction. Pruning is not failure; it is how durable systems protect focus and trust.
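
The pruning criteria above can be turned into a simple review filter. This is a minimal sketch, not a recommended policy: the `WorkflowStats` fields and every threshold are hypothetical, and criteria that need human judgment (weak evidence, strategic distraction) are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowStats:
    """Hypothetical per-workflow summary; field names are illustrative."""
    name: str
    weekly_active_users: int
    cost_per_task_usd: float
    eval_pass_rate: float          # fraction between 0 and 1
    owner: Optional[str]           # None means nobody owns the workflow
    incidents_last_quarter: int


# Assumed thresholds, chosen only for illustration.
MIN_WEEKLY_USERS = 50
MAX_COST_PER_TASK_USD = 0.50
MIN_EVAL_PASS_RATE = 0.90
MAX_INCIDENTS_PER_QUARTER = 1


def pruning_reasons(w: WorkflowStats) -> list:
    """Return the reasons a workflow should go to pruning review."""
    reasons = []
    if w.weekly_active_users < MIN_WEEKLY_USERS:
        reasons.append("low usage")
    if w.cost_per_task_usd > MAX_COST_PER_TASK_USD:
        reasons.append("high cost")
    if w.eval_pass_rate < MIN_EVAL_PASS_RATE:
        reasons.append("poor quality")
    if not w.owner:
        reasons.append("unclear owner")
    if w.incidents_last_quarter > MAX_INCIDENTS_PER_QUARTER:
        reasons.append("repeated incidents")
    return reasons


# Usage: a rarely used, expensive, unowned workflow.
summarizer = WorkflowStats("contract_summarizer", 12, 0.80, 0.95, None, 0)
print(pruning_reasons(summarizer))  # ['low usage', 'high cost', 'unclear owner']
```

The function returns reasons rather than a verdict, so the prune, narrow, or continue decision stays with a named owner.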

Operating table

Incident finding     | Artifact update                   | Pruning question
Out-of-scope answer  | Scope/refusal rule                | Should this topic be removed?
Stale source         | Data freshness contract           | Should source be excluded until reliable?
Prompt injection     | Security test and permission gate | Should tool action be narrowed?
High cost            | Routing/caching/budget rule       | Should workflow move async or premium?
Repeated low quality | Eval and model/prompt change      | Should feature be retired?
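
The operating table can also live in code, so that every classified incident yields both an artifact update and a pruning question. In this sketch the dictionary keys are hypothetical identifiers (only `out_of_scope_answer` appears elsewhere in this chapter), and raising on an unmapped finding is a design assumption meant to force the table to grow.

```python
# Keys are hypothetical identifiers for the findings in the operating table.
# Values are (artifact update, pruning question), copied from the table.
OPERATING_TABLE = {
    "out_of_scope_answer": (
        "Scope/refusal rule",
        "Should this topic be removed?",
    ),
    "stale_source": (
        "Data freshness contract",
        "Should source be excluded until reliable?",
    ),
    "prompt_injection": (
        "Security test and permission gate",
        "Should tool action be narrowed?",
    ),
    "high_cost": (
        "Routing/caching/budget rule",
        "Should workflow move async or premium?",
    ),
    "repeated_low_quality": (
        "Eval and model/prompt change",
        "Should feature be retired?",
    ),
}


def follow_ups(finding: str) -> tuple:
    """Return (artifact_update, pruning_question) for a classified finding."""
    try:
        return OPERATING_TABLE[finding]
    except KeyError:
        # An unmapped finding means the table itself needs a new row.
        raise ValueError(
            f"unmapped incident finding: {finding!r}; extend the table"
        ) from None


artifact, question = follow_ups("prompt_injection")
print(artifact)   # Security test and permission gate
print(question)   # Should tool action be narrowed?
```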

Artifact example: an incident-to-learning artifact

ai_incident_to_learning:
  incident_id: "AI-INC-2026-017"
  failure_class: "out_of_scope_answer"
  customer_impact: "one incorrect policy recommendation"
  root_assumption_failed: "scope boundary did not include contractor policy exclusion"
  required_updates:
    eval_cases:
      - "contractor_policy_question"
    scope_contract:
      - "exclude contractor HR topics"
    refusal_copy:
      - "route contractors to HR operations"
    rollout_policy:
      - "pause contractor cohort"
  prune_review_required: true
[Figure: incident loop from incident to failed assumption, artifact update, trust evidence, and a prune, narrow, or continue decision]
Incidents should update artifacts, create trust evidence, and force a prune, narrow, or continue decision.
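
A record like the artifact example can be checked before an incident is closed. The sketch below assumes the record has already been loaded into a plain Python dictionary (for instance from YAML by whatever parser the team uses); the function name `review_gaps` and the three checks are illustrative choices, not a standard.

```python
# The artifact example from this chapter, expressed as a plain dictionary.
incident = {
    "incident_id": "AI-INC-2026-017",
    "failure_class": "out_of_scope_answer",
    "customer_impact": "one incorrect policy recommendation",
    "root_assumption_failed": (
        "scope boundary did not include contractor policy exclusion"
    ),
    "required_updates": {
        "eval_cases": ["contractor_policy_question"],
        "scope_contract": ["exclude contractor HR topics"],
        "refusal_copy": ["route contractors to HR operations"],
        "rollout_policy": ["pause contractor cohort"],
    },
    "prune_review_required": True,
}


def review_gaps(record: dict) -> list:
    """List what is still missing before the incident can be closed."""
    gaps = []
    if not record.get("root_assumption_failed"):
        gaps.append("no failed assumption named")
    updates = record.get("required_updates") or {}
    # At least one artifact category must contain at least one change.
    if not any(updates.values()):
        gaps.append("no artifact updated")
    # The prune question must be answered explicitly, even if with False.
    if "prune_review_required" not in record:
        gaps.append("no prune decision recorded")
    return gaps


print(review_gaps(incident))                    # []
print(review_gaps({"incident_id": "AI-INC-X"}))
# ['no failed assumption named', 'no artifact updated', 'no prune decision recorded']
```

Run as a closing gate, a check like this stops a postmortem from being filed while the system itself stays unchanged.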

Checklist

  • Every AI incident should update at least one artifact.
  • Maintain trust evidence continuously.
  • Review pruning candidates quarterly.
  • Do not let impressive but unowned features survive.
  • Treat narrowing as a valid product improvement.

Takeaway

Durable AI products earn trust by changing after incidents and pruning what cannot be responsibly operated.

Operational note: A postmortem is not the artifact

The postmortem should cause changes in tests, evals, scopes, dashboards, policies, or ownership. The practical danger after an incident is not that the team lacks effort; it is that the effort is aimed at the wrong scarce resource. In durable AI product operations, the old visible unit of work is no longer the safest unit of management. A team can produce more drafts, more code, more messages, more analysis, or more tickets while becoming less reliable at the point where the business needs a decision. The fix is to move the management surface away from raw output and toward evidence: what was decided, by whom, from which inputs, against which criteria, and with what rollback path.

A mature implementation treats this as an operating-system concern rather than a personal-performance concern. The artifact should make the judgment visible: the rubric, acceptance gate, cost line, risk boundary, owner, and expiry date. When those fields are missing, the model's speed hides organizational ambiguity. When they are present, AI acceleration becomes tractable because the team can see which decisions deserve automation, which deserve human review, and which deserve rejection before execution begins.
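
The fields named above can be made explicit in a small record type, so that a missing field is a visible gap rather than an unstated assumption. This is a sketch under stated assumptions: the class name `DecisionArtifact`, the free-text field types, and the example values are all hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class DecisionArtifact:
    """Hypothetical record holding the six fields named in the text."""
    rubric: Optional[str] = None
    acceptance_gate: Optional[str] = None
    cost_line: Optional[str] = None
    risk_boundary: Optional[str] = None
    owner: Optional[str] = None
    expiry_date: Optional[date] = None


def missing_fields(artifact: DecisionArtifact) -> list:
    """Names of fields that are unset or empty, in declaration order."""
    return [f.name for f in fields(artifact) if not getattr(artifact, f.name)]


def is_expired(artifact: DecisionArtifact, today: date) -> bool:
    """True when the decision has passed its expiry date and needs review."""
    return artifact.expiry_date is not None and artifact.expiry_date < today


# Usage: a decision recorded with an owner and a gate, but no cost line,
# risk boundary, or rubric (values are invented for illustration).
draft = DecisionArtifact(
    acceptance_gate="eval pass rate at or above agreed threshold",
    owner="support-platform",
    expiry_date=date(2026, 6, 30),
)
print(missing_fields(draft))  # ['rubric', 'cost_line', 'risk_boundary']
print(is_expired(draft, date(2026, 7, 1)))  # True
```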

The useful test is whether a new teammate can replay the decision two weeks later without interviewing the original author. If replay requires folklore, the process is still human-memory-bound. If replay can be done from the artifact, the team has converted judgment into infrastructure. That conversion is the recurring discipline throughout this book: not replacing human judgment, but making human judgment explicit enough that machines can safely do more of the surrounding work.

Field expansion: Trust evidence must be operationally true

Customers can sense the difference between proof generated from the system and a slide made for procurement. The danger described for postmortems applies here too: a team can produce more material while becoming less reliable at the point where a customer, auditor, or executive needs a decision. Evidence that is operationally true records what was decided, by whom, from which inputs, against which criteria, and with what rollback path.

The replay test applies as well. If a new teammate can reconstruct a decision from the pack alone, without interviewing the original author, the evidence is real. If replay requires folklore, the pack is still a slide.

Design consequence: Pruning keeps the product honest

Features that cannot meet the operating standard should not remain because they demo well. The standard is the one set out for postmortems: each surface needs a visible rubric, acceptance gate, cost line, risk boundary, owner, and expiry date. When those fields are missing, the model's speed hides organizational ambiguity, and the feature becomes a candidate for removal or narrowing. When they are present, the team can see which decisions deserve automation, which deserve human review, and which deserve rejection before execution begins.

