Alpesh Nakrani
Chapter 10 / The AI-Native Canon

The Operating Playbook

An AI-native SDLC operating playbook gives teams default rules for generated change: risk classes, provenance, CI gates, review standards, agent permissions, release policies, and learning loops.

The head of engineering asked for one document new teams could use before adopting AI coding agents. It could not be philosophical. It had to answer: what do we store, what do we check, who approves, what is forbidden, how do we recover, and how do we know the system is improving?

This chapter is that playbook.

The AI-native SDLC becomes real when it is operationalized as a small set of defaults: risk classes, provenance requirements, CI gates, review standards, agent permissions, release policies, and learning loops. The defaults should be opinionated enough to guide teams and flexible enough to adapt to domain risk.

Key Takeaways

  • The playbook should be short enough to use and strict enough to prevent predictable failure.
  • Tool rollout is not transformation; the system around the tool decides whether AI helps.
  • Maturity should be measured by evidence, not by whether a team has installed coding agents.
  • The definition of done must include intent, risk, provenance, behavior evidence, ownership, rollback, and production trace.

Research spine

This chapter uses: DORA, State of AI-assisted Software Development 2025; NIST SP 800-218, Secure Software Development Framework; SLSA: Supply-chain Levels for Software Artifacts; OWASP Top 10 for Large Language Model Applications; Google SRE Book; Forsgren et al., The SPACE of Developer Productivity.

Default rules

First, all consequential AI-assisted changes must have an intent link. Second, medium- and high-risk changes must store provenance. Third, generated changes must pass the same mechanical checks as human-written code plus checks for AI artifacts. Fourth, high-risk changes require behavior evidence and rollback. Fifth, agent permissions must be explicit. Sixth, incidents update the lifecycle.

These rules are not enough for every organization, but they are enough to prevent the most common mistakes.
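As a minimal sketch, the first five default rules can be expressed as a check over a change record. The `Change` fields, the `Risk` enum, and the `violations` function are hypothetical names invented for this illustration, not part of any real tool. The sixth rule (incidents update the lifecycle) is a process rule rather than a per-change check, so it is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Change:
    # All field names are hypothetical, chosen for this sketch.
    risk: Risk
    consequential: bool = True
    intent_link: Optional[str] = None
    provenance: Optional[dict] = None
    mechanical_checks_passed: bool = False
    ai_artifact_checks_passed: bool = False
    behavior_evidence: Optional[str] = None
    rollback_plan: Optional[str] = None
    # None means "never declared"; an empty tuple means "declared: no permissions".
    agent_permissions: Optional[Tuple[str, ...]] = None


def violations(change: Change) -> List[str]:
    """Return the default rules that the change does not yet satisfy."""
    found = []
    if change.consequential and not change.intent_link:
        found.append("rule 1: missing intent link")
    if change.risk in (Risk.MEDIUM, Risk.HIGH) and not change.provenance:
        found.append("rule 2: missing provenance")
    if not (change.mechanical_checks_passed and change.ai_artifact_checks_passed):
        found.append("rule 3: checks not passed")
    if change.risk is Risk.HIGH and not (change.behavior_evidence and change.rollback_plan):
        found.append("rule 4: missing behavior evidence or rollback")
    if change.agent_permissions is None:
        found.append("rule 5: agent permissions not declared")
    return found
```

A CI job could call `violations` on each pull request and fail when the list is non-empty. How the fields get populated is left open here.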

Adoption path

Start with one team and one workflow. Measure review load, defect rate, cycle time, and developer experience. Add gates only where failure appears or risk requires them. Avoid rolling out every tool to every team before the lifecycle is ready. DORA's system framing is important here: the tool is not the transformation. The working system around the tool is.
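One way to decide where a gate is needed is to compare pilot metrics against a baseline and flag only the ones that got worse. The sketch below assumes hypothetical metric names and a 10% tolerance; both are placeholders, and the numbers are made up for illustration.

```python
from typing import Dict, List

# Hypothetical metric names. For these three, a higher value is worse;
# any other metric is treated as "higher is better".
LOWER_IS_BETTER = {"review_hours_per_change", "defect_rate", "cycle_time_days"}


def regressions(baseline: Dict[str, float], pilot: Dict[str, float],
                tolerance: float = 0.10) -> List[str]:
    """Return metrics that moved in the wrong direction by more than `tolerance`."""
    worse = []
    for name, before in baseline.items():
        after = pilot[name]
        if name in LOWER_IS_BETTER:
            regressed = after > before * (1 + tolerance)
        else:
            regressed = after < before * (1 - tolerance)
        if regressed:
            worse.append(name)
    return worse


# Made-up numbers for illustration only.
baseline = {"review_hours_per_change": 1.0, "defect_rate": 0.04,
            "cycle_time_days": 3.0, "dev_experience_score": 4.0}
pilot = {"review_hours_per_change": 1.5, "defect_rate": 0.04,
         "cycle_time_days": 2.0, "dev_experience_score": 4.1}

print(regressions(baseline, pilot))  # ['review_hours_per_change']
```

In this invented pilot, cycle time improved while review load regressed, which would point toward a review-side control rather than a blanket set of new gates.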

Maturity ladder

Level 1 is individual AI assistance with minimal governance. Level 2 is team-level standards for prompts, generated code, and review. Level 3 is CI-integrated provenance, risk routing, and behavior evals. Level 4 is production traceability and incident learning. Level 5 is platformized AI-native delivery with reusable controls across teams.

A company should know where each team sits. Pretending all teams are mature because a tool is installed is the fastest path to disappointment.

Operating table

| Maturity level | Capability | Evidence |
| --- | --- | --- |
| 1 | Individual assistance | Tool use guidelines |
| 2 | Team standards | Review templates and risk classes |
| 3 | Lifecycle gates | CI provenance, evals, policy checks |
| 4 | Production traceability | Release metadata and incident loops |
| 5 | Platformized delivery | Reusable controls, dashboards, governance |
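The operating table can be read as a lookup from evidence to level. The sketch below assumes the ladder is cumulative, meaning a team cannot claim a level without the evidence for every level beneath it. That is an interpretation, not something the table states. The evidence keys are hypothetical identifiers paraphrased from the table.

```python
from typing import List, Set, Tuple

# Evidence required at each level, paraphrased from the operating table.
# The snake_case keys are hypothetical identifiers for this sketch.
LADDER: List[Tuple[int, Set[str]]] = [
    (1, {"tool_use_guidelines"}),
    (2, {"review_templates", "risk_classes"}),
    (3, {"ci_provenance", "evals", "policy_checks"}),
    (4, {"release_metadata", "incident_loops"}),
    (5, {"reusable_controls", "dashboards", "governance"}),
]


def maturity_level(evidence: Set[str]) -> int:
    """Highest level whose evidence, and all lower levels' evidence, is present."""
    level = 0
    for rung, required in LADDER:
        if not required <= evidence:  # subset test: is anything required missing?
            break
        level = rung
    return level
```

Under this reading, a team with dashboards but no CI provenance still sits at level 2, which matches the point that an installed tool is not evidence of maturity.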

Artifact example: a definition of done for AI-native software changes

# AI-Native SDLC Definition of Done

A change is ready to merge when:

- [ ] Intent/spec is linked.
- [ ] Risk class is declared and justified.
- [ ] Provenance is attached if required.
- [ ] Mechanical checks pass.
- [ ] Behavioral evidence matches the risk class.
- [ ] Human owner can explain the change.
- [ ] Rollback path exists for user-facing behavior.
- [ ] Production trace is configured when required.

[Figure: one-page operating playbook with seven panels pinned beside a software delivery pipeline]
The operating playbook turns AI-native delivery into seven maintained controls beside the pipeline: intent, provenance, risk, CI, review, release, and learning.
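Because the definition of done above is written as markdown checkboxes, a merge gate can read it straight from a pull request description. This is a minimal sketch: the `unchecked_items` helper is a hypothetical name, and it treats every unchecked box as blocking, which is stricter than the "if required" wording on some items.

```python
import re
from typing import List

# Matches markdown task-list lines such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def unchecked_items(markdown: str) -> List[str]:
    """Return the text of every task-list item that is still unchecked."""
    items = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


description = """\
- [x] Intent/spec is linked.
- [ ] Risk class is declared and justified.
- [x] Mechanical checks pass.
"""

print(unchecked_items(description))  # ['Risk class is declared and justified.']
```

A checked box is only self-attestation. A stronger gate would verify each item against the evidence it names.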

Checklist

  • Publish default risk classes and gates.
  • Pilot with one workflow before broad rollout.
  • Track system metrics, not only tool usage.
  • Create a maturity ladder by team.
  • Make the playbook short enough to use and strict enough to matter.

Takeaway

The AI-native SDLC is a managed operating system for machine-authored change.

Operational note: Defaults create safety and speed

Teams move faster when they do not reinvent policy for every generated change. The practical danger for a playbook is not that the team lacks effort; it is that the effort is aimed at the wrong scarce resource. Raw output is no longer the safest unit of management: a team can produce more drafts, more code, more messages, more analysis, or more tickets while becoming less reliable at the point where the business needs a decision. The fix is to move the management surface away from raw output and toward evidence: what was decided, by whom, from which inputs, against which criteria, and with what rollback path.

A mature implementation treats this as an operating-system concern rather than a personal-performance concern. The artifact should make the judgment visible: the rubric, acceptance gate, cost line, risk boundary, owner, and expiry date. When those fields are missing, the model's speed hides organizational ambiguity. When they are present, AI acceleration becomes tractable because the team can see which decisions deserve automation, which deserve human review, and which deserve rejection before execution begins.

The useful test is whether a new teammate can replay the decision two weeks later without interviewing the original author. If replay requires folklore, the process is still human-memory-bound. If replay can be done from the artifact, the team has converted judgment into infrastructure. That conversion is the recurring discipline throughout this book: not replacing human judgment, but making human judgment explicit enough that machines can safely do more of the surrounding work.
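The artifact fields named above (rubric, acceptance gate, cost line, risk boundary, owner, expiry date) can be captured as a small record with a completeness check. This is a sketch under stated assumptions: `DecisionRecord`, `missing_fields`, and `is_replayable` are hypothetical names, and a complete, unexpired record is only a proxy for the replay test, not proof that a new teammate could actually replay the decision.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class DecisionRecord:
    # Field names mirror the list in the text; the types are assumptions.
    rubric: Optional[str] = None
    acceptance_gate: Optional[str] = None
    cost_line: Optional[str] = None
    risk_boundary: Optional[str] = None
    owner: Optional[str] = None
    expiry_date: Optional[date] = None


def missing_fields(record: DecisionRecord) -> List[str]:
    """Names of fields that are unset or empty, in declaration order."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "")]


def is_replayable(record: DecisionRecord, today: date) -> bool:
    """Proxy check: every field is filled in and the record has not expired."""
    return not missing_fields(record) and record.expiry_date >= today
```

For example, `missing_fields(DecisionRecord(rubric="r", owner="o"))` reports the four fields that would otherwise have to come from folklore.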

Field expansion: Maturity is uneven by design

Different teams should advance at different speeds based on risk, capability, and evidence.

Design consequence: A playbook must be maintained

AI-native delivery evolves quickly; the operating playbook should change through incidents and learning, not quarterly theater.
