The Reversibility Ladder
Match the size of your commitment to the reversibility of the decision, so that being wrong is cheap and being right is still available.
Jeff Bezos made a distinction in a shareholder letter that I have stolen and used more times than any other single idea in this book, so I will credit it plainly. He divided decisions into two types. Type 1 decisions are one-way doors: hard or impossible to reverse, so you walk through them slowly and deliberately. Type 2 decisions are two-way doors: if you do not like what you find, you can walk back out, so you should make them fast and let people make them without elaborate process. His warning was that organizations tend to apply heavyweight, Type 1 process to lightweight, Type 2 decisions, which slows everything and breeds a culture of caution (Amazon 2015 letter to shareholders, Bezos on one-way and two-way doors).
The hype cycle weaponizes exactly this confusion, in both directions. It pushes you to treat genuinely reversible experiments as if they were momentous, so you over-deliberate and miss cheap learning. And it pushes you to treat genuinely irreversible commitments as if they were casual, so you re-platform on the strength of a demo and discover the one-way door only after it has closed behind you. The defense is a single disciplined idea: match the size of your commitment to the reversibility of the decision. The Reversibility Ladder is how I make that idea operational for AI adoption specifically.
The ladder
The ladder is a sequence of seven rungs, from most reversible to least, each representing a level of commitment to an AI capability. The point is to know which rung you are proposing to stand on, because the rung dictates how much evidence you need before you step onto it.
- Demo. You watch the vendor's demo or run their hosted example. Fully reversible: you close the tab. Evidence required: none, and you should believe almost nothing.
- Sandbox. You run the capability yourself, in an isolated environment, on test inputs you control. Reversible: you delete the sandbox. Evidence required: minimal; this rung is for learning the shape of the thing.
- Internal tool. You give the capability to your own team for non-critical internal work. Reversible: you turn it off and the team goes back to the old way. Evidence required: it is good enough that your own people tolerate it.
- Shadow workflow. The capability runs alongside a real production workflow, observing and producing outputs that are logged but not acted on. Reversible: you stop the shadow, and nothing downstream changes. Evidence required: it behaves on live data, measured against the real workflow's outcomes.
- Opt-in production. Real users or real cases flow through the capability, but only those who opted in or a limited segment, with a human in the loop and an easy fallback. Partly reversible: you can roll back the segment, but some real outcomes have already happened. Evidence required: it held up in shadow, and its failure modes are detectable and recoverable.
- Core workflow. The capability becomes the default path for a real, important workflow, with humans monitoring rather than mediating every case. Hard to reverse: the old path may have atrophied, the team has reorganized around it, and dependencies have grown. Evidence required: sustained performance in opt-in production across cost, latency, and reliability, over enough time to trust the maintenance tail.
- Regulated or irreversible decision. The capability drives a decision with legal, financial, safety, or reputational consequences that cannot be cleanly undone: a decision affecting a customer's money, rights, or health, or a public commitment you cannot walk back. Effectively a one-way door. Evidence required: the highest bar, including governance, auditability, and the ability to stand behind every decision to whoever holds you accountable.
The discipline of the ladder is twofold. You climb one rung at a time, gathering the evidence each rung requires before stepping to the next, and you never let a demo, rung 1, justify a step onto rung 5, 6, or 7. The hype cycle's favorite move is exactly that illegal jump: the viral demo that someone wants to put straight into a core or regulated workflow. The ladder names the rungs you skipped and the evidence you do not have for them.
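The climbing rule is mechanical enough to sketch in code. What follows is a minimal illustration, not a tool from this book: the `Rung` enum and the `check_promotion` function are hypothetical names, and whether a rung's evidence has been gathered is assumed to be a judgment made elsewhere and passed in as a boolean.

```python
from enum import IntEnum


class Rung(IntEnum):
    """The seven rungs, ordered from most reversible (1) to least reversible (7)."""
    DEMO = 1
    SANDBOX = 2
    INTERNAL_TOOL = 3
    SHADOW = 4
    OPT_IN = 5
    CORE = 6
    REGULATED = 7


def check_promotion(current: Rung, proposed: Rung, evidence_met: bool) -> tuple[bool, str]:
    """Gate a climb: only one rung at a time, and only once that rung's evidence is in.

    evidence_met is a hypothetical input: the human judgment that the evidence
    required for stepping onto `proposed` has actually been gathered.
    """
    if proposed <= current:
        # Not a climb, so this check does not gate it.
        return True, "not a promotion"
    if proposed - current > 1:
        # Name every rung the proposal would skip, e.g. a demo going straight to core.
        skipped = [r.name for r in Rung if current < r < proposed]
        return False, f"illegal jump: skips {', '.join(skipped)}"
    if not evidence_met:
        return False, f"evidence for {proposed.name} not yet gathered"
    return True, f"promote to {proposed.name}"
```

Calling `check_promotion(Rung.DEMO, Rung.CORE, evidence_met=True)` is refused even with the evidence flag set, and the message lists the four rungs the viral demo was trying to skip.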
Reversibility is a property you can engineer
Here is a subtlety that took me years to internalize: reversibility is not fixed by the technology. It is partly a property of how you set the decision up, and you can often buy reversibility cheaply if you design for it before committing.
Consider a vendor commitment. Signing a three-year exclusive contract with deep integration is high on the ladder, hard to reverse. The same capability adopted through an abstraction layer that lets you swap providers, on a one-year contract, keeping your own copy of the data and the prompts, is much lower on the ladder, far more reversible, for a small amount of upfront design work. The capability is identical. The reversibility differs by an order of magnitude, and the difference is entirely in how you structured the commitment.
So before any significant AI commitment, I run a short reversibility design pass: what would it cost to undo this, and can I cheaply make it cost less? Keep your data and your evaluation sets under your own control rather than the vendor's. Put an abstraction between your code and the specific model so the model is swappable. Choose shorter contract terms even at a slightly higher unit cost, because during a period of rapid change the optionality is worth more than the discount. Preserve the old workflow in a dormant but recoverable state rather than deleting it the moment the new one ships. Each of these moves a decision down the ladder, which means you need less evidence to make it, which means you can move faster, not slower. Engineering reversibility is how you get speed and safety at once.
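A minimal sketch of the abstraction-layer move, assuming a Python codebase. `TextModel`, `VendorAModel`, `VendorBModel`, `PROMPTS`, `PROVIDERS`, and `make_model` are all hypothetical names; the adapters return placeholder strings where a real adapter would call a provider's SDK, and no real vendor API is implied.

```python
from typing import Callable, Protocol


class TextModel(Protocol):
    """The only model surface the rest of the codebase is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...


class VendorAModel:
    """Hypothetical adapter; a real one would call vendor A's SDK inside complete()."""
    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"


class VendorBModel:
    """Hypothetical adapter for a second provider, kept ready as the exit path."""
    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"


# Prompts live in our own repository, not in a vendor console, so they survive a switch.
PROMPTS = {"summarize": "Summarize the following ticket:\n{ticket}"}

# Swapping providers is a one-line config change here, not a rewrite of callers.
PROVIDERS: dict[str, Callable[[], TextModel]] = {
    "vendor_a": VendorAModel,
    "vendor_b": VendorBModel,
}


def make_model(provider: str) -> TextModel:
    return PROVIDERS[provider]()


def summarize_ticket(model: TextModel, ticket: str) -> str:
    # Application code sees only TextModel; it never imports a vendor SDK directly.
    return model.complete(PROMPTS["summarize"].format(ticket=ticket))
```

The capability is the same whichever adapter is configured; what the layer buys is that undoing the vendor choice touches one dictionary entry and one adapter class rather than every call site.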
Pilots as option value
The ladder explains why a portfolio of cheap, reversible pilots beats a single big bet, and the reason is the logic of options. A pilot on a low rung is the purchase of an option: for a small, bounded cost, you buy the right but not the obligation to scale up if the evidence is good. Most options expire worthless, and that is fine, because each one was cheap and a few pay off enough to cover all the duds. This is venture portfolio math applied to internal experiments, and it inverts the intuition that failed pilots are waste. Failed cheap pilots are the cost of the few that succeed, and an org with zero failed pilots is not disciplined; it is simply not buying enough options.
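The option logic can be made concrete with a little arithmetic. This sketch compares committing directly against piloting first under a deliberately simplified model that is my assumption, not a result from any source: a pilot costs c, climbing to the high rung costs S, success is worth V, and the pilot reveals with probability p that the capability works. The `Bet` class and the numbers in the usage note are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    pilot_cost: float   # c: bounded cost of the low-rung experiment
    scale_cost: float   # S: cost of climbing to the high rung
    payoff: float       # V: value if the capability really works for us
    p_success: float    # p: chance the pilot's evidence comes back good


def commit_directly(b: Bet) -> float:
    """Skip the pilot: pay S up front and hope. Expected value p*V - S."""
    return b.p_success * b.payoff - b.scale_cost


def pilot_then_scale(b: Bet) -> float:
    """Buy the option: pay c, then pay S only if the evidence is good.
    Expected value -c + p*(V - S)."""
    return -b.pilot_cost + b.p_success * (b.payoff - b.scale_cost)


def option_premium(b: Bet) -> float:
    """Algebraically (1 - p)*S - c: scale cost avoided on failures, minus the pilot fee."""
    return pilot_then_scale(b) - commit_directly(b)


def portfolio(b: Bet, n: int) -> tuple[float, float]:
    """n independent pilots of the same shape: (expected total value, expected failed pilots)."""
    return n * pilot_then_scale(b), n * (1 - b.p_success)
```

With the illustrative `Bet(pilot_cost=10, scale_cost=200, payoff=600, p_success=0.2)`, committing directly has an expected value of about -80 while piloting first has about +70. The premium of about 150 is (1 - 0.2) × 200 - 10: the scale cost avoided in the four out of five cases that fail, minus the pilot fee. Ten such pilots have an expected value of about 700 and about eight expected failures, which is the argument above in numbers: the failures are the price of the option, not waste.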
I track the portfolio as a simple matrix of option value against execution cost, and the position on the matrix tells me how to treat each pilot.
| | Low execution cost | High execution cost |
|---|---|---|
| High option value | Run now, cheap upside, do many | Run carefully, this is your real bet, evidence-gate it hard |
| Low option value | Run only if nearly free or for learning | Avoid, expensive and unlikely to matter |
The top-left quadrant is where a hype budget's frontier slots should mostly live: cheap experiments with real upside, run in volume, most expected to fail. The top-right is the one or two consequential bets that have earned their way up the ladder and now justify serious execution and a high evidence bar. The bottom-right is the trap the hype cycle pushes you toward: expensive chases of things unlikely to matter to you, dressed up as urgent. The matrix keeps the cheap-upside experiments flowing and the expensive low-value chases out.
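Once each pilot has been scored, the matrix reduces to a lookup. In this sketch the 0-to-1 scores and the 0.5 cut between "low" and "high" are assumptions for illustration, and `Quadrant`, `MATRIX`, and `triage` are hypothetical names; the placement itself remains a judgment call, and the code only keeps that judgment consistent across the portfolio.

```python
from typing import NamedTuple


class Quadrant(NamedTuple):
    name: str
    action: str


# Keys are (high option value?, high execution cost?), mirroring the matrix above.
MATRIX = {
    (True, False): Quadrant("top-left", "Run now, cheap upside, do many"),
    (True, True): Quadrant("top-right", "Run carefully, this is your real bet, evidence-gate it hard"),
    (False, False): Quadrant("bottom-left", "Run only if nearly free or for learning"),
    (False, True): Quadrant("bottom-right", "Avoid, expensive and unlikely to matter"),
}


def triage(option_value: float, execution_cost: float, threshold: float = 0.5) -> Quadrant:
    """Place a pilot on the matrix from two hypothetical 0-1 judgment scores."""
    for score in (option_value, execution_cost):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be between 0 and 1")
    return MATRIX[(option_value >= threshold, execution_cost >= threshold)]
```

For example, `triage(0.8, 0.2)` lands in the top-left quadrant, while `triage(0.3, 0.9)` lands in the bottom-right trap the hype cycle pushes toward.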
The pilot portfolio tracker
The portfolio only delivers option value if you actually let the options expire when they should, which means each pilot needs a defined rung, a defined evidence target for promotion, and a defined kill condition set before you start. Here is the tracker I keep, and the columns are deliberately about gates and conditions rather than vibes.
| Pilot | Current rung | Evidence target to promote | Kill condition | Slot used | Revisit |
|---|---|---|---|---|---|
| Support draft replies | 4 shadow | clean-resolve >= 45% on replay, holds in shadow | < 35% after 4 weeks | adjacent-1 | weekly |
| Doc extraction | 5 opt-in | < 1% silent error, cost per doc under target | silent error > 2% | core-1 | weekly |
| Coding assistant | 3 internal | team voluntarily keeps using after 30 days | usage drops to near zero | core-2 | monthly |
| Frontier agent eval | 2 sandbox | end-to-end > 70% on our 5-step replay | < 50%, or no path to improve | frontier-1 | monthly |
The two non-negotiable columns are the evidence target and the kill condition, both set before the pilot starts, because that is the moment when judgment is unclouded by sunk cost. A pilot without a written kill condition is not a pilot; it is the beginning of a chase that escalation of commitment will keep alive past its usefulness. When a pilot hits its kill condition, you kill it, and you treat the kill as the process working, the same blameless stop I argued for in the churn chapter. The option expired worthless, as most options do, and that is a clean outcome, not a failure.
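The tracker's two non-negotiable columns translate directly into predicates fixed when the pilot is created. A minimal sketch using the support-draft row from the table above; the `Pilot` class, the `review` function, and the metric keys (`clean_resolve`, `holds_in_shadow`, `weeks`) are hypothetical, and reading "holds in shadow" as a single boolean is a simplification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pilot:
    name: str
    rung: int                              # 1 = demo ... 7 = regulated decision
    promote_if: Callable[[dict], bool]     # evidence target, written before the pilot starts
    kill_if: Callable[[dict], bool]        # kill condition, written before the pilot starts


def review(pilot: Pilot, metrics: dict) -> str:
    # The kill condition is checked first, so a pilot past its kill line
    # is not rescued by a single promising metric.
    if pilot.kill_if(metrics):
        return "kill"
    if pilot.promote_if(metrics):
        return "promote"
    return "continue"


support_drafts = Pilot(
    name="Support draft replies",
    rung=4,  # shadow workflow
    promote_if=lambda m: m["clean_resolve"] >= 0.45 and m["holds_in_shadow"],
    kill_if=lambda m: m["weeks"] >= 4 and m["clean_resolve"] < 0.35,
)
```

Freezing the dataclass is a small nod to the same discipline: the conditions are set at creation and cannot be reassigned on the object once sunk cost starts to accumulate. A weekly review then returns "kill" for a 30% clean-resolve rate at week four, "continue" for the same rate at week two, and "promote" once the rate clears 45% and holds in shadow.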
Asymmetry: where to spend the deliberation
The ladder also tells you where to spend your scarce deliberation, and the answer is the asymmetry of the rung. On the low rungs, the cost of a wrong "yes" is small and recoverable, so the right bias is action: try it, learn, you can always step back. Over-deliberating a sandbox experiment is pure waste, the Type 2 door treated as Type 1. On the high rungs, especially 6 and 7, the cost of a wrong "yes" is large and possibly permanent, so the right bias is evidence and patience: the door only opens one way, so be sure before you walk through.
This asymmetry resolves the apparent contradiction at the heart of this book, the tension between not churning and not being left behind. You move fast and loose on the reversible rungs, accumulating cheap learning at high volume, and slow and rigorous on the irreversible ones. The hype cycle wants you to do the opposite: slow and anxious on the cheap experiments, where it costs you learning, and fast and casual on the expensive commitments, where it costs you the company. Get the asymmetry right and you are simultaneously faster than the cautious org and safer than the reckless one, which is the whole game.
Summary
Match the size of your commitment to the reversibility of the decision. The Reversibility Ladder names seven rungs from demo to regulated decision, each requiring more evidence than the last, and forbids the hype cycle's favorite illegal jump from a demo straight into a core or irreversible workflow. Reversibility is partly engineerable: abstraction layers, data ownership, shorter terms, and preserved fallbacks all move a decision down the ladder cheaply, buying speed and safety at once. Treat pilots as options, run cheap high-upside ones in volume, expect most to expire worthless, and gate each with an evidence target and a kill condition set before it starts. Spend deliberation according to the asymmetry of the rung: bias to action where wrong is cheap, bias to evidence where wrong is permanent.
Key Takeaways
- Bezos's distinction holds: make reversible Type 2 decisions fast and irreversible Type 1 decisions slowly and deliberately. The hype cycle confuses the two in both directions.
- The Reversibility Ladder runs from demo, sandbox, internal tool, and shadow workflow up through opt-in production, core workflow, to regulated decision, each rung requiring more evidence.
- Climb one rung at a time and never let a demo justify a jump to a core or irreversible rung. The illegal jump is the hype cycle's signature move.
- Reversibility is engineerable. Abstraction layers, data ownership, shorter contracts, and preserved fallbacks move a decision down the ladder cheaply, buying speed and safety together.
- Pilots are options. Run cheap high-upside ones in volume, expect most to expire worthless, and judge the portfolio by the few payoffs, not the failure count. Zero failed pilots means too few options bought.
- Every pilot needs an evidence target for promotion and a kill condition, both set before it starts, while judgment is free of sunk cost. A pilot without a kill condition is a chase waiting to happen.
- Spend deliberation by the asymmetry of the rung: bias to action where wrong is cheap and recoverable, bias to evidence where wrong is permanent. Getting the asymmetry right makes you faster than the cautious and safer than the reckless.
