
2025 / Free online book · The AI-Native Canon
Human in the Loop Is Not a Plan
Designing evaluation that scales with autonomy
Access: Free
Chapters: 8
Read time: 82 min
Putting a human in the loop feels safe and scales terribly. The reviewer becomes the bottleneck, then a rubber stamp, then a liability. This book is about building evaluation that keeps pace with autonomy: sampling, escalation, and the small set of decisions a person should never delegate.
'A human reviews it' is where most AI systems quietly fail. What real evaluation looks like when the machine outpaces the reviewer.
This edition is free to read onsite. Each chapter has its own URL, so readers can bookmark, share, and return to the exact section they need.
Table of contents
FM  Front Matter: Human in the Loop Is Not a Plan (2 min)
Designing evaluation that scales with autonomy

01  The Review Queue Collapse (9 min)
At 9:10 on Monday morning, the support automation team believed they had solved the human-in-the-loop problem. No refund would be issued without a human review.

02  Human in the Loop Is an Incomplete Sentence (9 min)
"Human in the loop" is an incomplete sentence.

03  The LOOP-SAFE Framework (10 min)
This book needs a compact operating framework because "human review" can sprawl into policy, product, operations, UX, compliance, data, and machine learning. The framework is called LOOP-SAFE.

04  Risk-Tiered Review and the Capacity Math (9 min)
The most important human review design decision is not interface layout. It is risk tiering.

05  Rubrics, Calibration, and Disagreement (9 min)
A review loop is only as good as the judgment it applies. Judgment does not become reliable merely because a human performed it.

06  Sampling, Evals, and Continuous Learning (9 min)
If approval loops prevent known high-risk failures, sampling loops detect unknown failures. A system that reviews only the cases it already knows are risky will miss the failures it has not yet learned to name.

07  Autonomy Boundaries and Evaluation Strength (9 min)
The more autonomy a system has, the stronger its evaluation must be.

08  The Operating System for Scalable Oversight (10 min)
The final chapter assembles the book into an operating system. A team that wants scalable oversight needs more than a review queue, more than evals, and more than a policy document.

END  Conclusion: Design the Loop or Remove the Claim (2 min)
The phrase "human in the loop" should make a product leader nervous until the loop is specified. A vague human loop creates false reassurance. A designed loop creates operational control.

A  Appendix A: Source Index (2 min)
Designing evaluation that scales with autonomy

B  Appendix B: Glossary (2 min)
Approval loop: A pre-action human review process that must approve an AI action before execution.
