Front Matter: Human in the Loop Is Not a Plan
Designing evaluation that scales with autonomy
Research spine: this chapter stays grounded in the NIST AI Risk Management Framework and the NIST Secure Software Development Framework, then applies that evidence to the operating judgment in the book.
Key Takeaways
- Evaluation has to be designed so that it scales as system autonomy grows.
- The practical test is whether a team can name the evidence, owner, and failure mode before it changes behavior.
- Read this with Human in the Loop Is Not a Plan and the adjacent chapters when you need the wider Evals and Evaluation frame.
The AI-Native Canon - Book IV
Year: 2025
Author: Alpesh Nakrani
Preface
Most teams reach for "human in the loop" when they do not yet trust their AI system. The phrase sounds responsible. It reassures stakeholders that a person remains involved. It also hides the hardest question: involved how?
This book argues that human review is not a safety plan unless it has capacity, criteria, evidence, authority, feedback, sampling, calibration, and exit conditions. As AI systems become more autonomous, human judgment becomes more important and less scalable. The solution is not to review everything. The solution is to design oversight systems that spend human attention where it matters and convert review into durable evaluation.
Core Thesis
Human review does not scale by adding more humans. It scales by deciding which decisions deserve human judgment, giving reviewers the evidence to judge, sampling the rest, and turning review into evals.
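The four moves in the thesis can be sketched as a small routing function. This is a minimal illustration, not the book's method: the `Decision` fields, the `risk` score, the 0.7 threshold, and the 5% sample rate are all hypothetical placeholders a team would replace with its own criteria.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Decision:
    decision_id: str
    risk: float     # hypothetical risk score in [0, 1]
    evidence: str   # what a reviewer would need to see in order to judge


def in_sample(decision_id: str, rate: float) -> bool:
    """Deterministically place roughly `rate` of all IDs in the audit sample."""
    digest = hashlib.sha256(decision_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < rate * 2**64


def route(decision: Decision, risk_threshold: float = 0.7,
          sample_rate: float = 0.05) -> str:
    """Return 'review', 'sample', or 'auto' for a single decision."""
    if decision.risk >= risk_threshold:
        return "review"   # this decision deserves human judgment
    if in_sample(decision.decision_id, sample_rate):
        return "sample"   # spot-check a slice of the rest
    return "auto"         # proceeds without human attention


def to_eval_case(decision: Decision, reviewer_verdict: str) -> dict:
    """Turn a completed review into a reusable evaluation case."""
    return {
        "id": decision.decision_id,
        "input": decision.evidence,
        "expected": reviewer_verdict,
    }
```

Hashing the decision ID, rather than drawing a random number, keeps the sample reproducible: the same decision always lands in the same bucket, so an audit can be rerun later against the same set.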
What This Book Is Not
This is not a compliance checklist. It is not a generic responsible-AI essay. It is not a claim that humans should disappear from AI systems. It is a practical field manual for teams building AI products where autonomy, evaluation, review, and accountability must work together.
Visual Style Used Throughout
Every chapter includes a production-ready infographic prompt in the same visual language: white technical page, black ink linework, sparse muted accents, readable engineering labels, diagrams that explain rather than decorate.
