Alpesh Nakrani
Chapter 5 / The AI-Native Canon

Pilots That Produce Evidence, Not Theatre


Key Takeaways

  • This chapter is about AI revenue engineering, not generic AI adoption.
  • The operating rule: sell proven work, measured risk, and margin discipline rather than demo theatre.
  • The failure mode to watch is polished output without evidence, owner, cost line, or rollback path.
  • The useful next step is an artifact a future teammate can replay without folklore.

AI revenue work converts when the seller can prove resolved work, cost, risk, and expansion evidence, not just a polished demo.

The pilot had twenty users, glowing quotes, and no baseline. Sales called it successful. Finance asked what changed. Customer success could not tell whether usage represented curiosity or value. The renewal conversation started from vibes because the pilot had been designed as a demo extension rather than an evidence machine.

AI pilots must be built to answer the renewal question before the contract is signed.

A production-grade pilot defines baseline, success metric, sample, workflow boundary, user segment, risk controls, evidence owner, and decision date. It should produce the proof needed for expansion or the evidence needed to stop. A pilot that cannot fail cannot prove anything.
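One way to enforce that definition is to treat the pilot plan as a record that cannot launch until every element is filled in. Below is a minimal Python sketch. `PilotPlan` and `missing_elements` are hypothetical names, and the extra `failure_criterion` field is an assumption drawn from the point that a pilot which cannot fail proves nothing.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class PilotPlan:
    """Hypothetical plan record: one field per element a production-grade pilot defines."""
    baseline: Optional[str] = None
    success_metric: Optional[str] = None
    failure_criterion: Optional[str] = None  # assumption: the result that would trigger "stop"
    sample: Optional[str] = None
    workflow_boundary: Optional[str] = None
    user_segment: Optional[str] = None
    risk_controls: Optional[str] = None
    evidence_owner: Optional[str] = None
    decision_date: Optional[date] = None


def missing_elements(plan: PilotPlan) -> list[str]:
    """Return the names of plan elements that are unset or blank, in field order."""
    missing = []
    for f in fields(plan):
        value = getattr(plan, f.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing


draft = PilotPlan(
    baseline="April tier-1 reopen rate",
    success_metric="reopen rate down 20% vs baseline",
    evidence_owner="customer operations QA",
)
print(missing_elements(draft))
# ['failure_criterion', 'sample', 'workflow_boundary', 'user_segment', 'risk_controls', 'decision_date']
```

A plan that still returns a non-empty list is a demo extension, not a pilot.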

Research spine

This chapter uses: Brynjolfsson, Li, Raymond, Generative AI at Work, NBER Working Paper 31161; Brynjolfsson, Li, Raymond, Generative AI at Work, Quarterly Journal of Economics; NIST AI Risk Management Framework; Bessemer Venture Partners, State of AI 2025.

Baseline first

Without a baseline, improvement is anecdote. For support, baseline might be issues resolved per hour, reopening rate, CSAT, or time to first response. For engineering, it might be cycle time, escaped defects, or review load. For revenue, it might be account research time or conversion from meeting to qualified opportunity. The baseline should be measured before the AI workflow is introduced.
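For support, measuring the baseline can be as simple as summarizing the tickets resolved during the baseline period. Below is a minimal Python sketch, assuming a hypothetical `Ticket` record with a resolution time, a handle time, and an optional reopen time. The seven-day reopen window mirrors the `resolution_without_reopen_7d` metric in this chapter's scorecard, and the April sample data is invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical shape of one resolved ticket from a support-queue export."""
    resolved_at: datetime
    handle_minutes: float
    reopened_at: Optional[datetime] = None


def baseline_metrics(tickets: list[Ticket], window_days: int = 7) -> dict[str, float]:
    """Summarize the pre-pilot period so later results have something to beat."""
    if not tickets:
        raise ValueError("the baseline period has no resolved tickets")
    window = timedelta(days=window_days)
    # A reopen only counts against the ticket if it happens inside the window.
    reopened = sum(
        1
        for t in tickets
        if t.reopened_at is not None and t.reopened_at - t.resolved_at <= window
    )
    total_minutes = sum(t.handle_minutes for t in tickets)
    return {
        "tickets": len(tickets),
        "reopen_rate": reopened / len(tickets),
        "median_handle_minutes": median(t.handle_minutes for t in tickets),
        "resolved_per_handle_hour": 60 * len(tickets) / total_minutes if total_minutes else 0.0,
    }


t0 = datetime(2026, 4, 1, 9, 0)
april = [
    Ticket(t0, 12.0),
    Ticket(t0, 8.0, reopened_at=t0 + timedelta(days=2)),   # counts: inside 7 days
    Ticket(t0, 10.0, reopened_at=t0 + timedelta(days=9)),  # ignored: outside 7 days
    Ticket(t0, 6.0),
]
summary = baseline_metrics(april)
print(summary["reopen_rate"], summary["median_handle_minutes"])  # 0.25 9.0
```

The same function run over the pilot period gives numbers that can be compared directly with the baseline.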

[Figure: a measurement bridge from baseline through a controlled workflow to an evidence package with outcome, quality, cost, risk, and expansion decision cards.]
A useful pilot is a measurement bridge from baseline to evidence package, not a staged demonstration of possibility.

Pilot scope

Narrow pilots are not timid; they are scientific. Choose a workflow where inputs, outputs, risks, and outcomes can be measured. Avoid pilots that cover everything and prove nothing. AI-native products especially need scope because broad capability creates broad ambiguity.
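In practice, a narrow scope needs an explicit gate, so that out-of-scope work is counted rather than silently absorbed into the pilot's results. Below is a minimal Python sketch. It assumes an upstream step has already labeled each request with an intent. The intent labels echo the password-reset and address-change scope in the operating table, and `route` and `scope_report` are hypothetical names.

```python
from collections import Counter

# Hypothetical intent labels for a narrow pilot scope.
IN_SCOPE_INTENTS = frozenset({"password_reset", "address_change"})


def route(intent: str) -> str:
    """Send in-scope work to the AI workflow; everything else stays with humans."""
    return "ai_workflow" if intent in IN_SCOPE_INTENTS else "human_queue"


def scope_report(intents: list[str]) -> dict[str, int]:
    """Count routing decisions so out-of-scope volume is visible in the evidence."""
    return dict(Counter(route(i) for i in intents))


week = ["password_reset", "refund", "address_change", "password_reset"]
print(scope_report(week))  # {'ai_workflow': 3, 'human_queue': 1}
```

Only the `ai_workflow` share feeds the pilot's outcome metrics. The `human_queue` share shows how much demand sits outside the tested boundary.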

Evidence package

The pilot output should be an evidence package: baseline, usage, outcome, quality review, failure cases, cost, user feedback, risk notes, and expansion recommendation. That package becomes the sales asset for expansion and the customer-success artifact for adoption.
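The package can also act as a gate: no expansion recommendation until every evidence section exists. Below is a minimal Python sketch, assuming the package is a dict mapping section names to written summaries. It also assumes the three other inputs come from the scorecard check and the QA review. `recommend` and the notion of an "open severe failure" are hypothetical, and the expand, revise, or stop outcomes follow the decision row of the operating table.

```python
# The evidence sections listed above; the expansion recommendation is what
# the function below derives from them.
EVIDENCE_SECTIONS = (
    "baseline", "usage", "outcome", "quality_review",
    "failure_cases", "cost", "user_feedback", "risk_notes",
)


def recommend(
    package: dict[str, str],
    metrics_met: bool,
    improved_on_baseline: bool,
    open_severe_failures: int,
) -> str:
    """Hypothetical decision rule: expand, revise, or stop, but only on a complete package."""
    missing = [s for s in EVIDENCE_SECTIONS if not package.get(s, "").strip()]
    if missing:
        return "incomplete: " + ", ".join(missing)
    if metrics_met and open_severe_failures == 0:
        return "expand"
    if improved_on_baseline:
        # Some lift, but a target was missed or a severe failure is still open.
        return "revise"
    return "stop"


package = {s: "see attached" for s in EVIDENCE_SECTIONS}
print(recommend(package, metrics_met=True, improved_on_baseline=True, open_severe_failures=0))  # expand
print(recommend({**package, "cost": ""}, True, True, 0))  # incomplete: cost
```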

Operating table

| Pilot element | Bad version | Good version |
| --- | --- | --- |
| Goal | See if users like it | Reduce reopened tier-1 tickets by 20% |
| Scope | All support topics | Password reset and address change |
| Evidence | Quotes and screenshots | Baseline, outcomes, QA, cost |
| Decision | Continue if excited | Expand, revise, or stop by date |

Artifact example: a pilot scorecard designed for commercial evidence

```yaml
pilot_scorecard:
  baseline_period: "2026-04-01..2026-04-30"
  pilot_period: "2026-05-01..2026-05-31"
  workflow: "tier-1 billing-address changes"
  success_metrics:
    resolution_without_reopen_7d: ">= 88%"
    median_handle_time_reduction: ">= 30%"
    customer_complaint_rate: "<= baseline"
  quality_review:
    sample_size_per_week: 100
    reviewer: "customer operations QA"
  cost:
    max_vendor_cost_per_resolved_case: "$0.42"
  decision_date: "2026-06-05"
```
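A scorecard earns its keep when someone can check it mechanically on the decision date. Below is a minimal Python sketch. It assumes the scorecard's thresholds have been transcribed by hand into a dict, because reading the YAML directly would need a third-party parser such as PyYAML. It also assumes percentages are reported as fractions. `passes`, `evaluate`, and every result value are hypothetical.

```python
import re
from typing import Optional

# Thresholds from the pilot scorecard, transcribed by hand (assumption: no YAML parser).
SCORECARD = {
    "success_metrics": {
        "resolution_without_reopen_7d": ">= 88%",
        "median_handle_time_reduction": ">= 30%",
        "customer_complaint_rate": "<= baseline",
    },
    "cost": {"max_vendor_cost_per_resolved_case": "$0.42"},
}

# Matches rules such as ">= 88%", "<= 0.5" or "<= baseline".
_RULE = re.compile(r"^(>=|<=)\s*(baseline|\d+(?:\.\d+)?%?)$")


def passes(rule: str, observed: float, baseline: Optional[float] = None) -> bool:
    """Check one observed value against a rule; percentages are compared as fractions."""
    match = _RULE.match(rule.strip())
    if match is None:
        raise ValueError(f"unrecognized rule: {rule!r}")
    op, target = match.groups()
    if target == "baseline":
        if baseline is None:
            raise ValueError("rule compares against baseline, but no baseline was given")
        limit = baseline
    elif target.endswith("%"):
        limit = float(target[:-1]) / 100
    else:
        limit = float(target)
    return observed >= limit if op == ">=" else observed <= limit


def evaluate(results: dict[str, float], baseline: dict[str, float]) -> dict[str, bool]:
    """Return a pass/fail verdict for every success metric plus the cost ceiling."""
    verdict = {
        name: passes(rule, results[name], baseline.get(name))
        for name, rule in SCORECARD["success_metrics"].items()
    }
    max_cost = float(SCORECARD["cost"]["max_vendor_cost_per_resolved_case"].lstrip("$"))
    cases = results["resolved_cases"]
    # With no resolved cases, cost per resolved case is undefined, so it fails.
    verdict["cost_per_resolved_case"] = cases > 0 and results["vendor_cost"] / cases <= max_cost
    return verdict


# Hypothetical pilot results; the median handle time fell from 9.0 to 6.0 minutes.
results = {
    "resolution_without_reopen_7d": 0.91,
    "median_handle_time_reduction": (9.0 - 6.0) / 9.0,  # about 0.33
    "customer_complaint_rate": 0.011,
    "vendor_cost": 380.0,
    "resolved_cases": 1000,
}
print(evaluate(results, {"customer_complaint_rate": 0.012}))  # every value is True here
```

Keeping the rules as strings, as the scorecard does, lets finance and customer success read the same thresholds the check enforces.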

Checklist

  • Measure baseline before pilot.
  • Choose a narrow workflow with observable outcome.
  • Define success and failure criteria.
  • Capture quality, cost, and risk, not only usage.
  • End with a decision package for renewal or expansion.

Takeaway

A good AI pilot is not a longer demo; it is a controlled evidence machine.
