Alpesh Nakrani
Chapter 4 / Points of View

The Pilot Is a Contract, Not a Favor

The clause that proves you are honest is the one that tells the buyer when to stop.

Most AI pilots fail before the first model runs, in a meeting that never happens.

The meeting that never happens is the one where the seller and the buyer write down what the pilot is supposed to prove, how they will know, what it costs, where it stops, and who owns it. Instead, the pilot starts on a handshake and a shared optimism. "Let's just get something in front of your team and see." That sentence has killed more AI deals than any model limitation. A pilot without a written definition is not an experiment. It is a hope with a calendar invite, and hope cannot fail or succeed, which means it cannot produce evidence, which means it cannot move a burned buyer one inch.

The reframe that organizes this chapter: a pilot is a contract, not a favor. The buyer is not doing you a kindness by letting you run a trial. They are entering into a defined experiment with you, with terms, and the terms are the product. A burned buyer has done the favor version before. It consumed their team's time, drifted in scope, and ended in ambiguity that the optimistic seller spun as success. The Honest Pilot Contract is the cure for all three.

Why the favor version fails

The favor pilot fails for structural reasons, not bad luck.

It has no defined success metric, so it cannot conclude. When the trial period ends, there is no agreed answer to "did it work," which means the decision defaults to vibes, and vibes favor whoever is most enthusiastic in the room, which is usually the seller. Burned buyers have learned to distrust exactly this, because last time the vibes said yes and production said no.

It has no scope boundary, so it drifts. The buyer's team keeps finding new things to try. The seller, eager to please, keeps saying yes. Three months later the pilot is testing five use cases poorly instead of one well, and none of them produced clean evidence. This is the most common way a technically fine product produces an inconclusive pilot.

It has no cost ceiling, so the cost discovery happens at the worst possible time. The buyer realizes mid-pilot that the real usage cost or the review labor is higher than expected, and now the discovery feels like a betrayal rather than a planned finding. This loads the N scar from the BURNED Diagnostic directly.

It has no owner, so it dies when attention moves. The favor pilot is sponsored by enthusiasm, and enthusiasm is not a role. When the champion gets pulled onto something else, the pilot has no one whose job it is to keep it alive. This is the U scar, unclear ownership, the single most common organizational cause of failure (MIT, "The GenAI Divide," 2025).

And critically, it has no kill criteria, so it cannot end cleanly. A favor pilot that is going badly does not get stopped. It gets extended, because nobody wants to be the one who declares the expensive experiment a failure. It limps until everyone is exhausted and the relationship is soured. Burned buyers have lived this exact death march, and it is why they hesitate before agreeing to "just try something."

Figure: The Honest Pilot Contract as a one-page, ten-field form with kill criteria emphasized. Ten fields, with kill criteria as the field that proves the seller is honest.

The Honest Pilot Contract

The Honest Pilot Contract is a one-page document with ten fields. It is not a legal contract, though it informs one. It is a shared definition the buyer and seller agree to before any model runs. Each field is there because its absence killed a real pilot somewhere.

The ten fields:

  1. Scope. The single workflow, the single use case, and the explicit list of what is out of scope. One thing, done well. Scope is written here precisely so the buyer's team cannot drift it later without a deliberate decision.
  2. Data. Exactly what data the pilot uses, where it comes from, where it lives during the pilot, and what happens to it after. This field is where the data owner and security sign off, and it is written in language they can vet.
  3. Metric. The one or two numbers that define success, with a threshold agreed in advance. Not "see how it goes." A number, with an operator-relevant definition. "On 500 of your real tickets, the model's draft requires no substantive edit at least X percent of the time."
  4. Cost ceiling. The maximum the pilot will cost the buyer, including their team's time, with usage and review labor counted, not just license. Naming this up front converts the N scar from a future betrayal into a planned parameter.
  5. Integration boundary. Exactly which systems the pilot touches and which it does not. For a pilot, this is usually deliberately minimal: a sandbox, an export, a read-only connection. The boundary protects IT's roadmap and answers their scar.
  6. Security assumptions. The explicit security posture for the pilot, aligned to whatever framework the buyer uses, so security can attack it before it runs rather than after. Reference NIST's Govern, Map, Measure, Manage structure or ISO/IEC 42001 if the buyer works in those terms (NIST AI RMF 1.0).
  7. Fallback. What happens when the model is unsure or wrong. A pilot with no fallback is a pilot that scares the operator. The fallback is usually a human in the loop, and naming it reassures the team that the model is an assistant, not an unsupervised actor.
  8. Production path. The pre-agreed answer to "if this works, what does it take to put it in production." This is the rung-five-and-six conversation from the Proof Ladder, had before the pilot, so success leads somewhere instead of into a new round of uncertainty.
  9. Owner. One named person on the buyer's side who owns the pilot day to day, and one on yours. Not a committee. A name. This single field prevents the most common organizational death.
  10. Kill criteria. The conditions under which the buyer should stop, written by you, the seller.

That last field is the entire honesty test, so it gets its own section.
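For teams that track pilot paperwork in code, the ten fields listed above can be sketched as a simple record with a completeness check. This is a minimal illustration, not part of the contract itself: the class name `PilotContract`, the helper `missing_fields`, and the sample entries are all assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class PilotContract:
    """The ten fields of the one-page contract, held as plain-text entries.

    Hypothetical structure for illustration; the field names mirror the list above.
    """
    scope: str = ""
    data: str = ""
    metric: str = ""
    cost_ceiling: str = ""
    integration_boundary: str = ""
    security_assumptions: str = ""
    fallback: str = ""
    production_path: str = ""
    owner: str = ""
    kill_criteria: str = ""


def missing_fields(contract: PilotContract) -> list[str]:
    """Return the names of fields that are still blank, in contract order."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name).strip()]


# A half-written draft: only three of the ten fields are filled in.
draft = PilotContract(
    scope="Draft first-response emails for billing inquiries only.",
    metric="Draft rated 'send with minor or no edits' on at least 65% of 200 held-out tickets.",
    owner="Buyer: J. Rivera. Seller: named solution architect.",
)

# Any name printed here is a field the pilot would start without.
print(missing_fields(draft))
```

The check is deliberately blunt: a pilot should not start while the list is non-empty, and a blank `kill_criteria` entry is the one to worry about most.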

Kill criteria are the proof you are honest

A confident seller will happily fill in every other field in the contract. Scope, metric, owner: these make the seller look organized. Kill criteria are different. Kill criteria are the seller writing down, in advance, the conditions under which the buyer should walk away from the seller's own product.

This is the single most disarming move available to an AI seller working a burned buyer, and almost no one does it. When you say, "Here are the three results that would tell us this is not right for you, and if we hit any of them, we stop and you owe us nothing further," you have done something the last vendor would never do. The last vendor had no kill criteria because the last vendor's pilot could not fail; it could only be spun. By writing the failure conditions yourself, you demonstrate that you are optimizing for the right fit rather than for the close. That is precisely the thing a burned buyer cannot get from anyone else, because everyone else is still selling them belief.

Good kill criteria are specific and measurable. "If after 500 real cases the no-edit rate is below the threshold and not improving, we stop." "If integration requires touching systems beyond the agreed boundary, we stop and re-scope." "If the fully loaded cost at projected volume exceeds the ceiling, we stop." Notice that each kill criterion maps to a BURNED failure mode: the metric kill addresses E and B, the integration kill addresses the IT scar and D, the cost kill addresses N. You are building a pilot that fails fast and cleanly on exactly the dimensions that killed the buyer's last attempt.
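Because each of these criteria is measurable, it can be checked mechanically at every pilot checkpoint. The sketch below assumes a hypothetical readout structure (`PilotReadout`), hypothetical thresholds, and one particular reading of "not improving" (the rate is no higher than at the previous checkpoint). None of these names or numbers come from the contract itself.

```python
from dataclasses import dataclass


@dataclass
class PilotReadout:
    """Hypothetical snapshot of a pilot at one checkpoint."""
    cases_reviewed: int
    no_edit_rate: float           # share of drafts needing no substantive edit, 0..1
    previous_no_edit_rate: float  # same rate at the prior checkpoint
    systems_touched: set[str]
    cost_per_case: float          # fully loaded: usage plus review labor


@dataclass
class KillCriteria:
    """Hypothetical thresholds, agreed in writing before the pilot starts."""
    min_cases: int
    no_edit_floor: float
    allowed_systems: set[str]
    cost_ceiling_per_case: float


def kill_reasons(r: PilotReadout, k: KillCriteria) -> list[str]:
    """Return every kill criterion the readout has hit; an empty list means continue."""
    reasons = []
    enough_cases = r.cases_reviewed >= k.min_cases
    improving = r.no_edit_rate > r.previous_no_edit_rate  # assumed meaning of "improving"
    if enough_cases and r.no_edit_rate < k.no_edit_floor and not improving:
        reasons.append("metric: no-edit rate below floor and not improving")
    extra_systems = r.systems_touched - k.allowed_systems
    if extra_systems:
        reasons.append("integration: touches " + ", ".join(sorted(extra_systems)))
    if r.cost_per_case > k.cost_ceiling_per_case:
        reasons.append("cost: fully loaded cost exceeds ceiling")
    return reasons


criteria = KillCriteria(
    min_cases=500,
    no_edit_floor=0.50,
    allowed_systems={"ticket_export"},
    cost_ceiling_per_case=0.40,
)
readout = PilotReadout(
    cases_reviewed=500,
    no_edit_rate=0.46,
    previous_no_edit_rate=0.47,
    systems_touched={"ticket_export", "crm"},
    cost_per_case=0.35,
)

for reason in kill_reasons(readout, criteria):
    print("STOP:", reason)
```

With these illustrative inputs the readout trips the metric and integration criteria but not the cost one. The point of the design is that the function has no "extend the pilot" branch: any non-empty result means stop, which is what the written criteria promise.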

There is a commercial argument for kill criteria too, beyond honesty. A pilot that can fail cleanly is a pilot a cautious committee can approve, because the downside is bounded and known. The favor pilot, with no exit, is actually harder to get approved through a scarred procurement function precisely because they cannot see where it ends. Kill criteria reduce the buyer's perceived risk, which accelerates the approval. Honesty is not slower here. It is faster.

A filled-in example

Here is the contract filled in for an explicitly hypothetical pilot, so the fields are concrete rather than abstract.

| Field | Entry |
| --- | --- |
| Scope | Draft first-response emails for billing inquiries only. Out of scope: technical support, account changes, anything requiring a refund decision. |
| Data | 600 historical billing tickets, anonymized, exported to a pilot environment. Deleted within 30 days of pilot end. No production system access. |
| Metric | On 200 held-out tickets, agent rates the draft "send with minor or no edits" at least 65 percent of the time. |
| Cost ceiling | 40,000 dollars total, including 60 hours of buyer-team time. Usage and review labor estimated and capped. |
| Integration boundary | Read-only export only. No connection to CRM, billing system, or identity provider during pilot. |
| Security assumptions | Data anonymized before export. Pilot environment access limited to four named individuals. Aligned to buyer's existing data-handling policy; security pre-read attached. |
| Fallback | Agent always reviews and can discard any draft. Model never sends autonomously during pilot. |
| Production path | If successful, production requires CRM integration (est. 3 weeks IT effort), security review, and a phased rollout starting with one team. Pre-scoped, not committed. |
| Owner | Buyer: J. Rivera, billing ops manager. Seller: named solution architect. |
| Kill criteria | (1) No-edit rate below 50 percent after 200 cases and not trending up. (2) Integration found to require systems beyond the agreed boundary. (3) Fully loaded cost at projected volume exceeds 0.40 dollars per ticket. Hit any, we stop. |

A burned buyer reading that document relaxes in a way that no demo can produce, because every field corresponds to a place they were previously hurt, and the kill criteria prove you are not hiding the exit.
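Kill criterion (3) in the example depends on a "fully loaded" cost per ticket, meaning usage plus review labor rather than license alone. A minimal sketch of that arithmetic follows. The 0.40 dollar ceiling is from the example contract; the usage cost, review time, and hourly rate are hypothetical inputs chosen only to show the calculation.

```python
def fully_loaded_cost_per_ticket(
    usage_cost_per_ticket: float,
    review_minutes_per_ticket: float,
    reviewer_hourly_rate: float,
) -> float:
    """Usage cost plus the cost of the human review time spent on each draft."""
    review_cost = review_minutes_per_ticket / 60 * reviewer_hourly_rate
    return usage_cost_per_ticket + review_cost


CEILING = 0.40  # dollars per ticket, from kill criterion (3) in the example contract

# Hypothetical inputs: 3 cents of model usage per ticket, a reviewer paid 36 dollars an hour.
for minutes in (0.5, 1.0):
    cost = fully_loaded_cost_per_ticket(
        usage_cost_per_ticket=0.03,
        review_minutes_per_ticket=minutes,
        reviewer_hourly_rate=36.0,
    )
    verdict = "within ceiling" if cost <= CEILING else "exceeds ceiling: stop"
    print(f"{minutes} min review -> {cost:.2f} dollars per ticket ({verdict})")
```

Under these assumed figures, thirty seconds of review per draft stays under the ceiling and a full minute exceeds it. Review labor, not usage, is what moves the number, which is why the cost ceiling field counts it explicitly.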

Scope is a discipline, not a limitation

The hardest field to hold is scope, because both sides are tempted to expand it. The buyer's team gets excited and wants to try more. The seller wants to show more value. Resist. A pilot that proves one thing cleanly is worth more than a pilot that gestures at five things ambiguously, because evidence is what moves the committee and ambiguity is what stranded the last one. The Proof Ladder is a ladder, not a platform. You are climbing one rung. Trying to climb three at once is how people fall.

When the buyer pushes to expand scope mid-pilot, the honest answer is, "That is a great second pilot, and here is how we'd scope it, but expanding this one will blur the evidence we agreed to produce." That sentence protects the buyer from themselves and protects your evidence from dilution. Burned buyers, once they understand why, respect the discipline, because scope drift is one of the things that killed them last time.

When the pilot says no

Sometimes the honest pilot, run correctly, fails. The metric is not met. A kill criterion is hit. The model genuinely cannot do the task at the quality the workflow needs, or the cost does not pencil out at the buyer's volume. This is not a disaster. It is the system working. A pilot designed to be able to fail, that then fails, has done its job: it prevented a bad deployment, protected the buyer's trust, and protected you from a churned customer and a bad reference.

How you behave at this moment determines everything that comes after. The dishonest seller spins the failure, pushes for an extension, and tries to salvage the close. The honest seller says, "Based on what we agreed, this is not right for you right now, here is specifically why, and here is what would need to change for it to be worth revisiting." That seller does not get this deal. That seller gets the next three, because the buyer now knows something rare: this is a vendor whose pilots tell the truth. We spend the closing chapter on exactly how this counterintuitive outcome compounds.

In the next chapter we go deep on the field that finance attacks hardest and that kills the most pilots quietly: the cost ceiling, and the ROI numbers that have to survive contact with token cost, latency, and human review.

Practical Exercise

Take your current pilot offer and write it as the ten-field Honest Pilot Contract. The test is field ten. If you cannot write three specific, measurable conditions under which the buyer should stop, you do not yet understand your own product well enough to sell it to a burned buyer, and the gaps in your kill criteria are exactly the failure modes that will surface in deployment. Write them before you write the proposal.

Key Takeaways

  • A pilot without a written definition is a hope with a calendar invite; it cannot conclude, so it cannot produce evidence, so it cannot move a burned buyer.
  • The Honest Pilot Contract has ten fields: scope, data, metric, cost ceiling, integration boundary, security assumptions, fallback, production path, owner, and kill criteria.
  • Kill criteria, written by the seller, are the single most disarming honesty move available, and each one should map to a BURNED failure mode.
  • A pilot that can fail cleanly is easier for a scarred committee to approve, so honesty accelerates the deal rather than slowing it.
  • Hold scope as a discipline; one thing proven cleanly beats five things gestured at, because evidence moves committees and ambiguity stranded the last pilot.