Name: Pricing Software That Thinks
Availability: InStock

Pricing is never finished; you have to change models without breaking trust, test changes safely, and decide what to do when inference gets cheaper or usage explodes.

Research spine

This chapter is grounded in OpenAI API pricing, Anthropic prompt caching documentation, and Stanford HAI, 2025 AI Index Report.

Pricing software that thinks starts by pricing units of resolved work against variable inference, review, and risk cost.

Everything in this book so far assumes you get to set the price once and get it right. You do not. You will misprice the first model, costs will move under you, usage will surprise you, and competitors will reset the market. Pricing for software that does work is not a decision you make; it is a system you operate, and operating it means changing it without breaking the trust you spent the earlier chapters building. This chapter is about the dynamics: migrating customers across models, running pricing experiments safely, and the two forces that will force your hand whether you plan for them or not, falling model costs and exploding usage.

Pricing is a migration problem, not a launch problem

The hardest part of pricing is rarely the initial model. It is the second one, because by the time you know the first was wrong, you have customers on it who built budgets and expectations around it. Migrating them is where most pricing damage happens, and most of it is avoidable.

The core principle is the one from the bill-shock chapter applied to change: a pricing change the customer can predict and understand is tolerable; one they discover as a surprise is a betrayal. That principle yields a small set of rules for any migration:

Grandfather aggressively at first. Existing customers on the old model are the ones with the most trust to lose and the least patience for being repriced. The default should be that existing customers keep their old terms until renewal, and the new model applies to new customers and new business. This buys you proof that the new model works before you ask your installed base to accept it, and it removes the "you changed the deal on me" grievance entirely for the riskiest population.

Migrate at natural boundaries. Renewal is the natural moment to move a customer to a new model, because it is when they expect to revisit terms anyway. Forcing a mid-term migration reads as a broken promise; proposing a new model at renewal reads as normal business. Align migrations to renewal cadence wherever you can.

Show the customer their bill under both models. The single most trust-preserving thing you can do in a migration is show each customer what they would have paid under the old model versus the new one, using their real usage. If the new model costs them more, they see exactly how much and why, and you can have an honest conversation. If it costs them less, you have a selling point. Either way, the customer is not guessing, and guessing is what produces the angry calls.

Give long notice and a transition path. Sixty to ninety days of notice for a material change is a floor, not a courtesy. And where the new model costs some customers more, a transition path, a phased step-up, a temporary discount that decays, a credit for the difference in the first period, turns a cliff into a ramp. Customers forgive a ramp; they churn on a cliff.

Three responses to falling model cost: hold, pass through, or reinvest — When unit cost falls, the headroom can be captured as margin, passed to the customer, or reinvested into more value per unit.

A migration communication template

When you do communicate a change, the structure matters as much as the terms. Here is the template I use, in order:

What is changing, in one sentence, in the customer's unit. Not "we are updating our pricing model" but "starting at your renewal, you will be billed per resolved ticket instead of per seat."
Why, honestly. A real reason the customer can respect: the old model did not reflect the value you get, or it was not sustainable for us to keep improving the product. Do not insult them with "to serve you better."
What it means for you specifically. Their actual numbers under both models, from their real usage. This is the trust-preserving core; everything else is framing.
What we are doing to ease it. The transition path: grandfathering window, phased step-up, or whatever softening applies.
When, with enough notice to plan. The date, far enough out that their finance team can budget for it.
How to ask questions. A real human and a real channel, because a pricing change with no one to talk to reads as a decree.

The template works because it answers, in order, the exact questions a customer asks when they hear "pricing change," and it answers the scariest one, what does this mean for me, with their own numbers rather than a reassurance. A change communicated this way can be a price increase and still not lose the account.

Experiments: test pricing without breaking trust

You should test pricing changes before rolling them out, but pricing experiments are more dangerous than product experiments, because a customer who finds out they were charged differently than a similar customer for an experiment loses trust in your fairness, which is hard to win back. A few rules for safe pricing experimentation:

Test on new customers, not existing ones. New customers have no prior expectation to violate, so trying a new model with them creates no sense of being repriced. Existing customers should change only through the migration process above, not through silent experiments.

Test willingness-to-pay before you test the live bill. Much of what you want to learn, will customers accept per-resolution pricing, what quota feels fair, where the price ceiling is, you can learn through structured research, pricing surveys, and sales conversations before you ever charge anyone differently. The cheapest pricing experiment is a conversation, not a billing change.

Keep experiments segmented and explainable. If you do run live pricing variants, segment them so that customers who might compare notes are on the same model, and so that any difference is explainable by something legitimate (segment, region, plan) rather than looking arbitrary. The fairness research is unforgiving here: perceived arbitrary price discrimination poisons trust faster than a high price does.

Measure margin, not just conversion. A pricing experiment that lifts conversion can still be a loss if it does so by underpricing the work below your margin floor. Every pricing experiment must report margin alongside conversion and retention, or you are optimizing yourself into the opening disaster of the book at a larger scale.

When model costs fall

Now the force that will reshape your pricing whether you plan for it or not: the price of model inference falls over time, sometimes sharply. GPT-4.1 Mini matches GPT-4o quality on many tasks at a fraction of the cost, and each generation tends to push capability up and price down, as the public API pricing history shows. A cost that was twenty cents per unit this year may be eight cents next year on the same workload. This is good news, and it is also a strategic decision you have to make deliberately, because falling costs do not automatically become margin.

Three things you can do when your unit cost falls, and you should choose, not drift:

Hold the price and capture the margin. Keep charging what you charged; your margin expands. Defensible when your price reflects value, not cost, and when competition is not forcing the price down. This is how you turn cost decline into the profitability that funds everything else, but it requires the cost-decline term from the enterprise contract chapter, or sophisticated customers will notice your margin expanding and demand a renegotiation you did not budget for.
Lower the price and pass it through. Cut the price as cost falls, trading margin for competitiveness and customer goodwill. Often the right move in competitive markets where holding price would lose deals, and a powerful retention story: "we lowered your price as our costs came down."
Reinvest the cost decline into more work per unit. Use the headroom to make each unit better, more retrieval, more reasoning, more thorough review, so the unit cost holds steady but the value rises. This is often the best move, because it improves the product and defends the price simultaneously.

The trap is assuming falling costs will rescue a broken margin automatically. They will not, for the reason flagged in the margin chapter: falling per-token prices frequently coincide with deeper agentic workflows that use more tokens per unit, so your unit cost can hold or even rise as the per-token price falls. More capable agents make more calls. Your margin model must be built on today's costs and today's workflow depth, with future cost declines treated as upside you actively decide how to use, not as a problem that solves itself while you watch your unit costs climb.

When usage explodes

The mirror force: usage grows far faster than you modeled. A customer adopts deeply, a use case goes viral inside their org, and the work your product performs for them multiplies. Under a healthy model this is wonderful, because your overage and committed-usage mechanisms mean they pay for the cost they create. Under a broken model, the opening disaster, exploding usage is exactly the fire that consumes your margin.

What to do when usage explodes, depending on which model you are on:

If you have the guardrail ladder in place, exploding usage mostly takes care of itself: the customer crosses quota, flows into overage, and pays for the additional work. Your job is to make sure the soft caps fired so they were not surprised, and to proactively reach out to a customer whose usage is spiking, both to prevent bill shock and to propose a committed-usage deal that gives them a better rate and you predictable revenue. A usage spike is a sales opportunity, not just a risk, if your model is sound.
If you have included-AI-on-a-seat, exploding usage is the emergency, and you cannot fix it mid-contract for existing customers without breaking trust. The honest moves are: stop the bleeding for new customers immediately by introducing quotas and overage, and migrate existing customers to a sustainable model at renewal using the migration process above. Do not try to claw it back invisibly; that is the credit trap at scale, and it costs you the customers it touches.

There is a product dimension too. A usage explosion is often a signal that you should invest in the cost levers from the margin chapter, caching, output management, model routing, because at high volume those levers move real money. A customer doing ten thousand units a month makes a two-cent-per-unit cost reduction worth two hundred dollars a month, every month, which justifies engineering effort that would be pointless at low volume.

The standing pricing review

The meta-point of this chapter is that pricing needs an owner and a cadence, the way reliability or security does. Costs move, usage moves, competitors move, and a price set once and never revisited drifts out of alignment with all three. Run a standing pricing review, quarterly is usually right, that checks the things this book has been about: are the segment margins still healthy, has model cost moved enough to act on, are any customers heading for bill shock, are the guardrails still sized correctly against the current usage distribution, and is anyone being sold below the margin floor. Pricing is not a launch. It is an operating discipline, and the companies that treat it that way are the ones whose AI economics improve over time instead of degrading.

Practical exercise

Schedule the first standing pricing review on the calendar this week, and put three things on the agenda: the three-segment margin table from the margin chapter, the current model cost versus the cost in your pricing assumptions, and a list of any customers whose usage trend points toward bill shock or overage. If model cost has fallen since you set your prices, decide explicitly which of the three responses you are taking, hold, pass through, or reinvest, and write it down. The drift happens when no one decides; the decision is the discipline.

Key Takeaways

Pricing is a migration problem, not a launch problem; most pricing damage happens when moving existing customers to a new model.
Migrate safely by grandfathering aggressively, moving at renewal boundaries, showing each customer their bill under both models, and giving long notice with a transition path.
Test pricing on new customers, learn willingness-to-pay through research before live billing, keep variants explainable, and measure margin alongside conversion.
When costs fall, deliberately choose to hold price (capture margin), pass it through (competitiveness), or reinvest into more value per unit; do not assume falling costs rescue a broken margin, because deeper workflows can raise unit cost even as token prices fall.
When usage explodes, a sound guardrail model handles it through overage and is a sales opportunity; a broken included-AI model requires stopping the bleeding for new customers and migrating existing ones at renewal.
Pricing needs an owner and a quarterly review checking segment margins, cost movement, bill-shock risk, guardrail sizing, and margin-floor compliance; it is an operating discipline, not a one-time event.

Migration, Experiments, and What Happens When Costs Fall