Quotas, Caps, Overages, and Enterprise Contracts
The mechanisms that make variable-cost pricing safe: a ladder of guardrails from included quotas to committed usage and outcome share.
Research spine
This chapter is grounded in OpenAI API pricing, Anthropic prompt caching documentation, and the Stanford HAI 2025 AI Index Report.
Pricing software that thinks starts by pricing units of resolved work against variable inference, review, and risk costs.
A CRO I worked with closed the largest deal in his company's history, a seven-figure enterprise contract for their AI agent platform, and the room celebrated. Six months later that same deal was the most-discussed line item in the company's board meeting, because the customer's usage had blown through every assumption in the model, the contract had no overage mechanism, and the account was now running at negative gross margin on a multi-year term they could not reprice. The deal that looked like a triumph was an unhedged short position on the customer's own success.
This chapter is about the mechanisms that would have saved that deal: quotas, caps, overages, and the structure of enterprise contracts for software that does work. These are not pricing models. They are the guardrails that make any variable-cost model survivable, and they assemble into a single tool I call the Margin Guardrail Ladder.
The Margin Guardrail Ladder
When usage drives cost, you need a graduated set of controls between "give it away" and "charge for every unit," because different customers and different stages need different points on that gradient. The ladder, from least to most protective of margin, is:
- Included AI. A baseline amount of AI work bundled into the price at no extra charge. Drives adoption, simplest to sell, most dangerous if unbounded. This is the rung most companies start on and the rung that produced the opening disaster of the book.
- Quota. A defined amount of included work, after which something happens. The quota turns "unlimited included" into "this much included," which is the single most important step away from the margin cliff. A quota with nothing after it is just a hard stop; its real job is to be the foundation the higher rungs build on.
- Soft cap. When the customer approaches the quota, the product warns them, perhaps degrades gracefully, but does not bill more without consent. The soft cap is the anti-bill-shock mechanism: the customer is never surprised, because they were told before they crossed the line.
- Paid overage. Beyond the quota, additional work is billed per unit at a defined rate. This is what makes heavy users profitable instead of dangerous: they pay for the cost they create. Overage is the rung that resolves the Adoption Penalty in the vendor's favor without handing the customer an unbounded surprise bill, because the customer chose to cross the line and knew the price.
- Reserved capacity. The customer pre-purchases a block of work at a discount, committing to a volume in exchange for a better unit rate. Predictable for both sides, and it pulls revenue forward. This is where consumption pricing starts to feel like a budget the customer controls rather than a meter they fear.
- Committed usage. A contractual minimum spend over a term, drawn down against actual usage, often with the unused portion expiring or rolling over. This is the enterprise standard for consumption businesses, and it solves the revenue-predictability problem that pure usage pricing creates for the vendor.
- Custom outcome or risk share. At the top, a bespoke arrangement where pricing ties to outcomes or shares risk and reward, used for the largest, most strategic deals. Powerful, high-touch, and only worth the verification overhead at scale.
You do not use one rung. You compose them. A typical durable model is: a quota of included work, a soft cap that warns before overage, paid overage beyond the quota, and for enterprise a committed-usage floor with reserved-capacity discounts. That stack gives the customer predictability and the vendor margin protection at the same time, which is the whole goal.
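As a rough sketch of that composed stack, here is a monthly bill calculation in Python. Everything in it is hypothetical: the `Plan` fields, the `monthly_charge` function, and the flat `base_fee` are illustrative names, not a real billing API. The soft cap is a product behavior (warnings and consent) rather than a billing formula, so it is not modeled here; a reserved-capacity discount would simply show up as a lower `overage_rate`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Hypothetical composed plan: included quota, paid overage, optional commitment floor."""
    base_fee: float                            # flat monthly price that includes the quota
    quota_units: int                           # work units included each month
    overage_rate: float                        # price per unit beyond the quota
    committed_minimum: Optional[float] = None  # enterprise floor spend per month, if any


def monthly_charge(plan: Plan, units_used: int) -> float:
    """Base fee plus overage for units past the quota, never below the committed minimum."""
    overage_units = max(0, units_used - plan.quota_units)
    charge = plan.base_fee + overage_units * plan.overage_rate
    if plan.committed_minimum is not None:
        # Usage draws down the commitment; the customer always pays at least the floor.
        charge = max(charge, plan.committed_minimum)
    return round(charge, 2)


growth = Plan(base_fee=500.0, quota_units=5_000, overage_rate=0.30)
print(monthly_charge(growth, 4_000))  # 500.0: inside the quota, base fee only
print(monthly_charge(growth, 6_000))  # 800.0: 1,000 overage units at $0.30
```

The design choice worth noticing is that each rung is an independent, optional field: a plan with no `committed_minimum` is a self-serve tier, and setting one turns the same formula into a committed-usage contract.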
Sizing the quota: the move that decides everything
The quota is the most consequential number on the ladder, and teams set it carelessly. Set it too high and you have recreated included-AI-for-free for most customers, who never reach the quota and so never pay for the cost they create. Set it too low and the customer hits overage immediately, feels nickel-and-dimed, and resents you. The quota is where the Adoption Penalty Test gets resolved in practice.
The principle: size the quota so the typical customer's normal usage fits comfortably inside it, and overage only triggers for genuinely heavy use. Concretely, pull your usage distribution and set the quota around the 70th to 80th percentile of usage for that segment. That way roughly three-quarters of customers never see an overage charge and feel they got an honest, predictable deal, while the heavy tail, the customers who actually generate the cost problem, flows into overage and pays for itself. The quota becomes the line between "predictable bundle" and "you are now in the territory where you must pay for what you consume," and that line should sit where the cost curve starts to bend, not before.
Set the quota per segment, not globally. A ten-person team and a thousand-person enterprise have different normal usage, and one global quota will be far too small for one and far too large for the other. Quota-by-segment is how you keep each segment independently profitable, the discipline from the margin chapter.
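A minimal sketch of percentile-based quota sizing, assuming you can export one row of (segment, monthly work units) per customer. It uses the nearest-rank percentile; the function names and the sample data are hypothetical, and the 75th percentile default is simply the midpoint of the 70th-to-80th range above. The last lines use made-up numbers to show why a single global quota misfits: computed over everyone, it covers every small team while leaving half the enterprise accounts in overage.

```python
import math
from collections import defaultdict


def nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of observations at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def quotas_by_segment(rows, pct=75):
    """rows: iterable of (segment, monthly_units). Returns {segment: quota}."""
    usage = defaultdict(list)
    for segment, units in rows:
        usage[segment].append(units)
    return {segment: nearest_rank(units, pct) for segment, units in usage.items()}


def share_inside(values, quota):
    """Fraction of customers whose monthly usage fits inside the quota."""
    return sum(1 for v in values if v <= quota) / len(values)


# Made-up usage: eight small teams and eight enterprises, each with a heavy tail.
team = [40, 60, 80, 100, 120, 150, 200, 900]
enterprise = [4_000, 5_000, 6_000, 7_000, 8_000, 9_000, 12_000, 40_000]
rows = [("team", u) for u in team] + [("enterprise", u) for u in enterprise]

print(quotas_by_segment(rows))  # {'team': 150, 'enterprise': 9000}
global_quota = nearest_rank(team + enterprise, 75)
print(share_inside(team, global_quota), share_inside(enterprise, global_quota))  # 1.0 0.5
```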
Caps and the anti-bill-shock contract
The soft cap is the mechanism that prevents the Snowflake and Datadog problem from the usage chapter. The rule is simple and should be inviolable: the customer is never billed for overage they did not knowingly cross into. Operationally that means real-time usage visibility in the product, alerts at 50, 80, and 100 percent of quota, and a moment of explicit consent or at least clear notice before overage charges begin to accrue.
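The alerting and consent rules can be sketched as two small functions. This is an illustration under assumptions: `new_alerts` and `billable_overage` are hypothetical names, thresholds are integer percentages so the comparisons avoid floating-point edge cases, and consent is modeled as a single boolean the product records when the customer opts in to overage.

```python
ALERT_THRESHOLDS = (50, 80, 100)  # percent of quota, matching the text


def new_alerts(prev_units, units, quota, thresholds=ALERT_THRESHOLDS):
    """Thresholds crossed when usage moves from prev_units to units; each fires once."""
    # Compare in integer "percent-units" so that, e.g., 80% of quota is exact.
    return [t for t in thresholds if prev_units * 100 < t * quota <= units * 100]


def billable_overage(units, quota, overage_consented):
    """Units that may be billed as overage: none unless the customer knowingly opted in."""
    if not overage_consented:
        return 0
    return max(0, units - quota)


print(new_alerts(400, 850, 1_000))            # [50, 80]: crosses two thresholds at once
print(billable_overage(1_200, 1_000, False))  # 0: past quota, but no consent recorded
print(billable_overage(1_200, 1_000, True))   # 200: consented overage units
```

What the product does with work past the quota when consent is missing (warn, degrade, or hold) is a separate product decision; this sketch only guarantees the bill.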
There is a harder version some customers demand: a hard cap, where the product stops doing billable AI work once a spend ceiling is hit, with no overage possible. This is the strongest anti-bill-shock guarantee and the right offer for budget-constrained or risk-averse buyers, but it has a cost you must price for: when the cap hits, the product stops delivering some or all of its value, which can mean a support ticket goes unanswered or a document goes unprocessed. You and the customer have to agree on what happens at the ceiling (degrade, queue, or stop), and that decision is part of the contract, not an afterthought. The bill-shock checklist later in the book turns this into a list, but the principle holds: predictability for the customer is worth engineering for, because a customer who trusts the bill renews and a customer who got ambushed does not.
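The ceiling decision can be made explicit in code so the product behaves exactly as the contract says. A hedged sketch: `CeilingPolicy` and `handle_unit` are hypothetical names, the per-unit price is assumed known before the work runs, and the three policies are just the degrade, queue, and stop options named above, recorded per customer.

```python
from enum import Enum


class CeilingPolicy(Enum):
    DEGRADE = "degrade"  # fall back to a cheaper or non-AI path
    QUEUE = "queue"      # hold the work until the next billing period
    STOP = "stop"        # decline the work outright


def handle_unit(spent, unit_price, ceiling, policy):
    """Run a unit of work only if it keeps spend within the hard cap; otherwise apply the agreed policy."""
    if spent + unit_price <= ceiling:
        return "run"
    return policy.value


print(handle_unit(99.00, 0.50, 100.00, CeilingPolicy.QUEUE))  # run
print(handle_unit(99.80, 0.50, 100.00, CeilingPolicy.QUEUE))  # queue
```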
Enterprise contracts: committed usage done right
Enterprise is where the ladder's upper rungs live, and where the opening disaster of this chapter happens when they are missing. The enterprise contract for an AI product that does work needs terms that a traditional seat-based contract never had to think about. Here is the checklist I run on every enterprise AI deal:
- Committed minimum. A floor spend over the term, so the vendor has revenue predictability and the customer earns a volume discount. This is the rung that replaces the predictability that seat licenses used to give the sales team.
- Overage rate, defined in advance. What additional work costs beyond the commitment, agreed at signing, never negotiated mid-term under duress. The opening disaster happened because this term did not exist; the customer blew through the commitment and there was no agreed price for the excess.
- Work-unit definition. Exactly what counts as a billable unit, with the edge cases from the work-unit chapter resolved in writing: partial work, retries, rejected outputs, duplicates. Every undefined edge is a future dispute on a large account.
- True-up cadence. How often actual usage reconciles against the commitment: monthly, quarterly, annually. Annual true-ups hide problems for too long; quarterly is usually right for AI, because usage can move fast.
- Cost-decline sharing. What happens to the price when underlying model costs fall, which they will. Either you hold the price (and capture the margin) or you share the decline (and keep the customer happy); decide it explicitly rather than letting the customer discover your margin expanding and demand a renegotiation. The final chapter argues this is one of the most important and most neglected terms.
- Usage-spike protection. A clause covering what happens if usage explodes far beyond projection, protecting both the customer from bill shock and the vendor from negative-margin overruns. A mutual circuit breaker.
- Margin floor enforcement. Internally, no enterprise discount that pushes a unit below the margin floor from the margin chapter, ever, regardless of deal size. The largest deals are where this discipline matters most, because that is where the field pressure to discount is strongest.
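To make the committed-minimum, overage-rate, and true-up terms concrete, here is a sketch of one quarterly reconciliation. It assumes, hypothetically, that the commitment is a block of units billed at a committed unit rate whether used or not, that excess units bill at the overage rate fixed at signing, and that unused units simply expire (a rollover clause would carry `unused` into the next quarter instead). The $0.20 committed rate is made up; the $0.25 overage rate and 100K monthly commitment echo the illustrative enterprise tier in the table later in this chapter.

```python
def quarterly_true_up(monthly_units, committed_units, committed_rate, overage_rate):
    """Reconcile one quarter of actual usage against a quarterly commitment."""
    used = sum(monthly_units)
    overage_units = max(0, used - committed_units)
    amount_due = committed_units * committed_rate + overage_units * overage_rate
    return {
        "used": used,
        "unused": max(0, committed_units - used),  # expires under this sketch's assumption
        "overage_units": overage_units,
        "amount_due": round(amount_due, 2),
    }


# 100K units/month committed, so 300K per quarter; usage grew faster than planned.
print(quarterly_true_up([100_000, 110_000, 130_000], 300_000, 0.20, 0.25))
```

Because the overage rate is an input fixed at signing, a usage surge shows up as a larger but pre-agreed invoice rather than a mid-term negotiation.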
The enterprise contract is where the abstract frameworks become legally binding numbers. The committed minimum is your revenue floor, the overage rate is your margin protection on the heavy tail, the work-unit definition is your dispute insurance, and the cost-decline term is your future-margin policy. Skip any of them and you are writing the kind of contract that becomes a board meeting topic.
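The cost-decline term is the one most often left vague, yet it can be written as a one-line formula. A sketch, assuming the contract passes a fixed fraction `share` of any per-unit cost decline through to the customer's price; `shared_price` is a hypothetical name, and ignoring cost increases is an assumption of this example, not a recommendation.

```python
def shared_price(current_price, old_cost, new_cost, share):
    """New per-unit price after passing `share` of a cost decline to the customer.

    share=0.0 holds the price (vendor keeps the margin); share=1.0 passes the
    entire decline through. Cost increases leave the price unchanged here.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    decline = max(0.0, old_cost - new_cost)
    return round(current_price - share * decline, 4)


# Model cost per unit falls from $0.10 to $0.06; split the $0.04 saving evenly.
print(shared_price(0.25, 0.10, 0.06, share=0.5))  # 0.23
```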
Reserved capacity and committed usage: the predictability trade
The deeper logic of the upper ladder rungs is a trade: the customer gives you commitment, and you give them a discount and predictability. This is the same trade cloud infrastructure has run for years with reserved instances and committed-use discounts, and it works for AI for the same reasons. Committed usage solves pure usage pricing's two failures at once: the customer gets a predictable budget (no bill shock) and a better rate (no adoption penalty), and the vendor gets predictable revenue and pulled-forward cash.
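The commitment-for-discount trade is usually expressed as a tiered schedule. A sketch with made-up tiers: the thresholds, the discounts, and the `committed_rate` name are hypothetical, and a real schedule would be derived from your own cost stack.

```python
# Hypothetical schedule: (minimum committed units per year, discount off list rate),
# ordered from the largest commitment down so the first match is the best tier earned.
DISCOUNT_TIERS = [
    (5_000_000, 0.20),
    (1_000_000, 0.12),
    (250_000, 0.05),
    (0, 0.00),
]


def committed_rate(list_rate, committed_units):
    """Unit rate the customer earns for committing to committed_units per year."""
    if committed_units < 0:
        raise ValueError("commitment cannot be negative")
    for min_units, discount in DISCOUNT_TIERS:
        if committed_units >= min_units:
            return round(list_rate * (1 - discount), 4)


print(committed_rate(0.30, 1_200_000))  # 0.264: 12% off a $0.30 list rate
```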
The trade only works if your discount for commitment stays above your margin floor. A committed-usage discount that drops the unit price below your fully-loaded cost is just a slower version of the opening disaster, where you have locked in a negative-margin account for a multi-year term. Model the committed rate against the cost stack at the committed volume, and make sure even the discounted rate clears the floor. The largest commitments are exactly where this gets violated, because the customer's usage is highest and the field's incentive to close is strongest, which is why the margin floor has to be a hard rule and not a guideline.
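The floor check is simple enough to automate in the deal desk. A sketch, assuming the margin floor is expressed as a minimum gross-margin fraction and that `cost_per_unit` is the fully-loaded cost (inference, review, risk) at the committed volume; all three function names are hypothetical.

```python
def gross_margin(rate, cost_per_unit):
    """Gross margin as a fraction of price."""
    return (rate - cost_per_unit) / rate


def min_rate_for_floor(cost_per_unit, floor):
    """Lowest unit rate that still clears the margin floor: cost / (1 - floor)."""
    if not 0.0 <= floor < 1.0:
        raise ValueError("floor must be in [0, 1)")
    return cost_per_unit / (1 - floor)


def approve_commit_rate(rate, cost_per_unit, floor):
    """Hard rule: reject any committed rate whose margin falls below the floor."""
    return gross_margin(rate, cost_per_unit) >= floor


print(min_rate_for_floor(0.12, 0.50))          # cheapest allowable rate at $0.12 cost
print(approve_commit_rate(0.264, 0.12, 0.50))  # True: about 55% margin
print(approve_commit_rate(0.20, 0.12, 0.50))   # False: 40% margin
```

Making the check a boolean gate rather than a warning is the code-level version of "hard rule, not a guideline."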
A quota model by segment
Here is a representative quota structure for an AI agent product, showing how the ladder composes across segments. The numbers are illustrative, not prescriptive.
| Segment | Included quota (work units/mo) | Soft cap alerts | Overage rate | Enterprise terms |
|---|---|---|---|---|
| Starter | 500 | 80%, 100% | $0.35/unit | none |
| Growth | 5,000 | 50%, 80%, 100% | $0.30/unit | optional reserved capacity |
| Enterprise | committed (e.g. 100K) | full dashboard + alerts | $0.25/unit | committed minimum, quarterly true-up, cost-decline term, spike protection |
Notice the overage rate falls as the segment grows, which rewards adoption (the heavy customer gets a better marginal rate) while still clearing the margin floor at every tier. Notice the soft-cap alerting gets richer as the deal gets bigger, because the bigger the bill, the more the customer needs visibility. And notice that the enterprise tier carries all the contract terms from the checklist, because that is where an unhedged deal does the most damage.
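The table's properties can be checked mechanically whenever someone edits the rate card. This sketch encodes the illustrative quotas and overage rates from the table and flags any tier where the quota fails to grow, the overage rate fails to fall, or the overage rate breaks a margin floor. The `cost_per_unit` and `margin_floor` values are hypothetical inputs, and `check_rate_card` is not an existing tool.

```python
# Illustrative quotas and overage rates from the table; the enterprise quota is the example commitment.
RATE_CARD = [
    {"segment": "Starter", "quota": 500, "overage_rate": 0.35},
    {"segment": "Growth", "quota": 5_000, "overage_rate": 0.30},
    {"segment": "Enterprise", "quota": 100_000, "overage_rate": 0.25},
]


def check_rate_card(card, cost_per_unit, margin_floor):
    """Return a list of problems; an empty list means the ladder composes as intended."""
    problems = []
    for smaller, larger in zip(card, card[1:]):
        if larger["quota"] <= smaller["quota"]:
            problems.append(f"{larger['segment']} quota does not exceed {smaller['segment']}")
        if larger["overage_rate"] >= smaller["overage_rate"]:
            problems.append(f"{larger['segment']} overage rate does not fall below {smaller['segment']}")
    for tier in card:
        margin = (tier["overage_rate"] - cost_per_unit) / tier["overage_rate"]
        if margin < margin_floor:
            problems.append(f"{tier['segment']} overage rate breaks the margin floor")
    return problems


print(check_rate_card(RATE_CARD, cost_per_unit=0.10, margin_floor=0.50))  # []
print(check_rate_card(RATE_CARD, cost_per_unit=0.16, margin_floor=0.50))  # Growth and Enterprise fail
```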
Practical exercise
Take your largest current customer and stress-test the contract. If their usage doubled next quarter, what happens? Is there a defined overage rate, or does the deal silently go negative-margin like the opening disaster? Is there a soft cap that warns them before they cross a line, or will they be ambushed? Is there a cost-decline term, or will your improving margins become a renegotiation fight? Write down which rungs of the Margin Guardrail Ladder that contract actually has, and which it is missing. The missing rungs are your exposure, and on your largest accounts that exposure is exactly where it hurts most.
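One way to run that stress test is to compute the account's quarterly gross margin at projected usage and at double it, with and without an overage term. A minimal sketch with made-up contract numbers: `quarter_margin` is hypothetical, `overage_rate=None` stands in for a contract that has no overage mechanism, and cost is treated as a flat fully-loaded cost per unit.

```python
def quarter_margin(units, contract_price, included_units, overage_rate, cost_per_unit):
    """Quarterly gross margin in dollars; overage_rate=None models a contract with no overage term."""
    revenue = contract_price
    if overage_rate is not None:
        revenue += max(0, units - included_units) * overage_rate
    return round(revenue - units * cost_per_unit, 2)


terms = dict(contract_price=30_000, included_units=100_000, cost_per_unit=0.20)
for overage in (None, 0.30):
    baseline = quarter_margin(100_000, overage_rate=overage, **terms)
    doubled = quarter_margin(200_000, overage_rate=overage, **terms)
    print(f"overage={overage}: baseline {baseline}, doubled {doubled}")
```

The `None` row is the opening story in miniature: the same doubled usage that turns a healthy account negative is profitable once an overage rate exists.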
Key Takeaways
- The Margin Guardrail Ladder runs from included AI through quota, soft cap, paid overage, reserved capacity, committed usage, to custom outcome or risk share; you compose rungs rather than picking one.
- The quota is the most consequential number; size it near the 70th to 80th percentile of segment usage so most customers fit comfortably inside and only the heavy tail flows to overage.
- Soft caps prevent bill shock: the customer is never billed for overage they did not knowingly cross into, which requires real-time visibility and alerts.
- Enterprise AI contracts need terms seats never did: committed minimum, predefined overage rate, work-unit definition, true-up cadence, cost-decline sharing, spike protection, and margin-floor enforcement.
- Committed usage solves pure usage pricing's failures by trading the customer's commitment for predictability and a discount, but the discounted rate must still clear the margin floor.
- An unhedged enterprise deal with no overage mechanism is a short position on the customer's success; the missing ladder rungs are your exposure, worst on your largest accounts.
