When the API is inside your product, token rates are gross margin. This SaaS provider cut OpenAI COGS 30 percent without touching the roadmap.
A B2B SaaS provider embedding OpenAI in its product cut API cost of goods sold by 30 percent through telemetry based commit sizing, model tiering, and a shorter, repriceable term.
The customer is a North American B2B SaaS provider with about one thousand employees whose product embeds OpenAI API calls, making token spend a direct cost of goods sold. Every basis point of token cost flowed straight into gross margin, which made the renewal a board level conversation.
The estate spanned ChatGPT Enterprise seats for staff and high volume API usage inside the product. The API line dwarfed the seat line by roughly eight to one.
Seat overspend wastes budget. API overspend compresses the margin investors price the company on. The discipline applied to cloud unit economics has to apply to OpenAI API pricing the same way.
OpenAI's renewal proposal anchored on an annual commit sized well above measured consumption, justified by the customer's own growth forecast. Growth forecasts are the seller's favorite sizing tool because they are unfalsifiable at signature time.
The engagement rebuilt the commit from telemetry up: ninety days of measured token consumption by model, by endpoint, and by customer tier, projected forward on contracted product growth only. That number became the negotiation floor.
Commit position before and after the telemetry rebuild
| Dimension | Opening proposal | Closed position |
|---|---|---|
| Annual commit basis | Vendor growth forecast | Measured tokens plus contracted growth |
| Model mix | Default flagship model | Tiered routing by task complexity |
| Unit rates | List API rates | Negotiated volume rates at commit tier |
| True up | Annual, pay full gap | Quarterly review with re forecast |
| Term | Multi year fixed commit | One year with growth options |
Roughly 60 percent of the product's API calls were classification and extraction tasks that a smaller model handled within quality thresholds. Routing those calls off the flagship model cut blended cost per request by about a third, validated with A B output scoring before cutover.
The standard advice is to maximize the commit to maximize the discount. We disagree. In the 2024 to 2025 GenAI deals Morten Andersen benchmarked, the discount delta between a right sized commit and an inflated one rarely exceeded a few points, while the unused commit burned real cash. Token prices have also fallen repeatedly, which means a long oversized commit locks yesterday's rates onto tomorrow's volumes. The buyer side move is a shorter, smaller commit with growth options, repriced as the market moves.
Source: Redress Compliance advisory engagement file, 2024 to 2025.
A growth forecast is a sales document. Ninety days of token telemetry is a negotiation position.
The closed agreement cut OpenAI API cost of goods sold by roughly 30 percent against the run rate trajectory, through three stacked levers rather than one headline discount.
The pattern is portable: measure before you commit, tier the model mix, and keep the term short while unit prices are falling. The GenAI vendor practice runs this sequence as a standard engagement, and the Anthropic enterprise licensing guide covers the same logic on the Claude side.
Browse more outcomes in the case study library, or start with the GenAI knowledge hub. For continuous coverage across vendors, see Vendor Shield.
Three stacked levers: the annual commit was rebuilt from measured token telemetry instead of growth forecasts, about 60 percent of calls moved to smaller models within quality thresholds, and unit rates were negotiated at the honest commit tier on a one year term.
Usually no. In our 2024 to 2025 benchmarks the extra discount from an inflated commit was a few points at most, while unused commit was real cash burned and falling token prices made long fixed commits more expensive than they looked.
Routing each call type to the cheapest model that meets its quality threshold. Classification and extraction tasks often run well on smaller models, cutting blended cost per request substantially without product changes.
One year with growth options and quarterly re forecast rights was the closed structure here. While per token prices keep moving down, a short repriceable term captures market drops that a multi year fixed commit locks out.
Measure token consumption by model, endpoint, and customer tier for at least ninety days, project forward on contracted growth only, and use that number as the commit floor in negotiation.
The telemetry based commit model, model mix scoring method, term structure options, and the rate benchmarks from recent closed deals.
Used across more than five hundred enterprise engagements. Independent. Buyer side. Built for procurement leaders running the next renewal cycle.