OpenAI API COGS down 30 percent. A B2B SaaS case study.

When the API is inside your product, token rates are gross margin. This SaaS provider cut OpenAI COGS 30 percent without touching the roadmap.

500+ Enterprise Clients
$2B+ Under Advisory
Industry Recognized
11 Vendor Practices
100% Buyer Side Independent

A B2B SaaS provider embedding OpenAI in its product cut API cost of goods sold by 30 percent through telemetry-based commit sizing, model tiering, and a shorter, repriceable term.

Key takeaways

  • OpenAI API spend embedded in a product is COGS, and it deserves cloud-style unit economics discipline.
  • The opening commit proposal was sized on a growth forecast; the closed commit was sized on ninety days of token telemetry.
  • Routing roughly 60 percent of calls to smaller models cut blended token cost by about a third while staying within quality thresholds.
  • The closed structure cut API COGS roughly 30 percent against the run-rate trajectory.
  • Short terms with quarterly re-forecasts beat long fixed commits while token prices keep falling.
  • Discount deltas between honest and inflated commits were small in our 2024 to 2025 benchmarks; unused commit is pure waste.

Who is the customer and what was at stake?

The customer is a North American B2B SaaS provider with about one thousand employees whose product embeds OpenAI API calls, making token spend a direct cost of goods sold. Every basis point of token cost flowed straight into gross margin, which made the renewal a board-level conversation.

The estate spanned ChatGPT Enterprise seats for staff and high-volume API usage inside the product. The API line dwarfed the seat line by roughly eight to one.

Why API COGS is different from seat spend

Seat overspend wastes budget. API overspend compresses the margin investors price the company on. The discipline applied to cloud unit economics has to apply to OpenAI API pricing the same way.
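
To make the margin link concrete, here is a minimal Python sketch of the arithmetic. The revenue and cost figures are hypothetical placeholders, not numbers from this engagement; only the 30 percent API cost reduction echoes the case study.

```python
def gross_margin_pct(revenue, api_cogs, other_cogs):
    """Gross margin as a percentage of revenue."""
    return 100 * (revenue - api_cogs - other_cogs) / revenue


# Hypothetical annual figures in $ thousands (assumptions for illustration).
revenue = 100_000
api_cogs = 12_000
other_cogs = 13_000

before = gross_margin_pct(revenue, api_cogs, other_cogs)
# Apply a 30 percent cut to the API line only; other COGS is unchanged.
after = gross_margin_pct(revenue, api_cogs * 70 / 100, other_cogs)

print(f"gross margin before: {before:.1f}%  after: {after:.1f}%")
```

Under these assumed inputs, gross margin moves from 75.0 to 78.6 percent. The size of the effect depends entirely on how large the API line is relative to revenue, which is why the calculation is worth running on real figures before a renewal.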

The opening proposal

OpenAI's renewal proposal anchored on an annual commit sized well above measured consumption, justified by the customer's own growth forecast. Growth forecasts are the seller's favorite sizing tool because they are unfalsifiable at signature time.

What did the engagement actually change?

The engagement rebuilt the commit from telemetry up: ninety days of measured token consumption by model, by endpoint, and by customer tier, projected forward on contracted product growth only. That number became the negotiation floor.
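
The rebuild described above can be sketched in a few lines of Python. This is a simplified illustration, not the engagement's actual model: the `UsageRecord` schema, the function name, and all token counts are hypothetical, and the result is a token volume that would still need to be priced at the negotiated rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One aggregated telemetry row (hypothetical schema)."""
    model: str
    endpoint: str
    customer_tier: str
    tokens: int  # tokens consumed over the measurement window


def annual_commit_floor(records, window_days=90, contracted_growth=0.0):
    """Annualize measured tokens per model, then apply contracted growth only.

    contracted_growth is a fraction (0.10 means 10 percent) that should cover
    signed customer contracts, not pipeline or forecast.
    """
    per_model = defaultdict(int)
    for record in records:
        per_model[record.model] += record.tokens
    scale = 365 / window_days * (1 + contracted_growth)
    return {model: round(tokens * scale) for model, tokens in per_model.items()}


# Illustrative numbers only, not figures from the engagement.
records = [
    UsageRecord("flagship", "/summarize", "enterprise", 900_000_000),
    UsageRecord("flagship", "/classify", "mid_market", 300_000_000),
    UsageRecord("small", "/extract", "enterprise", 600_000_000),
]
floor = annual_commit_floor(records, window_days=90, contracted_growth=0.10)
for model, tokens in floor.items():
    print(f"{model}: {tokens:,} tokens per year")
```

Keeping endpoint and customer tier on each record, even though this sketch only groups by model, leaves room to project growth separately for the tiers where contracts are actually signed.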

Commit position before and after the telemetry rebuild

Dimension           | Opening proposal         | Closed position
Annual commit basis | Vendor growth forecast   | Measured tokens plus contracted growth
Model mix           | Default flagship model   | Tiered routing by task complexity
Unit rates          | List API rates           | Negotiated volume rates at commit tier
True-up             | Annual, pay full gap     | Quarterly review with re-forecast
Term                | Multi-year fixed commit  | One year with growth options

The model tiering move

Roughly 60 percent of the product's API calls were classification and extraction tasks that a smaller model handled within quality thresholds. Routing those calls off the flagship model cut blended cost per request by about a third, validated with A/B output scoring before cutover.
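
A rough sketch of the tiering arithmetic follows, in Python. The routing table, the model labels, and the 0.45 cost ratio are assumptions chosen for illustration; the case study does not state the actual price gap between models, and the calculation assumes every call cost the same on the flagship baseline.

```python
# Hypothetical routing table: task types that passed quality scoring
# on the smaller model. Anything unlisted stays on the flagship model.
ROUTES = {"classification": "small", "extraction": "small"}


def pick_model(task_type):
    """Return the model tier for a task type, defaulting to flagship."""
    return ROUTES.get(task_type, "flagship")


def blended_cost_ratio(share_routed, small_to_flagship_cost):
    """Blended cost per request relative to an all-flagship baseline.

    share_routed: fraction of calls moved to the smaller model.
    small_to_flagship_cost: small-model cost per request as a fraction of
    the flagship cost per request (an assumed input).
    """
    return (1 - share_routed) + share_routed * small_to_flagship_cost


ratio = blended_cost_ratio(share_routed=0.60, small_to_flagship_cost=0.45)
print(f"blended cost is {ratio:.0%} of the all-flagship baseline")
```

With 60 percent of calls routed and an assumed per-request cost ratio of 0.45, the blended cost lands at about two thirds of the baseline. A different cost ratio or an uneven token mix across call types would move that number, so the inputs should come from measured telemetry.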

Where the common advice on OpenAI commits is wrong

The standard advice is to maximize the commit to maximize the discount. We disagree. In the 2024 to 2025 GenAI deals Morten Andersen benchmarked, the discount delta between a right-sized commit and an inflated one rarely exceeded a few points, while the unused commit burned real cash. Token prices have also fallen repeatedly, which means a long oversized commit locks yesterday's rates onto tomorrow's volumes. The buyer-side move is a shorter, smaller commit with growth options, repriced as the market moves.
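
The trade-off can be checked with simple arithmetic. The Python sketch below assumes a prepaid commit with no rollover of unused balance and overage billed at the same discounted rate; the commit sizes and discount levels are hypothetical and are not the terms of any real deal.

```python
def commit_outcome(commit, discount_pct, usage_at_list):
    """Return (cash_paid, unused_commit) for a prepaid annual commit.

    commit: dollars committed up front.
    discount_pct: whole-number percent off list rates at this commit tier.
    usage_at_list: actual consumption for the year, valued at list rates.
    """
    discounted_usage = usage_at_list * (100 - discount_pct) / 100
    cash_paid = max(commit, discounted_usage)
    unused_commit = max(0.0, commit - discounted_usage)
    return cash_paid, unused_commit


usage = 1_000_000  # hypothetical consumption at list rates, in dollars

# Right-sized commit at a 15 percent discount (assumed tier).
right_sized = commit_outcome(850_000, 15, usage)
# Inflated commit that earns three extra discount points (assumed tier).
inflated = commit_outcome(1_300_000, 18, usage)

print("right-sized:", right_sized)
print("inflated:   ", inflated)
```

In this example the three extra points of discount are worth 30,000 dollars on actual usage, while the inflated commit leaves 480,000 dollars unused. The exact figures are invented, but the shape of the comparison matches the benchmark observation above.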

Telemetry beats forecasts. The commit number that survives negotiation is the one backed by ninety days of measured token consumption.
30%: API COGS reduction at closed terms
60%: Calls routed to smaller models
8 to 1: API spend versus seat spend in estate

Source: Redress Compliance advisory engagement file, 2024 to 2025.

A growth forecast is a sales document. Ninety days of token telemetry is a negotiation position.

What was the commercial outcome?

The closed agreement cut OpenAI API cost of goods sold by roughly 30 percent against the run-rate trajectory, through three stacked levers rather than one headline discount.

  • Commit right-sizing: the annual commit dropped to the telemetry baseline, eliminating paid-for but unused capacity.
  • Model tiering: blended token cost fell as routine calls moved to smaller models per the published model lineup.
  • Rate and term structure: negotiated unit rates at the honest commit tier, on a one-year term with quarterly re-forecasts anchored to the OpenAI business terms.

What transfers to other SaaS providers

The pattern is portable: measure before you commit, tier the model mix, and keep the term short while unit prices are falling. The GenAI vendor practice runs this sequence as a standard engagement, and the Anthropic enterprise licensing guide covers the same logic on the Claude side.

What to do next

  1. Instrument token consumption by model, endpoint, and customer tier for at least ninety days.
  2. Score which call types meet quality thresholds on smaller models.
  3. Rebuild the commit number from telemetry plus contracted growth only.
  4. Price the gross margin impact of token rates before agreeing to any multi-year term.
  5. Negotiate quarterly re-forecast rights instead of annual true-ups.
  6. Benchmark the offered rates against current market before signature.
  7. Reassess the model mix every two quarters as pricing moves.
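
Step 2 in the list above can be sketched as a simple gate in Python, assuming each call type has quality scores between 0 and 1 from an offline evaluation of both models. The function name, the 0.02 tolerance, and the 30-sample minimum are illustrative assumptions, not the scoring method used in the engagement.

```python
from statistics import mean


def qualifies_for_small_model(flagship_scores, small_scores,
                              max_quality_drop=0.02, min_samples=30):
    """Decide whether a call type can move to the smaller model.

    A call type qualifies only if both score sets have enough samples and
    the smaller model's mean score trails the flagship's by no more than
    max_quality_drop.
    """
    if min(len(flagship_scores), len(small_scores)) < min_samples:
        return False
    return mean(flagship_scores) - mean(small_scores) <= max_quality_drop


# Hypothetical evaluation results per call type.
evaluations = {
    "classification": ([0.95] * 40, [0.94] * 40),
    "long_form_drafting": ([0.90] * 40, [0.80] * 40),
}
for call_type, (flagship, small) in evaluations.items():
    verdict = qualifies_for_small_model(flagship, small)
    print(f"{call_type}: route to small model = {verdict}")
```

A mean comparison is the simplest possible gate. A production version would likely also look at the worst-case outputs and at per-customer-tier results before cutover.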

Browse more outcomes in the case study library, or start with the GenAI knowledge hub. For continuous coverage across vendors, see Vendor Shield.

Frequently asked questions

How did the SaaS provider cut OpenAI API costs by 30 percent?

Three stacked levers: the annual commit was rebuilt from measured token telemetry instead of growth forecasts, about 60 percent of calls moved to smaller models within quality thresholds, and unit rates were negotiated at the honest commit tier on a one-year term.

Should an enterprise maximize its OpenAI commit for a bigger discount?

Usually no. In our 2024 to 2025 benchmarks the extra discount from an inflated commit was a few points at most, while unused commit was real cash burned and falling token prices made long fixed commits more expensive than they looked.

What is model tiering in OpenAI cost optimization?

Routing each call type to the cheapest model that meets its quality threshold. Classification and extraction tasks often run well on smaller models, cutting blended cost per request substantially without product changes.

How long should an OpenAI enterprise term be?

One year with growth options and quarterly re-forecast rights was the closed structure here. While per-token prices keep moving down, a short repriceable term captures market drops that a multi-year fixed commit locks out.
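
A small Python sketch shows why the term length matters when prices are falling. It assumes flat annual volume, a steady yearly price decline, and a short-term rate that resets to market each year; the rates, the 20 percent decline, and the 5 percent lock-in discount are hypothetical inputs, not market data.

```python
def locked_term_cost(annual_units, locked_rate, years):
    """Total cost when one rate is fixed for the whole term."""
    return annual_units * locked_rate * years


def repriced_term_cost(annual_units, start_rate, annual_price_decline, years):
    """Total cost when the rate resets to a falling market rate each year."""
    total = 0.0
    rate = start_rate
    for _ in range(years):
        total += annual_units * rate
        rate *= 1 - annual_price_decline
    return total


units = 1_000  # hypothetical annual volume, in millions of tokens

# Three-year lock at 9.50 per unit versus yearly repricing from 10.00
# with an assumed 20 percent annual market decline.
locked = locked_term_cost(units, locked_rate=9.5, years=3)
repriced = repriced_term_cost(units, start_rate=10.0,
                              annual_price_decline=0.20, years=3)

print(f"locked: {locked:,.0f}  repriced: {repriced:,.0f}")
```

Under these assumptions the locked term costs more over three years even though it starts at a lower rate. If market prices stopped falling, the comparison would flip, which is why the decline rate is the input to stress-test.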

How do you size an OpenAI API commit correctly?

Measure token consumption by model, endpoint, and customer tier for at least ninety days, project forward on contracted growth only, and use that number as the commit floor in negotiation.
