Token budgets, model tiering, and committed tiers with price caps cut a global bank's projected GenAI run rate by 38 percent. Here is the program.
A global bank cut its projected GenAI run rate by 38 percent in nine months, not by blocking models but by putting token budgets, model tiering, and contract caps in place before scale hit.
The bank had approved a group-wide GenAI rollout: developer assistants, a customer service copilot, and document intelligence across compliance. Finance projected the run rate from pilot consumption, and the number landed at roughly four times the original business case.
Pilot economics do not scale linearly. Pilots run on list prices, generous context windows, and the largest models by default, because nobody optimizes a pilot. The projection priced that behavior across 60,000 employees.
Cloud FinOps watched infrastructure, not SaaS API consumption. GenAI spend arrived through three doors at once: direct provider contracts, cloud marketplace listings, and embedded copilot SKUs, and no single dashboard saw all three.
The governance model had three layers: a use case registry with token budgets, model tiering rules, and a FinOps showback loop. Nothing was blocked; everything was metered, owned, and reviewed monthly.
The three governance layers and what each contributed
| Layer | Mechanism | Contribution to the 38 percent |
|---|---|---|
| Use case registry | Monthly token budget per approved use case | Stopped unbounded growth |
| Model tiering | Route by task complexity, small models first | Roughly 60 percent of savings |
| FinOps showback | Cost per team, per use case, monthly review | Kept the curve flat after launch |
Budgets were set from 90 days of observed pilot consumption plus 50 percent headroom, then reviewed monthly. Exceeding budget triggered a review, not a cutoff, and reviews mostly produced routing fixes rather than usage reductions.
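The budget rule above can be sketched in a few lines. This is a minimal illustration, not the bank's implementation: the function names are hypothetical, and converting 90 days of observations into a monthly figure by averaging daily consumption over a 30-day month is an assumption the source does not spell out.

```python
from statistics import mean

HEADROOM = 0.50  # 50 percent headroom, as described in the text


def monthly_token_budget(daily_tokens, days_per_month=30, headroom=HEADROOM):
    """Monthly budget = average observed daily tokens x days in month x (1 + headroom).

    Assumption: the 90-day pilot window is reduced to a daily average and
    scaled to a 30-day month. The source only states the window and headroom.
    """
    if not daily_tokens:
        raise ValueError("need at least one day of observed consumption")
    baseline = mean(daily_tokens) * days_per_month
    return int(round(baseline * (1 + headroom)))


def needs_review(month_to_date_tokens, budget):
    """Exceeding the budget flags a review. It never blocks traffic."""
    return month_to_date_tokens > budget


# Hypothetical pilot: a flat one million tokens per day for 90 days.
pilot_days = [1_000_000] * 90
budget = monthly_token_budget(pilot_days)
```

The review check returns a flag rather than raising an error, which mirrors the "review, not a cutoff" policy: the caller decides who gets notified, and requests keep flowing.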
Classification, extraction, and summarization moved to small models; reasoning and generation stayed on flagship models. Routing rules referenced published price gaps, such as those on the OpenAI API pricing page and Anthropic pricing page, where small model rates run an order of magnitude below flagship rates.
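A tiering rule of this kind can be as simple as a lookup on task type. The sketch below is illustrative only: the `Tier` enum, the task labels, and the fallback to the flagship tier for unlisted task types are assumptions, not details from the program.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"
    FLAGSHIP = "flagship"


# Task types the text assigns to small models.
SMALL_MODEL_TASKS = {"classification", "extraction", "summarization"}


def route(task_type: str) -> Tier:
    """Return the model tier for a task type.

    Reasoning and generation stay on flagship models, as in the text.
    Assumption: task types not listed anywhere also fall back to flagship
    until someone adds an explicit rule for them.
    """
    task = task_type.strip().lower()
    if task in SMALL_MODEL_TASKS:
        return Tier.SMALL
    return Tier.FLAGSHIP
```

Defaulting unknown work to the flagship tier trades cost for safety. An estate that prefers the opposite bias could default to the small tier and escalate on failure, which is a different design than the one sketched here.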
The contracting work ran parallel to governance. The bank consolidated GenAI spend into committed tiers with two providers plus its cloud platforms, and priced the commitment against the consumption curve the governance model made predictable.
Platform routes let the bank draw down existing cloud commitments. Consumption through Azure OpenAI Service and Vertex AI counted toward cloud commit drawdown, which effectively discounted the GenAI spend twice.
Nine months after launch the bank ran 23 approved use cases at 38 percent below the original projected run rate, with adoption above plan. The governance model, not usage suppression, delivered the gap.
The sequence is the transferable asset: meter first, tier second, commit third. Committing spend before governance makes the commit a guess; committing after gives you a defensible curve and a stronger negotiating position.
The standard advice says GenAI spend is too early to govern, that you should let teams experiment freely and clean up the bill later. We disagree. In roughly 20 to 30 GenAI governance engagements Fredrik Filipsson advised between 2024 and 2025, the clean-up-later estates paid 2 to 4 times more per delivered use case than estates that metered from day one, and none of the governed estates showed slower adoption. The buyer-side move is to put token budgets and tiering in place before scale, then sign committed tiers against the predictable curve. Waiting does not buy learning; it buys an uncapped invoice and a weaker negotiating position.
Three cuts of our advisory engagement file frame the size of the opportunity.
Source: Redress Compliance advisory engagement file, 2024 to 2025.
Five moves turn this analysis into a lower invoice on the next renewal.
The bank ran 38 percent below its projected GenAI run rate nine months after launch, with 23 approved use cases live and adoption above plan. The savings came from token budgets, model tiering, and committed contract tiers, not from usage suppression.
Model tiering routes each task to the cheapest model that handles it, keeping flagship models for reasoning and generation. In this program it cut average cost per request by roughly 60 percent, and our 2024 to 2025 engagement file shows 50 to 70 percent across comparable estates.
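To see how a reduction of roughly 60 percent in cost per request can arise, a back-of-envelope calculation helps. The sketch assumes small-model rates at exactly one tenth of flagship rates (the text says only "an order of magnitude") and identical token counts per request on both tiers. Both are simplifications, and the function names are hypothetical.

```python
def blended_cost_ratio(small_share, small_price_ratio=0.1):
    """Cost per request after tiering, relative to an all-flagship baseline of 1.0.

    small_share: fraction of requests routed to small models (0 to 1).
    small_price_ratio: small-model price as a fraction of flagship price
    (0.1 is an assumed reading of "an order of magnitude").
    """
    if not 0 <= small_share <= 1:
        raise ValueError("small_share must be between 0 and 1")
    return (1 - small_share) + small_share * small_price_ratio


def small_share_for_saving(target_saving, small_price_ratio=0.1):
    """Share of requests that must move to small models to hit a target saving."""
    return target_saving / (1 - small_price_ratio)


# Under these assumptions, a 60 percent saving needs about two thirds
# of requests on the small tier.
share = small_share_for_saving(0.60)
ratio = blended_cost_ratio(share)
```

Under these assumptions, moving about two thirds of requests to small models leaves the blended cost at 40 percent of the all-flagship baseline. Real estates will differ because token counts and price gaps vary by provider and task.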
Untracked GenAI spend is usually larger than finance expects. Discovery in this case found 3 times more GenAI spend than finance tracked, and our engagement file shows 2 to 4 times as the normal range, mostly API keys on corporate cards and inside cloud accounts.
Commit only after governance makes the consumption curve predictable, which took two quarters at this bank. Committed tiers then delivered 15 to 30 percent savings against pay-as-you-go rates, with annual price caps and model substitution rights protecting the term.
Consumption through Azure OpenAI Service or Vertex AI can draw down existing cloud commitments, which effectively discounts the spend twice. The bank kept multi-provider routing to preserve renewal leverage on every path.
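Why predictability matters before committing can be shown with a toy billing model. Everything here is an assumption for illustration: the bill is modelled as the larger of the committed amount and the discounted value of actual usage, overage is assumed to be charged at the same discounted rate, and the figures are invented rather than taken from the bank's contracts.

```python
def committed_bill(list_price_usage, commit, discount):
    """Bill under a committed tier (toy model).

    list_price_usage: what the period's usage would cost at pay-as-you-go rates.
    commit: minimum spend agreed for the period.
    discount: committed-tier discount as a fraction, e.g. 0.20.
    Assumption: usage beyond the commit is charged at the same discounted rate.
    """
    discounted_usage = list_price_usage * (1 - discount)
    return max(commit, discounted_usage)


def saving_vs_payg(list_price_usage, commit, discount):
    """Positive when the commit beats pay-as-you-go, negative when it costs more."""
    return list_price_usage - committed_bill(list_price_usage, commit, discount)


# Forecast holds: usage worth 100 at list price, commit of 80 at a 20 percent discount.
on_forecast = saving_vs_payg(100.0, 80.0, 0.20)

# Forecast misses: usage worth only 60 at list price against the same commit.
under_forecast = saving_vs_payg(60.0, 80.0, 0.20)
```

In this toy model the commit saves 20 when consumption lands on forecast and costs 20 extra when consumption falls well short, which is the risk of committing before the curve is predictable.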
Ownership sits best with the FinOps team that already governs cloud consumption. Tokens behave like any other metered resource: they need budgets, owners, showback, and a monthly review loop, which is existing FinOps muscle applied to a new meter.
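A showback report of the kind described is, at its core, a grouped sum over usage records. The sketch below assumes a hypothetical record shape (`team`, `use_case`, `model`, `tokens`) and a hypothetical price table per 1,000 tokens. Neither comes from the source, and real provider pricing usually distinguishes input from output tokens.

```python
from collections import defaultdict


def showback(usage_records, price_per_1k_tokens):
    """Aggregate cost per (team, use_case) from raw usage records.

    usage_records: iterable of dicts with keys team, use_case, model, tokens.
    price_per_1k_tokens: dict mapping model name to price per 1,000 tokens.
    Both shapes are illustrative assumptions.
    """
    totals = defaultdict(float)
    for rec in usage_records:
        rate = price_per_1k_tokens[rec["model"]]
        totals[(rec["team"], rec["use_case"])] += rec["tokens"] / 1000 * rate
    return dict(totals)


# Hypothetical prices and records, for illustration only.
prices = {"small": 0.5, "flagship": 5.0}
records = [
    {"team": "compliance", "use_case": "doc-intel", "model": "small", "tokens": 2000},
    {"team": "compliance", "use_case": "doc-intel", "model": "flagship", "tokens": 1000},
    {"team": "service", "use_case": "copilot", "model": "small", "tokens": 4000},
]
report = showback(records, prices)
```

Keying the report on team and use case together gives the monthly review a named owner for every line, which is what turns a cost report into showback.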
Token governance is not about saying no. The bank approved more use cases than it planned and still beat the projection by 38 percent.