GenAI Cost Governance at a Global Bank: Case Study

A global bank cut its projected GenAI run rate by 38 percent in nine months, not by blocking models but by putting token budgets, model tiering, and contract caps in place before scale hit.

Key takeaways

38 percent off the projected run rate: governance landed before scale, so the bank never paid the uncontrolled price.
Token budgets per use case: every approved use case carries a monthly token budget with an owner, alerts, and a hard review trigger.
Model tiering does the heavy lifting: routing routine work to small models cut average cost per request by roughly 60 percent.
Contract caps beat discounts: committed spend tiers with annual price caps protected the bank when usage tripled.
Shadow AI was the first find: discovery surfaced 3 times more GenAI spend than finance had on the books.
FinOps owns the meter: the same team that governs cloud consumption now governs tokens, with the same showback discipline.

What GenAI cost problem did the bank face?

The bank had approved a group wide GenAI rollout: developer assistants, a customer service copilot, and document intelligence across compliance. Finance projected the run rate from pilot consumption and the number landed at roughly 4 times the original business case.

Pilot economics do not scale linearly. Pilots run on list prices, generous context windows, and the largest models by default, because nobody optimizes a pilot. The projection priced that behavior across 60,000 employees.

No consumption owner: tokens were nobody's budget line, so nobody managed them.
Largest model default: every request hit the flagship model regardless of task complexity.
Shadow usage: teams had API keys on cards, outside procurement and outside security review.

Why did the existing cloud governance not catch this?

Cloud FinOps watched infrastructure, not SaaS API consumption. GenAI spend arrived through three doors at once (direct provider contracts, cloud marketplace listings, and embedded copilot SKUs), and no single dashboard saw all three.
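A minimal sketch of the single view that was missing, assuming each of the three doors can export line items as (description, amount) pairs. The channel names, the `consolidate` helper, and all amounts are hypothetical and chosen only for illustration; they are not the bank's figures.

```python
def consolidate(*channels):
    """Total spend per purchasing channel, plus a combined figure.

    Each channel is a (name, line_items) pair, where line_items is a list
    of (description, amount) tuples. The shape is an assumption.
    """
    totals = {name: sum(amount for _, amount in items) for name, items in channels}
    totals["all_channels"] = sum(totals.values())
    return totals


# Hypothetical monthly line items, in thousands of USD.
direct = ("direct_contracts", [("provider A", 80), ("provider B", 40)])
marketplace = ("cloud_marketplace", [("listing X", 60)])
copilots = ("embedded_copilot_skus", [("developer assistant seats", 120)])

view = consolidate(direct, marketplace, copilots)
print(view)
```

Comparing `all_channels` against the figure finance already tracks is one simple way to size shadow spend.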

How did the cost governance model actually work?

The governance model had three layers: a use case registry with token budgets, model tiering rules, and a FinOps showback loop. Nothing was blocked; everything was metered, owned, and reviewed monthly.

The three governance layers and what each contributed

Layer	Mechanism	Contribution to the 38 percent
Use case registry	Monthly token budget per approved use case	Stopped unbounded growth
Model tiering	Route by task complexity, small models first	Roughly 60 percent of savings
FinOps showback	Cost per team, per use case, monthly review	Kept the curve flat after launch
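The showback layer can be sketched as a simple aggregation, assuming usage is logged per request with a team, a use case, a token count, and a unit price. The `UsageRecord` fields, the team and use case names, and the prices below are hypothetical; the bank's actual schema is not described in this case study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    team: str
    use_case: str
    tokens: int
    price_per_million: float  # hypothetical unit price, USD per 1M tokens


def showback(records):
    """Sum token cost per (team, use_case) pair for the monthly review."""
    totals = defaultdict(float)
    for r in records:
        totals[(r.team, r.use_case)] += r.tokens / 1_000_000 * r.price_per_million
    return dict(totals)


records = [
    UsageRecord("compliance", "document-intelligence", 4_000_000, 0.50),
    UsageRecord("compliance", "document-intelligence", 1_000_000, 10.00),
    UsageRecord("service", "customer-copilot", 2_000_000, 10.00),
]
report = showback(records)
for (team, use_case), cost in sorted(report.items()):
    print(f"{team:12} {use_case:24} {cost:8.2f} USD")
```

Keying the report on both team and use case mirrors the "cost per team, per use case" mechanism in the table above.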

How were token budgets set without strangling adoption?

Budgets were set from 90 days of observed pilot consumption plus 50 percent headroom, then reviewed monthly. Exceeding budget triggered a review, not a cutoff, and reviews mostly produced routing fixes rather than usage reductions.
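The budget rule above can be sketched as follows. How the 90 day observation was converted to a monthly figure is not stated, so this sketch assumes a simple 30 day average; the function names and the sample numbers are hypothetical.

```python
def monthly_token_budget(daily_tokens, headroom=0.50):
    """Observed average 30-day consumption plus headroom (50 percent by default).

    Assumption: the observation window is averaged to a 30-day month.
    """
    if not daily_tokens:
        raise ValueError("need at least one day of observed consumption")
    observed_monthly = sum(daily_tokens) / len(daily_tokens) * 30
    return int(observed_monthly * (1 + headroom))


def budget_status(used_tokens, budget):
    """Exceeding the budget triggers a review, never a cutoff."""
    return "review" if used_tokens > budget else "ok"


# Hypothetical pilot: a flat 1M tokens per day for 90 days.
pilot = [1_000_000] * 90
budget = monthly_token_budget(pilot)
print(budget, budget_status(50_000_000, budget))
```

Returning a status rather than raising an error keeps the design aligned with the text: overruns start a conversation about routing and do not block requests.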

What did model tiering look like in practice?

Classification, extraction, and summarization moved to small models; reasoning and generation stayed on flagship models. Routing rules referenced published price gaps, such as those on the OpenAI API pricing page and Anthropic pricing page, where small model rates run an order of magnitude below flagship rates.
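A minimal sketch of such a routing rule, using the task split described above. The tier names, the per request costs (set one order of magnitude apart), the request mix, and the fallback for unknown task types are all assumptions for illustration, not the bank's actual rules or prices.

```python
SMALL_MODEL_TASKS = {"classification", "extraction", "summarization"}
FLAGSHIP_TASKS = {"reasoning", "generation"}

# Hypothetical cost units per request, one order of magnitude apart.
COST_PER_REQUEST = {"small": 1, "flagship": 10}


def route(task_type):
    """Return the model tier for a task type."""
    if task_type in SMALL_MODEL_TASKS:
        return "small"
    # Assumption: anything else, including unknown task types, goes to flagship.
    return "flagship"


def average_cost(task_types):
    """Average cost per request after routing."""
    return sum(COST_PER_REQUEST[route(t)] for t in task_types) / len(task_types)


# Hypothetical mix: 7 routine requests and 3 complex ones.
mix = ["classification"] * 3 + ["extraction"] * 2 + ["summarization"] * 2 + ["reasoning"] * 3
tiered = average_cost(mix)
untiered = COST_PER_REQUEST["flagship"]  # every request on the flagship model
print(f"tiered={tiered:.2f} untiered={untiered} reduction={1 - tiered / untiered:.0%}")
```

The reduction depends entirely on the task mix and the price gap, so the figure this prints illustrates the mechanism and is not a reproduction of the bank's result.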

Which contract levers protected the bank at scale?

The contracting work ran parallel to governance. The bank consolidated GenAI spend into committed tiers with two providers plus its cloud platforms, and priced the commitment against the consumption curve the governance model made predictable.

Committed spend tiers: volume discounts of 15 to 30 percent against the pay as you go rates.
Annual price caps: unit price increases capped even as the provider price lists moved.
Model substitution rights: the right to move workloads to newer, cheaper models inside the commit.
No exclusivity: multi provider routing preserved, which kept every renewal competitive.
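The first two levers reduce to simple arithmetic, sketched here under stated assumptions. The function names, the 20 percent discount (picked from inside the 15 to 30 percent range above), the 5 percent cap, and all prices are hypothetical; actual contract mechanics vary by provider.

```python
def committed_price(payg_rate, discount):
    """Unit price under a committed tier, given a fractional volume discount."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be a fraction between 0 and 1")
    return payg_rate * (1 - discount)


def capped_price(current_price, proposed_price, annual_cap):
    """Apply an annual price cap: the increase cannot exceed annual_cap."""
    return min(proposed_price, current_price * (1 + annual_cap))


# Hypothetical numbers: pay as you go rate of 10.00 per 1M tokens.
year_one = committed_price(10.00, 0.20)
# The provider proposes 9.50 for year two; the contract caps increases at 5 percent.
year_two = capped_price(year_one, 9.50, 0.05)
print(f"year one {year_one:.2f}, year two {year_two:.2f}")
```

Because the cap applies to the unit price and not to total spend, it still protects the buyer when usage grows faster than forecast.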

Why route some spend through the cloud platforms?

Platform routes let the bank draw down existing cloud commitments. Consumption through Azure OpenAI Service and Vertex AI counted toward cloud commit drawdown, which effectively discounted the GenAI spend twice.

What results did the program deliver?

Nine months after launch the bank ran 23 approved use cases at 38 percent below the original projected run rate, with adoption above plan. The governance model, not usage suppression, delivered the gap.

Quarter one: discovery, registry, and budgets; shadow spend folded into contracts.
Quarter two: tiering rules live; average cost per request down roughly 60 percent.
Quarter three: committed tiers signed; price caps and substitution rights locked.

What can other enterprises copy directly?

The sequence is the transferable asset: meter first, tier second, commit third. Committing spend before governance makes the commit a guess; committing after gives you a defensible curve and a stronger negotiating position.

Where the common advice on GenAI cost control is wrong

The standard advice says GenAI spend is too early to govern, that you should let teams experiment freely and clean up the bill later. We disagree. In roughly 20 to 30 GenAI governance engagements Fredrik Filipsson advised between 2024 and 2025, the estates that deferred clean up paid 2 to 4 times more per delivered use case than the estates that metered from day one, and none of the governed estates showed slower adoption. The buyer side move is to put token budgets and tiering in place before scale, then sign committed tiers against the predictable curve. Waiting does not buy learning; it buys an uncapped invoice and a weaker negotiating position.

Figure: The bank's token showback dashboard became the negotiation evidence. Committed tiers were priced against nine months of governed consumption data.

What the engagement data shows

Three cuts of our advisory engagement file frame the size of the opportunity.

38%: Cut from projected GenAI run rate
~60%: Lower average cost per request via tiering
3x: Shadow AI spend vs what finance tracked

Source: Redress Compliance advisory engagement file, 2024 to 2025.

What to do next

Six moves turn this analysis into a lower invoice on the next renewal.

A sequence you can run this quarter

Run discovery across cards, cloud accounts, and marketplace listings to find all GenAI spend.
Stand up a use case registry with a token budget and a named owner per use case.
Define model tiering rules that route routine tasks to small models first.
Give FinOps the token meter with monthly showback per team and use case.
Build 90 days of governed consumption data before signing any commitment.
Negotiate committed tiers with price caps and model substitution rights.

Frequently asked questions

How much did the bank save on its GenAI rollout?

The bank ran 38 percent below its projected GenAI run rate nine months after launch, with 23 approved use cases live and adoption above plan. The savings came from token budgets, model tiering, and committed contract tiers, not from usage suppression.

What is model tiering and how much does it save?

Model tiering routes each task to the cheapest model that handles it, keeping flagship models for reasoning and generation. In this program it cut average cost per request by roughly 60 percent, and our 2024 to 2025 engagement file shows 50 to 70 percent across comparable estates.

How big is the shadow AI problem in large enterprises?

Larger than finance expects. Discovery in this case found 3 times more GenAI spend than finance tracked, and our engagement file shows 2 to 4 times as the normal range, mostly API keys on corporate cards and inside cloud accounts.

Should GenAI spend be committed early or kept pay as you go?

Commit only after governance makes the consumption curve predictable, which took two quarters at this bank. Committed tiers then delivered 15 to 30 percent against pay as you go rates, with annual price caps and model substitution rights protecting the term.

Why route GenAI consumption through cloud platforms?

Consumption through Azure OpenAI Service or Vertex AI can draw down existing cloud commitments, which effectively discounts the spend twice. The bank kept multi provider routing to preserve renewal leverage on every path.

Who should own GenAI cost governance?

The FinOps team that already governs cloud consumption. Tokens behave like any other metered resource: they need budgets, owners, showback, and a monthly review loop, which is existing FinOps muscle applied to a new meter.
