The AWS Bedrock Licensing and Negotiation Playbook
One model unit of provisioned throughput runs near $28,908 a month on a 1 month commit, billed used or not. In 2026 the EDP layer, not the rate card, decides what Bedrock costs you.
Prepared by Redress Compliance · June 2026 · Representative enterprise inference estate (benchmark scenario, not a quote)
Executive Summary
Bedrock looks like a metered utility. It negotiates like a commitment product. The on demand rate card is fixed at $3 per million input tokens and $15 per million output for Claude Sonnet class models, and the real price is set one layer up, where Bedrock consumption rolls into your AWS Enterprise Discount Program commit.
That rollup cuts both ways. Bedrock spend counts in full toward your commit and inherits your EDP discount, which across our engagement file lands between 7 and 28 percent depending on commit size. But AI consumption forecasts are also the instrument AWS uses to sell a bigger, longer commit than your workloads justify.
The second trap is fixed capacity. Provisioned throughput bills hourly for the full term, used or not; one model unit on a 1 month commit runs near $28,908 a month. In our file, estates below 70 percent sustained utilization paid 20 to 45 percent more per effective token than on demand.
This paper gives you the negotiation cycle framework, the verified consumption baseline, the five contract clauses that decide whether the commit protects the budget, the discount benchmarks, the counter moves to AWS standard tactics, and the BATNA and side letter language we use. Run it before your next commitment event, not after.
The Negotiation Cycle: Take Back the Calendar
AWS controls the calendar, the pricing reference points, and the forecast in a Bedrock commercial cycle by default. The buyer side framework flips each one: you set the timeline by starting twelve weeks out, you set the reference points with verified consumption data and competitive quotes, and you own the forecast.
The cycle below is the sequence we run with clients, and the order matters. A baseline built after AWS presents its proposal is a rebuttal; built before, it is an anchor.
Baseline and entitlement audit
Pull twelve months of Bedrock consumption by model, mode, and account. Separate on demand, batch, caching, and provisioned spend, then quantify idle provisioned hours and expiring commitments. This is the verified baseline in section 3.
Market test and BATNA build
Price the same workload on Azure OpenAI, Google Vertex AI, and the direct model APIs. Document switching cost honestly, then brief the account team that alternatives are being priced. Leverage starts the day AWS knows.
Term sheet and close
Negotiate the EDP layer, not the rate card. Land the five clauses in section 7, the side letter terms in section 10, and a commit sized to verified consumption plus measured growth, not to the account team's AI forecast.
Token Pricing and the Provisioned Throughput Curve
Bedrock prices inference five ways: on demand per token, batch at half the on demand rate, prompt caching on repeated input, provisioned throughput per model unit hour, and customization with its own meters. Current list rates sit on the AWS Bedrock pricing page; Anthropic models anchor the premium tier while Amazon's Nova family fills the budget tier.
Most enterprises never engineer the gap between these modes. The worked estate below runs 2,400 million input and 600 million output tokens a month on a Claude Sonnet class model. Routing alone moves the bill by almost a third.
| Operating model | Basis | Monthly cost |
|---|---|---|
| On demand list | 2,400M input tokens at $3.00 per million plus 600M output tokens at $15.00 per million ($7,200 + $9,000) | $16,200 |
| Prompt caching added | One third of input tokens (800M) served as cached reads at a 90 percent discount, saving $2,160 | $14,040 |
| Caching plus batch routing | 40 percent of remaining volume routed to batch inference at 50 percent off | $11,232 |
Monthly inference cost for the worked estate under three operating models (benchmark scenario, not a quote).
Provisioned throughput is a different animal: you buy model units that bill by the hour for the full term, whether traffic arrives or not. Commitments run no commit, 1 month, or 6 months, and only the 6 month term carries a deep rate cut. The representative model unit rates below show the shape of the curve.
| Commitment term | Hourly rate per model unit | Monthly cost (730 hours) |
|---|---|---|
| No commit | $44.00 | $32,120 |
| 1 month commit | $39.60 | $28,908 |
| 6 month commit | $22.00 | $16,060 |
Monthly cost of one provisioned throughput model unit by commitment term (benchmark scenario, not a quote).
Where the common advice on provisioned throughput is wrong. The standard reseller and FinOps pitch is to move production inference onto provisioned throughput for the discount. We disagree for most estates: below 70 percent sustained utilization the hourly meter erases the rate advantage, and in our file those estates paid 20 to 45 percent more per effective token.
Provisioned throughput also freezes you onto one model version, which strips the switching leverage section 4 describes. The buyer side move: exhaust caching and batch routing first, and buy provisioned capacity only against measured, sustained floors of traffic.
The Verified Consumption Baseline
Bedrock has no license entitlements in the classic sense. The asset AWS scrutinizes is your forecast, so the baseline that survives scrutiny is built from metered fact, not estimates. Three properties make a baseline defensible in the room.
- Model level granularity: twelve months of invocation data by model ID, mode, and account, from Cost Explorer and CloudWatch, with cached reads and batch jobs separated from interactive traffic.
- Unit economics: cost per thousand requests and per business transaction, so growth debates happen in business terms, not token mysticism.
- Idle capacity accounting: every provisioned model unit hour with no traffic, priced at the hourly rate and listed as recoverable spend.
When AWS presents a forecast assuming triple digit AI growth, the baseline is your counter. Growth is funded from documented waste before it is funded from new commit.
Model Selection, Version Lock, and Switching Cost
Bedrock's catalog breadth is real leverage, but only if your architecture preserves it. Three mechanics decide whether you can actually move spend between Anthropic, Amazon, Meta, and Mistral models when prices shift.
| Mechanic | What it means for you |
|---|---|
| Version retirement | AWS retires model versions on published end of life schedules, with windows as short as six months for fast moving families. Map commit terms against the lifecycle dates before signing; a commit on a retiring version leaves you renegotiating under deadline. |
| Switching cost | Mostly prompt and evaluation debt. Estates routing through a gateway with model agnostic prompts move workloads between model families in weeks; estates with hard coded model IDs do not move at all, and AWS account teams know which kind you are. |
| Spend categorization | Partner model consumption through Bedrock bills on your AWS invoice and counts in full toward your commit. The same model bought through an AWS Marketplace private offer historically counts toward EDP commitments only up to a 25 percent cap. Get the categorization in writing. |
Customization, Fine Tuning, and Data Retention
Fine tuning meters look small. Training bills per token processed, and custom model storage runs $1.95 per model version per month. The trap is downstream: on most Bedrock model families, a customized model historically required provisioned throughput to serve inference at scale.
That single mechanic converts a variable, per token cost into a fixed hourly obligation. A team that fine tunes a model for a modest workload can quietly attach a five figure monthly capacity charge to it. Before any customization program, price the serving path, not just the training run.
- Price the serving path first: training cost is noise; provisioned capacity for inference is the bill.
- Prefer prompt engineering and caching until a measured quality gap justifies a fixed serving cost.
- Contract the data terms: AWS states that Bedrock does not use prompts or outputs to train base models and that fine tuning data stays in your account; attach that to the agreement as a zero data retention addendum, with deletion on termination and residency terms matching your regulatory map.
The EDP Commit Interaction
Here is the mechanic buyers miss: the Bedrock negotiation is an EDP negotiation. Bedrock on demand rates are not discounted line by line; consumption counts in full toward your Enterprise Discount Program commitment and receives your EDP percentage like any other service. The discount you carry into your AI scale up was set the day you signed your last commit.
This is why AWS loves an AI forecast at renewal. Projected Bedrock growth justifies a larger commit and a longer term, and the shortfall clause means you pay the commitment whether the AI roadmap lands or not. The discount bands below are what we have observed across engagements, by annual commit size.
| Annual AWS commit | Observed EDP discount band |
|---|---|
| Under $5M | 7 to 12 percent |
| $5M to $20M | 12 to 18 percent |
| $20M to $50M | 18 to 22 percent |
| Above $50M | 22 to 28 percent |
Observed EDP discount bands by annual commit size. Benchmark ranges: Redress Compliance advisory engagement file, 2024 to 2025.
Two further mechanics shape the rollup. Provisioned throughput spend burns commit, but a 6 month capacity term is a separate, non cancelable obligation that survives workload moves. Growth baked into a multiyear commit ratchets, so commit to the floor you can prove and let success buy the expansion.
The Five Clauses That Decide Whether the Commit Protects the Budget
An EDP that mentions AI only in the forecast deck protects AWS, not you. These five clauses, negotiated as a set, are what make an AI inclusive commitment safe to sign. Every one of them has been accepted, in some form, in deals we have advised.
| Clause | What it does | Why AWS resists |
|---|---|---|
| 1. AI spend categorization | States that all Bedrock consumption, including third party model invocation, counts 100 percent toward the commit and receives the full EDP discount. | Preserves the option to treat partner model spend like capped Marketplace revenue. |
| 2. Token rate hold | Holds the on demand rate card for named models for the term, with downward moves passing through automatically. | AWS prefers rate changes to be its option, not your protection. |
| 3. Commit flex on model economics | One time right to reduce remaining commit by 10 to 15 percent on a documented model price drop or workload exit. | The shortfall clause is the entire value of the commit to AWS. |
| 4. Zero data retention addendum | Contractualizes no retention of prompts and outputs, training exclusion, deletion of fine tuning data on exit, and residency. | Standard paper points to public documentation AWS can change unilaterally. |
| 5. Model deprecation parity | Migration credits and a minimum 12 month serving runway when a committed model version is retired. | Lifecycle risk currently sits entirely with the customer. |
Priority order matters. Clauses 1 and 3 carry the money; if the negotiation budget is short, spend it there. Clause 4 is usually the cheapest to win because it codifies what AWS already claims publicly.
Discount Benchmarks Across Renewal and Exit Scenarios
Discounts move with your demonstrated willingness to walk. Across renewals in our file, buyers who arrived with a verified baseline and a priced BATNA landed at or above the top of their commit band. Buyers who arrived with neither landed at the bottom and absorbed the forecast driven commit besides.
Reduction in monthly inference cost in the worked estate from prompt caching and batch routing alone. Engineering captures this without AWS approval, which is exactly why it belongs in your baseline before the commit conversation.
Buyers in this band who walked in with a priced Azure or Google alternative consistently reached the upper edge; buyers without one settled near the floor, roughly four points of margin left on the table.
Benchmark ranges: Redress Compliance advisory engagement file, 2024 to 2025.
Exit scenarios price differently. A genuine exit, replatforming inference to another cloud or direct APIs, forfeits the EDP discount on remaining AWS spend if the commit collapses, and that math belongs in the model. In most cases the credible threat is worth more than the exit: it moves the band, the clauses, and the credits, at zero migration cost.
AWS Standard Tactics and the Buyer Side Counter Moves
AWS account teams run a consistent play sequence around AI commitments. None of it is improper; all of it is directional. The counters below neutralize each move without burning the relationship.
| AWS tactic | What it sounds like | Buyer side counter |
|---|---|---|
| The forecast anchor | "Customers like you are tripling AI spend year over year." | Replace their forecast with your baseline. Commit to verified consumption plus measured growth; structure upside as optional tranches, not committed floor. |
| The credits sweetener | "We can fund the pilot with promotional credits." | Take the credits, but read the paper: credits often carry consumption deadlines and program exclusivity that mature into commit pressure. Negotiate the term sheet as if the credits did not exist. |
| The capacity scare | "Provisioned throughput is the only way to guarantee capacity for launch." | Ask for the cross region inference and quota data first. Most launch profiles fit on demand quotas; buy capacity against measured floors only, on the shortest term. |
| The bundle close | "Roll Bedrock, SageMaker, and the data stack into one bigger commit and the discount improves." | Price the bundle against the band table in section 6. A two point discount improvement does not pay for a 40 percent commit increase. It rarely does. |
| The deadline squeeze | "This pricing expires at quarter end." | Quarter end pressure runs both directions. Hold the calendar you set in phase 1 and let their quarter, not yours, force the movement. |
BATNA Construction and the Side Letter Language
A BATNA for Bedrock is not a slide that says "Azure exists." It is a priced, dated, partially executed alternative. The build has three parts, and the first two are cheap.
- Price the same workload twice elsewhere: Azure OpenAI provisioned throughput units and Google Vertex AI carry their own committed capacity economics, and the frontier model vendors sell direct APIs with enterprise terms; a one page comparison at your token volumes changes the meeting.
- Run a live second path: five percent of traffic through a gateway to a second provider converts the comparison from hypothetical to operational.
Then write the protections into a side letter. Sample language we use as the starting position:
You will not get every sentence. Each one you do get converts a documentation promise or a sales assurance into contract. That is the entire game on a consumption product: the rate card is public, the protections are not, and only one of them is negotiable.
Our recommendation: treat the next Bedrock commitment event as an EDP negotiation with an AI annex, and start twelve weeks out.
- Before the first meeting: build the verified baseline, capture the routing yield (31 percent in the worked estate), and price the workload on two alternatives. Leverage is manufactured in these three weeks.
- At the table: size the commit to proven consumption, land clauses 1 and 3 as the floor, and put the side letter language in the first draft, not the last.
Redress Compliance runs this cycle with your team, from baseline through signature, as a fixed scope engagement. We are glad to tie a meaningful part of the fee to delivered value.