Every model on Bedrock carries its own token rate. Here is how the pricing works, when each buying mode wins, and the levers that cut your spend.
Amazon Bedrock prices each foundation model on its own token rate, with two buying modes. The mode you pick decides whether you pay for what you use or for capacity you reserve.
This guide is for cloud architects and FinOps teams budgeting Amazon Bedrock. Read it with the AWS Bedrock pricing guide and the AWS EDP pillar.
Bedrock looks like a single service, but every model on it carries its own price. Budgeting means knowing the model rates and the two buying modes, then matching the mode to the workload.
Bedrock charges for model inference by the token, with input and output tokens often priced at different rates. Each foundation model sets its own rate, so the bill depends on which models you call.
A token is a chunk of text, roughly a few characters. You pay per thousand or per million tokens, split between input and output. The Amazon Bedrock pricing page lists the current per model rates.
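As a rough illustration of the arithmetic, the sketch below prices a month of on demand traffic from input and output token counts. The function name and both rates are hypothetical placeholders, not actual Bedrock prices; substitute the figures from the pricing page for the model you call.

```python
def on_demand_cost(input_tokens: int, output_tokens: int,
                   input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Return the on demand charge in dollars for a batch of tokens.

    Both rates are expressed in dollars per 1,000 tokens.
    """
    input_cost = input_tokens / 1000 * input_rate_per_1k
    output_cost = output_tokens / 1000 * output_rate_per_1k
    return input_cost + output_cost


# Hypothetical rates, for illustration only (not real Bedrock prices).
INPUT_RATE_PER_1K = 0.003
OUTPUT_RATE_PER_1K = 0.015

# Example month: 200 million input tokens and 50 million output tokens.
monthly_cost = on_demand_cost(200_000_000, 50_000_000,
                              INPUT_RATE_PER_1K, OUTPUT_RATE_PER_1K)
print(f"Estimated monthly on demand cost: ${monthly_cost:,.2f}")
```

In this made-up example the output tokens cost more than the input tokens despite being a quarter of the volume, which is why the input and output split matters when you forecast.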
On demand bills the tokens you process with no commitment. Provisioned throughput reserves dedicated capacity for an hourly fee. The right mode depends on whether your traffic is steady or spiky.
On demand is the default. You pay only for the tokens you process, which suits variable or early stage workloads where volume is hard to predict.
Bursty, low volume, or experimental workloads fit on demand. You avoid paying for idle capacity and the cost tracks usage directly, which keeps early projects cheap to run.
At steady high volume, on demand token costs can exceed the cost of reserved capacity. A workload that runs constantly is usually cheaper on provisioned throughput once volume is predictable.
Bedrock buying modes compared
| Mode | Billing | Best for |
|---|---|---|
| On demand | Per token used | Variable or early workloads |
| Provisioned throughput | Hourly per model unit | Steady high volume |
| Folded into an EDP | Counts toward the EDP commitment | Predictable enterprise spend |
Provisioned throughput reserves model units for a committed period at an hourly rate. It trades flexibility for a lower effective cost per token at scale.
A model unit is a block of guaranteed throughput for a specific model. You pay for the unit by the hour whether you use the full capacity or not, so it rewards steady, high utilization.
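To see why utilization matters, the sketch below converts a flat hourly fee into an effective cost per 1,000 tokens at different utilization levels. The hourly rate and the capacity of a model unit are assumptions chosen for illustration; actual figures vary by model and are not taken from this guide.

```python
def effective_cost_per_1k(hourly_rate: float, capacity_tokens_per_hour: float,
                          utilization: float) -> float:
    """Effective dollars per 1,000 tokens for one model unit.

    utilization is the fraction of the unit's capacity actually used,
    greater than 0 and at most 1.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_used = capacity_tokens_per_hour * utilization
    return hourly_rate / tokens_used * 1000


# Hypothetical figures, for illustration only.
HOURLY_RATE = 40.0                      # dollars per model unit per hour
CAPACITY_TOKENS_PER_HOUR = 10_000_000   # tokens one unit can process per hour

for utilization in (0.25, 0.50, 1.00):
    cost = effective_cost_per_1k(HOURLY_RATE, CAPACITY_TOKENS_PER_HOUR, utilization)
    print(f"{utilization:.0%} utilization: ${cost:.4f} per 1,000 tokens")
```

Because the hourly fee is fixed, halving utilization doubles the effective cost per token.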
High volume, latency sensitive, production workloads justify provisioned throughput. Once a model runs near capacity for most of the day, the reserved rate beats on demand token pricing.
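One way to locate the crossover is to compare a month of on demand charges against the flat monthly cost of reserved units. The sketch below does that under simplifying assumptions: a single blended rate for input and output tokens, a hypothetical hourly unit rate, and enough capacity in the reserved units to carry the volume. It also ignores commitment term length.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def provisioned_monthly_cost(hourly_unit_rate: float, model_units: int = 1) -> float:
    """Flat monthly cost of reserved model units, billed every hour."""
    return hourly_unit_rate * model_units * HOURS_PER_MONTH


def break_even_tokens(blended_rate_per_1k: float, hourly_unit_rate: float,
                      model_units: int = 1) -> float:
    """Monthly token volume at which both modes cost the same."""
    provisioned = provisioned_monthly_cost(hourly_unit_rate, model_units)
    return provisioned / blended_rate_per_1k * 1000


def cheaper_mode(monthly_tokens: float, blended_rate_per_1k: float,
                 hourly_unit_rate: float, model_units: int = 1) -> str:
    """Return the buying mode with the lower monthly cost."""
    on_demand = monthly_tokens / 1000 * blended_rate_per_1k
    provisioned = provisioned_monthly_cost(hourly_unit_rate, model_units)
    return "on demand" if on_demand <= provisioned else "provisioned throughput"


# Hypothetical rates, for illustration only (not real Bedrock prices).
BLENDED_RATE_PER_1K = 0.006   # dollars per 1,000 tokens on demand
HOURLY_UNIT_RATE = 40.0       # dollars per model unit per hour

threshold = break_even_tokens(BLENDED_RATE_PER_1K, HOURLY_UNIT_RATE)
print(f"Break-even volume: {threshold:,.0f} tokens per month")
for volume in (1_000_000_000, 6_000_000_000):
    mode = cheaper_mode(volume, BLENDED_RATE_PER_1K, HOURLY_UNIT_RATE)
    print(f"{volume:,} tokens per month: {mode}")
```

Below the break-even volume the reserved capacity sits partly idle and on demand wins; above it, the flat hourly fee is spread across enough tokens to undercut the per token rate.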
The levers are model choice, prompt discipline, mode selection, and EDP placement. Most of them sit with the engineering team, not the AWS account manager.
Every token costs money. Trimming bloated prompts, capping output length, and caching repeated context cut token volume directly. This is the largest lever a team controls without touching a contract.
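The effect of prompt discipline can be estimated before anyone changes code. The sketch below compares monthly spend before and after trimming the average prompt and capping the average output length. All request counts, token counts, and rates are hypothetical, and caching is not modeled.

```python
def monthly_spend(requests: int, input_tokens_per_request: int,
                  output_tokens_per_request: int,
                  input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Monthly on demand spend in dollars for a given average request shape."""
    input_cost = requests * input_tokens_per_request / 1000 * input_rate_per_1k
    output_cost = requests * output_tokens_per_request / 1000 * output_rate_per_1k
    return input_cost + output_cost


# Hypothetical workload and rates, for illustration only.
REQUESTS = 1_000_000
INPUT_RATE_PER_1K = 0.003
OUTPUT_RATE_PER_1K = 0.015

# Before: 2,000 token prompts and 800 token responses on average.
before = monthly_spend(REQUESTS, 2_000, 800, INPUT_RATE_PER_1K, OUTPUT_RATE_PER_1K)
# After: prompts trimmed to 1,200 tokens, output capped at 400 tokens.
after = monthly_spend(REQUESTS, 1_200, 400, INPUT_RATE_PER_1K, OUTPUT_RATE_PER_1K)

saving = before - after
print(f"Before: ${before:,.2f}  After: ${after:,.2f}")
print(f"Saving: ${saving:,.2f} ({saving / before:.1%})")
```

Running the same comparison with your own request logs and the published rates shows which of the two changes, shorter prompts or shorter outputs, is worth engineering time first.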
Bedrock spend counts toward an AWS Enterprise Discount Program commitment. Folding predictable Bedrock volume into the EDP can improve the overall discount, so model the spend before the EDP is sized.
Amazon Bedrock charges per token for model inference, with input and output tokens often priced at different rates. Each foundation model sets its own rate, so the bill depends on which models you call and how much text you process.
On demand bills only the tokens you process with no commitment, while provisioned throughput reserves dedicated model capacity for an hourly fee. On demand suits variable workloads; provisioned suits steady high volume.
A model unit is a block of guaranteed throughput for a specific model under provisioned throughput. You pay for it by the hour regardless of utilization, so it rewards steady, high use workloads.
Bedrock spend counts toward an AWS Enterprise Discount Program commitment. Folding predictable Bedrock volume into the EDP can improve the overall discount, so model the spend before sizing the commitment.
The biggest cost lever is prompt and output discipline. Trimming bloated prompts, capping output length, and caching repeated context cut token volume directly, which makes this the largest lever a team controls without touching a contract.
Which mode is cheaper depends on volume. On demand is cheaper for variable or low volume workloads, while provisioned throughput is cheaper once a model runs near capacity for most of the day at predictable, steady volume.
The biggest Bedrock saving is rarely in the contract. It is in the prompts your engineers send every second.