Amazon Bedrock Pricing 2026: Real Cost

Amazon Bedrock prices each foundation model at its own token rate and offers two buying modes. The mode you pick decides whether you pay for what you use or for capacity you reserve.

Key takeaways

Bedrock charges per token, with input and output tokens priced separately.
On demand pricing bills only the tokens you actually process.
Provisioned throughput reserves model capacity for a fixed hourly fee.
Each foundation model on Bedrock has its own token rate.
Bedrock spend can be folded into an AWS EDP commitment.
Prompt and output size discipline is the largest controllable cost lever.

This guide is for cloud architects and FinOps teams budgeting for Amazon Bedrock. Read it alongside the AWS Bedrock pricing guide and the AWS EDP pillar.

Bedrock looks like a single service, but every model on it carries its own price. Budgeting means knowing the model rates and the two buying modes, then matching the mode to the workload.

How is Amazon Bedrock priced in 2026?

Bedrock charges for model inference by the token, with input and output tokens often priced at different rates. Each foundation model sets its own rate, so the bill depends on which models you call.

What is a token and why does it matter?

A token is a chunk of text, often a short word or part of a word. Rates are quoted per thousand or per million tokens, with separate prices for input and output. The Amazon Bedrock pricing page lists the current per model rates.

Input tokens: the text you send to the model.
Output tokens: the text the model returns, often priced higher.
Per model: each foundation model has its own rate.
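
As a rough illustration of how the input and output split turns into a bill, the sketch below estimates the cost of a single request. The rates are hypothetical placeholders, not actual Bedrock prices, and the TokenRate and request_cost names are invented for this example. Substitute the per model figures from the Amazon Bedrock pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRate:
    """USD per 1,000 tokens. Values used below are hypothetical, not real Bedrock rates."""
    input_per_1k: float
    output_per_1k: float


def request_cost(rate: TokenRate, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated on demand cost of one request in USD."""
    input_cost = (input_tokens / 1000) * rate.input_per_1k
    output_cost = (output_tokens / 1000) * rate.output_per_1k
    return input_cost + output_cost


# Placeholder rate with output priced higher than input.
example_rate = TokenRate(input_per_1k=0.003, output_per_1k=0.015)
cost = request_cost(example_rate, input_tokens=2000, output_tokens=500)
print(f"Estimated cost per request: ${cost:.4f}")
```

With these placeholder rates, the 500 output tokens cost more than the 2,000 input tokens, which is why output length deserves as much attention as prompt length.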

What are the two buying modes?

On demand bills the tokens you process with no commitment. Provisioned throughput reserves dedicated capacity for an hourly fee. The right mode depends on whether your traffic is steady or spiky.
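
One way to make "steady or spiky" concrete is to look at how much hourly token volume varies around its average. The sketch below is an assumption about how a team might do this, not an AWS method: it uses the coefficient of variation with a hypothetical threshold of 0.5, and the classify_traffic name is invented for illustration.

```python
from statistics import mean, pstdev


def classify_traffic(hourly_tokens: list[float], cv_threshold: float = 0.5) -> str:
    """Label traffic 'steady' or 'variable' from hourly token counts.

    cv_threshold is a hypothetical cutoff; tune it against your own workloads.
    """
    avg = mean(hourly_tokens)
    if avg == 0:
        # No traffic at all: nothing to reserve capacity for.
        return "variable"
    # Coefficient of variation: standard deviation relative to the mean.
    cv = pstdev(hourly_tokens) / avg
    return "steady" if cv <= cv_threshold else "variable"


print(classify_traffic([100_000, 110_000, 90_000, 105_000]))
print(classify_traffic([0, 0, 1_000_000, 0]))
```

A "steady" label only says the traffic is even. Whether it is also high enough to justify reserved capacity is a separate volume question.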

When does on demand pricing make sense?

On demand is the default. You pay only for the tokens you process, which suits variable or early stage workloads where volume is hard to predict.

Which workloads suit on demand?

Bursty, low volume, or experimental workloads fit on demand. You avoid paying for idle capacity and the cost tracks usage directly, which keeps early projects cheap to run.

Where does on demand get expensive?

At steady high volume, on demand token costs can exceed the cost of reserved capacity. A workload that runs constantly is usually cheaper on provisioned throughput once volume is predictable.
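
The crossover point can be estimated with simple arithmetic. The sketch below assumes a single blended token rate, a 730 hour month, and a hypothetical hourly price per model unit. None of these figures are real Bedrock prices, and the function names are invented for this example.

```python
HOURS_PER_MONTH = 730.0  # Average hours in a month, used as an approximation.


def monthly_on_demand_cost(tokens_per_month: float, blended_rate_per_1k: float) -> float:
    """On demand cost in USD, using one blended rate for input and output tokens."""
    return tokens_per_month / 1000 * blended_rate_per_1k


def monthly_provisioned_cost(model_units: int, hourly_rate: float) -> float:
    """Provisioned cost in USD: the hourly fee is charged for every hour of the month."""
    return model_units * hourly_rate * HOURS_PER_MONTH


def break_even_tokens(model_units: int, hourly_rate: float, blended_rate_per_1k: float) -> float:
    """Monthly token volume at which on demand cost equals provisioned cost."""
    return monthly_provisioned_cost(model_units, hourly_rate) / blended_rate_per_1k * 1000


# Hypothetical inputs: one model unit at $40 per hour, $0.01 per 1,000 tokens on demand.
threshold = break_even_tokens(model_units=1, hourly_rate=40.0, blended_rate_per_1k=0.01)
print(f"Break-even volume: {threshold:,.0f} tokens per month")
```

Above the break-even volume, provisioned throughput is cheaper on paper, provided the reserved units can actually serve that volume. Check the throughput of a model unit for your chosen model before relying on the result.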

Bedrock buying modes compared

Mode	Billing	Best for
On demand	Per token used	Variable or early workloads
Provisioned throughput	Hourly per model unit	Steady high volume
EDP folded	Counts toward commitment	Predictable enterprise spend

How does provisioned throughput pricing work?

Provisioned throughput reserves model units for a committed period at an hourly rate. It trades flexibility for a lower effective cost per token at scale.

What are model units?

A model unit is a block of guaranteed throughput for a specific model. You pay for the unit by the hour whether you use the full capacity or not, so it rewards steady, high utilization.

Reserved capacity: guaranteed throughput per model unit.
Hourly fee: charged regardless of utilization.
Commitment terms: longer terms lower the hourly rate.
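
Because the hourly fee is fixed, the effective price per token rises as utilization falls. The sketch below shows that relationship under stated assumptions: the hourly rate and the tokens per hour a model unit can serve are hypothetical placeholders, and effective_cost_per_1k is a name invented for this example.

```python
def effective_cost_per_1k(
    hourly_rate: float,
    capacity_tokens_per_hour: float,
    utilization: float,
) -> float:
    """Effective USD per 1,000 tokens actually processed on reserved capacity.

    utilization is the fraction of reserved capacity in use, between 0 and 1.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be greater than 0 and at most 1")
    tokens_used_per_hour = capacity_tokens_per_hour * utilization
    return hourly_rate / tokens_used_per_hour * 1000


# Hypothetical model unit: $40 per hour, able to serve 10 million tokens per hour.
for share in (1.0, 0.5, 0.25):
    rate = effective_cost_per_1k(40.0, 10_000_000, share)
    print(f"{share:.0%} utilization: ${rate:.4f} per 1,000 tokens")
```

Under these placeholder figures, running at a quarter of capacity makes each token four times as expensive as running at full capacity, which is the arithmetic behind "rewards steady, high utilization".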

Which workloads justify provisioned capacity?

High volume, latency sensitive, production workloads justify provisioned throughput. Once a model runs near capacity for most of the day, the reserved rate beats on demand token pricing.

What buyer side moves cut Bedrock cost?

The levers are model choice, prompt discipline, mode selection, and EDP placement. Most of them sit with the engineering team, not the AWS account manager.

Why is prompt size the biggest lever?

Every token costs money. Trimming bloated prompts, capping output length, and caching repeated context cut token volume directly. This is the largest lever a team controls without touching a contract.
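
To see the size of this lever, a team can compare monthly cost before and after trimming prompts and capping output. The sketch below is illustrative only: the request counts, token sizes, and rates are hypothetical, and it does not model caching, which is priced differently per model.

```python
def monthly_token_cost(
    requests: int,
    input_tokens: int,
    output_tokens: int,
    input_per_1k: float,
    output_per_1k: float,
) -> float:
    """Monthly on demand cost in USD for requests of a fixed average size."""
    per_request = (input_tokens / 1000) * input_per_1k + (output_tokens / 1000) * output_per_1k
    return requests * per_request


# Hypothetical workload: 1 million requests per month at placeholder rates.
baseline = monthly_token_cost(1_000_000, 3000, 800, input_per_1k=0.003, output_per_1k=0.015)
# Same workload after trimming the prompt and capping the output length.
trimmed = monthly_token_cost(1_000_000, 1800, 400, input_per_1k=0.003, output_per_1k=0.015)

savings_pct = (baseline - trimmed) / baseline * 100
print(f"Baseline: ${baseline:,.0f}  Trimmed: ${trimmed:,.0f}  Savings: {savings_pct:.1f}%")
```

The same function can be rerun with real per model rates and measured token counts to rank which prompts are worth trimming first.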

How does the AWS EDP affect Bedrock?

Bedrock spend counts toward an AWS Enterprise Discount Program commitment. Folding predictable Bedrock volume into the EDP can improve the overall discount, so model the spend before the EDP is sized.

What to do next

List the foundation models your workloads call on Bedrock.
Pull the current per model token rates for input and output.
Classify each workload as variable or steady high volume.
Run variable workloads on demand and steady ones on provisioned throughput.
Audit prompts and output limits to cut token volume at the source.
Model predictable Bedrock spend against your AWS EDP commitment.
Set up monthly token monitoring per model and per workload.
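
The monitoring step can start as a simple roll-up of token counts by model and workload. The sketch below assumes usage records are already exported from your own logging or metrics. The UsageRecord fields, the model identifiers, and the workload names are hypothetical, not a Bedrock data format.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class UsageRecord(NamedTuple):
    """One logged request. Field names are hypothetical, not a Bedrock schema."""
    model_id: str
    workload: str
    input_tokens: int
    output_tokens: int


def summarize(records: Iterable[UsageRecord]) -> dict[tuple[str, str], dict[str, int]]:
    """Total input and output tokens for each (model, workload) pair."""
    totals: dict[tuple[str, str], dict[str, int]] = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0}
    )
    for record in records:
        key = (record.model_id, record.workload)
        totals[key]["input_tokens"] += record.input_tokens
        totals[key]["output_tokens"] += record.output_tokens
    return dict(totals)


# Placeholder records for illustration.
sample = [
    UsageRecord("model-a", "support-chat", 1200, 300),
    UsageRecord("model-a", "support-chat", 800, 200),
    UsageRecord("model-b", "doc-summary", 5000, 400),
]
for (model_id, workload), counts in summarize(sample).items():
    print(model_id, workload, counts)
```

Multiplying each total by the matching per model input and output rate turns this roll-up into a monthly cost view.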

Frequently asked questions

How is Amazon Bedrock priced?

Amazon Bedrock charges per token for model inference, with input and output tokens often priced at different rates. Each foundation model sets its own rate, so the bill depends on which models you call and how much text you process.

What is the difference between on demand and provisioned throughput?

On demand bills only the tokens you process with no commitment, while provisioned throughput reserves dedicated model capacity for an hourly fee. On demand suits variable workloads; provisioned suits steady high volume.

What is a model unit in Bedrock?

A model unit is a block of guaranteed throughput for a specific model under provisioned throughput. You pay for it by the hour regardless of utilization, so it rewards steady, high use workloads.

Can Bedrock spend count toward an AWS EDP?

Yes, Bedrock spend counts toward an AWS Enterprise Discount Program commitment. Folding predictable Bedrock volume into the EDP can improve the overall discount, so model the spend before sizing the commitment.

What is the biggest lever to cut Bedrock cost?

Prompt and output discipline. Trimming bloated prompts, capping output length, and caching repeated context cut token volume directly, which is the largest lever a team controls without touching a contract.

Is on demand or provisioned cheaper?

It depends on volume. On demand is cheaper for variable or low volume workloads, while provisioned throughput is cheaper once a model runs near capacity for most of the day at predictable, steady volume.
