At a glance: 107× price range across models; 5 distinct billing modes; 50% batch inference discount; up to 90% prompt cache savings.

AWS Bedrock is not a model. It is a billing system that wraps a model marketplace. That distinction matters because Bedrock’s pricing complexity is not about understanding token rates — those are listed on a pricing page. The complexity is in the five separate billing modes, the hidden cost multipliers from ancillary services, the regional pricing variations, and the interaction between Bedrock consumption and your broader AWS committed spend. This guide is written for FinOps teams, cloud architects, and procurement leaders who need to understand not just what Bedrock costs per token, but how Bedrock costs compound at enterprise scale and where the optimisation leverage actually sits.

The unique challenge with Bedrock is that it layers AWS’s traditional cloud pricing complexity on top of AI model economics. You are not just choosing between Claude and Llama — you are choosing between on-demand, batch, provisioned throughput, cached, and fine-tuned variants of each model, each with different unit economics, commitment requirements, and architectural implications. Getting this wrong does not mean overpaying by 10%. It means overpaying by 300%.

The Five Billing Modes: A Decision Framework

Every Bedrock cost decision starts with choosing the right billing mode for each workload. Bedrock offers five distinct pricing mechanisms, and most enterprises should be using at least three of them simultaneously across different workloads.

Mode 1: On-Demand (Pay-Per-Token)

On-demand is the default mode and the one most teams start with. You pay per 1,000 input tokens and per 1,000 output tokens, with no commitment and no minimum spend. Pricing varies by model, with output tokens consistently costing 2–5× more than input tokens.

On-demand is the right choice for variable workloads, development and testing environments, proof-of-concept projects, and any application where usage patterns are unpredictable or still stabilising. It is the wrong choice for production workloads with predictable, high-volume traffic — where provisioned throughput or batch processing almost always delivers better unit economics.

The hidden risk of on-demand: throttling. AWS enforces rate limits on on-demand inference, and during peak demand periods your requests may be queued or rejected. If your application has latency SLAs, on-demand provides no guarantee that those SLAs will be met. This operational risk is invisible in the pricing but material in production.

Mode 2: Batch Inference (50% Discount)

Batch inference processes requests asynchronously within a 24-hour window at exactly half the on-demand per-token rate. For Claude Sonnet 4.5 on Bedrock, batch pricing drops from $3/$15 to $1.50/$7.50 per million tokens. For Amazon Nova Micro, batch drops from $0.035/$0.14 to $0.018/$0.07.

Batch is the single largest cost lever available to any Bedrock customer, and it is dramatically underutilised. In our experience, fewer than 25% of enterprise Bedrock deployments have implemented batch processing for eligible workloads. Any task that does not require a real-time response — document processing, bulk classification, report generation, data extraction, test generation, content moderation queues, nightly analytics — should be running on batch.

The practical constraint: batch has a 24-hour SLA, not a 24-hour average. Some batch jobs complete in minutes; others take the full window. Applications that need results within 2–4 hours cannot reliably depend on batch processing. This gap between “does not need to be real-time” and “needs results within 4 hours” is where many eligible workloads remain on on-demand unnecessarily.

Mode 3: Provisioned Throughput (Reserved Capacity)

Provisioned throughput works like EC2 Reserved Instances for AI inference. You purchase model units (MUs) that guarantee a specific throughput level, billed hourly regardless of actual utilisation. Commitment terms are 1-month or 6-month, with 6-month terms offering lower hourly rates.

Provisioned throughput pricing varies significantly by model. Meta Llama models start at approximately $21 per hour per model unit on a 1-month commitment. Anthropic Claude models and Cohere models are priced higher, with some exceeding $49 per hour per model unit. At $49/hour, a single model unit costs approximately $35,280 per month — $423,360 per year.

When provisioned throughput makes economic sense: Only when your on-demand spend at consistent utilisation exceeds the provisioned throughput cost. The break-even calculation: if your on-demand cost at 70%+ utilisation exceeds the hourly MU cost × 24 hours × 30 days, provisioned throughput saves money. Below 70% utilisation, you are paying for idle capacity.
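The break-even arithmetic can be sketched in a few lines. The $49/hour model-unit rate comes from the example above; the token volume, input/output split, and Sonnet rates below are illustrative assumptions:

```python
# Break-even sketch: provisioned throughput vs on-demand.
# The $49/hour MU rate is from the example above; the workload
# figures passed in at the bottom are illustrative assumptions.

HOURS_PER_MONTH = 24 * 30

def provisioned_monthly_cost(mu_hourly_rate: float, units: int = 1) -> float:
    """MU hourly rate is billed for every hour, used or idle."""
    return mu_hourly_rate * units * HOURS_PER_MONTH

def on_demand_monthly_cost(tokens_per_month: float,
                           input_share: float,
                           input_rate_mtok: float,
                           output_rate_mtok: float) -> float:
    """Blended on-demand cost for a monthly token volume."""
    in_tok = tokens_per_month * input_share
    out_tok = tokens_per_month - in_tok
    return in_tok / 1e6 * input_rate_mtok + out_tok / 1e6 * output_rate_mtok

pt = provisioned_monthly_cost(49.0)  # $35,280 per month
od = on_demand_monthly_cost(5_000_000_000, 0.8, 3.00, 15.00)  # 5B tokens/month
print(f"provisioned: ${pt:,.0f}  on-demand: ${od:,.0f}")
print("provisioned wins" if od > pt else "stay on-demand")
```

At this assumed volume, on-demand is still cheaper; the crossover only arrives when sustained utilisation pushes on-demand spend past the fixed MU cost.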

Provisioned throughput is also the only option for running custom (fine-tuned) models on Bedrock. If you fine-tune a model, you cannot serve it via on-demand — you must purchase provisioned throughput, which creates a minimum cost floor for any fine-tuning project regardless of inference volume.

Mode 4: Prompt Caching (Up to 90% Input Savings)

Bedrock supports prompt caching that stores frequently used prompt prefixes for up to five minutes. Subsequent requests with matching prefixes receive up to 90% cost savings on cached input tokens and up to 85% latency reduction. Cache performance varies by model and prompt characteristics, and all caches are isolated to individual AWS accounts.

Prompt caching on Bedrock mirrors the mechanics available through direct Anthropic API access (for Claude models) but is managed within the Bedrock infrastructure. The architectural implications are identical: front-load static content (system prompts, few-shot examples, knowledge base context) into the prompt prefix, place variable per-request content at the end. The cached prefix pays a small write premium; every subsequent read saves 90%.

For RAG applications making 100+ requests per minute with a shared 4,000-token system prompt, each cache hit saves roughly $0.011 on Sonnet input alone (4,000 tokens at the $2.70/MTok difference between the standard and cached input rates), or about $10.80 per 1,000 requests. At enterprise scale (1 million requests per day), that is on the order of $320,000 per month in input token savings from a single architectural decision.
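A sketch of that calculation, assuming a 4,000-token cached prefix, Sonnet's $3/MTok input rate, and cache reads billed at 10% of the standard rate (the one-off cache-write premium is ignored here):

```python
# Prompt-cache savings sketch. Assumptions: 4,000-token cached prefix,
# Sonnet input at $3.00/MTok, cache reads at 10% of the input rate;
# the cache-write premium on the first request is ignored.

PREFIX_TOKENS = 4_000
INPUT_RATE = 3.00                       # $ per million input tokens
CACHE_READ_RATE = INPUT_RATE * 0.10     # 90% discount on cache hits

saving_per_hit = PREFIX_TOKENS / 1e6 * (INPUT_RATE - CACHE_READ_RATE)
requests_per_day = 1_000_000
monthly_saving = saving_per_hit * requests_per_day * 30

print(f"saving per cache hit: ${saving_per_hit:.4f}")
print(f"monthly saving at 1M req/day: ${monthly_saving:,.0f}")
```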

Mode 5: Model Customisation (Fine-Tuning)

Fine-tuning Bedrock models incurs three separate cost layers: training costs (per token processed during fine-tuning, calculated as tokens × epochs), model storage fees (per month per stored model, typically $1.95/month for text models), and inference costs via mandatory provisioned throughput.

The total cost of a fine-tuning project often surprises teams. A fine-tuning run on a mid-size model processing 10 million tokens across 3 epochs costs approximately $120 in training fees — trivial. But the provisioned throughput required to serve the fine-tuned model costs $15,000–$35,000 per month minimum, depending on the base model. The training cost is the tip of the iceberg; the ongoing inference cost is the iceberg.
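A rough first-year model of the three cost layers, using the training figures from the example above and an assumed $21/hour provisioned-throughput floor (one MU, the cheapest tier mentioned in this guide):

```python
# Three-layer fine-tuning cost sketch. The $4/MTok training rate is
# implied by the example above ($120 for 10M tokens x 3 epochs); the
# $21/hour PT floor and 24/7 operation are assumptions.

def fine_tune_first_year_cost(train_tokens: float, epochs: int,
                              train_rate_mtok: float = 4.00,
                              storage_per_month: float = 1.95,
                              pt_hourly: float = 21.00) -> dict:
    training = train_tokens * epochs / 1e6 * train_rate_mtok
    storage = storage_per_month * 12
    inference = pt_hourly * 24 * 30 * 12   # 1 MU running 24/7 for a year
    return {"training": training, "storage": storage,
            "inference": inference,
            "total": training + storage + inference}

costs = fine_tune_first_year_cost(10_000_000, 3)
print(costs)   # training is trivial; provisioned throughput dominates
```

Even at the cheapest assumed PT rate, inference is three orders of magnitude larger than the training fee.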

Before committing to fine-tuning, benchmark whether few-shot prompting or retrieval-augmented generation achieves comparable quality improvements at a fraction of the cost. In our experience, 70% of enterprise fine-tuning projects could have achieved equivalent results through prompt engineering alone — without the provisioned throughput commitment.

Model Pricing: The 107× Range That Determines Your Bill

Bedrock’s model marketplace spans an extraordinary price range. Amazon Nova Micro output costs $0.14 per million tokens. Claude Opus 4.1 (legacy) output costs $75 per million tokens. That is a 535× difference on output. Even within the current generation, Nova Micro versus Claude Sonnet 4.5 represents a 107× output price difference.

Anthropic Claude on Bedrock

Claude Sonnet 4.5: $3.00 input / $15.00 output per MTok. Claude Haiku 4.5: $1.00 input / $5.00 output per MTok. Claude Opus 4.5: $5.00 input / $25.00 output per MTok. These rates match Anthropic’s direct API pricing exactly — Bedrock does not add a markup to Anthropic models. The value proposition of running Claude through Bedrock is not cheaper per-token pricing but AWS integration: unified billing, IAM controls, VPC endpoints, CloudWatch monitoring, and the ability to fund Claude consumption from existing AWS committed spend.

Amazon Nova Models

Nova Micro: $0.035 input / $0.14 output per MTok. Nova Lite: $0.06 input / $0.24 output per MTok. Nova Pro: $0.80 input / $3.20 output per MTok. Nova Premier: $2.00 input / $8.00 output per MTok. Amazon’s own models are priced aggressively below third-party offerings. Nova Micro is roughly 86× cheaper than Sonnet on input ($3.00 vs $0.035) and 107× cheaper on output. For high-volume, low-complexity tasks (classification, routing, entity extraction, simple Q&A), Nova models represent dramatic cost savings versus Claude or Llama.

Meta Llama Models

Llama 3.3 Instruct (70B): $0.72 input / $0.72 output per MTok. Llama 3.2 (90B Vision): $2.00 input / $2.00 output per MTok. Llama 3.1 (405B): $5.32 input / $16.00 output per MTok. The Llama family on Bedrock is priced at a premium over self-hosted alternatives. Running Llama 3 70B on your own EC2 GPU instances through platforms like TrueFoundry or vLLM can reduce inference costs by 60–70% compared to Bedrock on-demand rates — but requires operational expertise and infrastructure management that Bedrock eliminates.

Other Models

Mistral Large (2): $2.00 input / $6.00 output per MTok. Cohere Command R+: $2.50 input / $10.00 output per MTok. AI21 Jamba 1.5 Large: $2.00 input / $8.00 output per MTok. Stability AI (image models): per-image pricing ranging from $0.01 to $0.08 depending on resolution and quality. Each model has different batch, caching, and provisioned throughput availability — not all billing modes are supported for all models.

Compare AI Vendor Costs

Model Bedrock, Azure OpenAI, and Vertex AI costs side-by-side with our free comparison calculator.

Launch the vendor comparison calculator →

The Hidden Cost Multipliers: Services That Double Your Bill

The most common mistake in Bedrock cost forecasting is modelling only inference costs. In production deployments, ancillary Bedrock services routinely add 40–100% to the base inference bill.

Knowledge Bases (RAG Infrastructure)

Bedrock Knowledge Bases provide managed RAG capabilities, but the underlying infrastructure carries significant costs. The default vector store is Amazon OpenSearch Serverless, which has a minimum cost of approximately $345 per month for the required OpenSearch Compute Units (OCUs) — regardless of data volume. This minimum alone often exceeds the inference costs for low-to-moderate RAG workloads.

Additional Knowledge Base costs include: S3 storage for source documents, embedding model inference for document ingestion (charged at the embedding model’s per-token rate), and the retrieval query costs for each user request. A RAG application processing 10,000 queries per day with a 50,000-document knowledge base can incur $800–$1,500 per month in Knowledge Base infrastructure costs before a single inference token is processed.

Bedrock Agents

Agents orchestrate multi-step workflows by chaining model calls with tool use. Each agent invocation generates multiple inference calls — typically 3–8 model calls per user request, as the agent reasons about which tools to use, executes them, and synthesises results. An agent workflow that appears to cost $0.01 per user request in direct inference may actually cost $0.05–$0.10 when accounting for the full chain of model calls, tool result processing, and orchestration overhead.

Agent costs are particularly difficult to predict because the number of model calls per request varies dynamically based on task complexity. A simple agent query might require 2 model calls; a complex one might require 12. This variance makes budget forecasting unreliable without extensive production traffic analysis.
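One way to budget despite that variance is to model the call-count distribution explicitly and work from an expected value. The distribution and per-call cost below are illustrative assumptions, not measured figures:

```python
# Expected per-request cost of an agent whose chain length varies.
# Both the call-count distribution and the blended per-call cost
# are illustrative assumptions; replace with production telemetry.

call_distribution = {2: 0.40, 4: 0.35, 8: 0.20, 12: 0.05}  # calls -> probability
cost_per_call = 0.012   # assumed blended cost of one model call, $

expected_calls = sum(n * p for n, p in call_distribution.items())
expected_cost = expected_calls * cost_per_call
print(f"expected {expected_calls:.1f} calls/request, ${expected_cost:.4f}/request")
```

The tail matters: in this assumed distribution, the 5% of requests needing 12 calls contribute almost as much cost as the 40% needing 2.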

Guardrails

Bedrock Guardrails add content filtering, PII detection, topic avoidance, and hallucination grounding to model responses. Guardrails are priced per 1,000 text units (approximately 1,000 characters) processed. At scale, Guardrails costs can add 5–15% to your total inference bill. For applications with strict compliance requirements where every input and output must be scanned, Guardrails becomes a non-trivial cost layer.

CloudWatch and Monitoring

Bedrock usage monitoring through CloudWatch generates custom metrics, logs, and alarms. At enterprise scale with dozens of model endpoints, hundreds of agent workflows, and detailed token-level logging, CloudWatch costs can reach $500–$2,000 per month. This is often overlooked in cost models because it falls under the general AWS monitoring budget rather than the AI budget.

Bedrock vs Direct API: The Real Cost Comparison

A common question: is Bedrock more expensive than going directly to each model provider’s API?

For Anthropic Claude: Bedrock on-demand pricing matches Anthropic’s direct API rates exactly. There is no Bedrock markup on Claude models. The choice between Bedrock and direct Anthropic API is not about per-token cost — it is about operational integration. Bedrock provides AWS-native billing, IAM, VPC endpoints, and the ability to fund Claude usage from AWS EDPs (Enterprise Discount Programs) or Savings Plans. Direct API provides access to Anthropic’s latest features (which may lag on Bedrock by days to weeks) and direct commercial negotiation for volume discounts.

For Meta Llama: Bedrock charges a 10–40% premium over alternative hosting providers like Together AI, Groq, or self-hosted solutions. Llama 3.3 70B on Bedrock costs $0.72/$0.72 per MTok; the same model on Together AI costs approximately $0.54/$0.54. Self-hosted on EC2 Spot Instances can reduce costs by 60–70%. The Bedrock premium pays for managed infrastructure, but cost-sensitive organisations with GPU operations expertise should evaluate self-hosting for high-volume Llama workloads.

For Amazon Nova: Nova models are exclusive to Bedrock. There is no alternative provider comparison. The pricing is aggressive, positioning Nova as the default choice for cost-sensitive workloads within the AWS ecosystem.

The strategic consideration for enterprises: if your organisation has a significant AWS EDP (Enterprise Discount Program), Bedrock consumption counts toward your committed spend. This means that Bedrock usage can be funded from budget that is already committed to AWS, rather than requiring incremental AI budget. For organisations with $10M+ annual AWS commitments and unspent EDP capacity, Bedrock is effectively “free” relative to direct API alternatives that require new budget allocation.

Need help optimising your AWS Bedrock spend?

Our advisory team specialises in cloud AI cost modelling, committed-spend negotiation, and multi-model deployment strategy. Fixed-fee, vendor-independent engagement.

Learn about our GenAI advisory services →

Enterprise Optimisation Playbook: Eight Steps to Reduce Bedrock Spend

Step 1: Implement Model Routing (Savings: 50–70%). The 107× price range across Bedrock models means that routing decisions dominate your cost structure. Build a classification layer that directs simple requests to Nova Micro ($0.035/$0.14), moderate requests to Sonnet ($3/$15), and complex requests to Opus ($5/$25). In a well-tuned routing system, 50–60% of requests go to Nova or Haiku, 30–40% to Sonnet, and under 10% to Opus. The blended cost drops by 50–70% compared to an all-Sonnet deployment.
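A minimal routing sketch using the published rates. The keyword classifier is a placeholder for the small classification model a production router would use, and the traffic mix is an assumption:

```python
# Model-routing sketch. Rates are $ per MTok (input, output) from this
# guide; the classifier and traffic mix are illustrative placeholders.

MODELS = {
    "nova-micro": (0.035, 0.14),
    "haiku":      (1.00, 5.00),
    "sonnet":     (3.00, 15.00),
    "opus":       (5.00, 25.00),
}

def classify(prompt: str) -> str:
    """Toy complexity classifier; a production router uses a small model."""
    if len(prompt) < 200:
        return "nova-micro"
    if any(k in prompt.lower() for k in ("prove", "architecture", "multi-step")):
        return "opus"
    return "sonnet"

def blended_cost_per_mtok(mix: dict, output_share: float = 0.25) -> float:
    """Blended $/MTok for a traffic mix {model: share of tokens}."""
    total = 0.0
    for model, share in mix.items():
        in_rate, out_rate = MODELS[model]
        total += share * ((1 - output_share) * in_rate + output_share * out_rate)
    return total

tuned = blended_cost_per_mtok({"nova-micro": 0.55, "sonnet": 0.37, "opus": 0.08})
all_sonnet = blended_cost_per_mtok({"sonnet": 1.0})
print(f"blended ${tuned:.2f}/MTok vs all-Sonnet ${all_sonnet:.2f}/MTok")
```

With this assumed mix the blended rate falls to roughly half the all-Sonnet rate, consistent with the 50–70% range above.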

Step 2: Shift Eligible Workloads to Batch (Savings: 50% on shifted volume). Audit every Bedrock workload and classify it as real-time-required or batch-eligible. Target 30–50% of total token volume for batch migration. The 50% discount applies uniformly across all batch-eligible models.

Step 3: Enable Prompt Caching (Savings: 60–80% on input tokens). Restructure prompts to front-load static content. Enable 5-minute caching for all applications with more than 2 requests per minute sharing common context. For high-frequency endpoints, input token costs drop by 80–90%.

Step 4: Optimise Output Tokens (Savings: 15–30%). Output tokens are 2–5× more expensive than input. Implement structured JSON output schemas, set explicit max_tokens limits per endpoint, use stop sequences, and design response templates that minimise verbose generation. Every 100 output tokens saved is equivalent to saving 200–500 input tokens.
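With boto3's bedrock-runtime Converse API, the output ceiling and stop sequences sit in `inferenceConfig`. The model ID below is a placeholder and the request is only constructed, not sent:

```python
# Capping output spend via a Converse-style request (boto3 bedrock-runtime).
# The request is constructed but not sent; the model ID is a placeholder
# and should be replaced with a real Bedrock model ID for your region.

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    return {
        "modelId": "anthropic.claude-sonnet-4-5",   # placeholder ID
        "messages": [{"role": "user",
                      "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "maxTokens": max_tokens,        # hard ceiling on output tokens
            "stopSequences": ["\n\nEND"],   # cut generation at a template marker
            "temperature": 0.2,
        },
    }

req = build_request("Summarise this contract clause in 3 bullet points.")
# client = boto3.client("bedrock-runtime"); client.converse(**req)
print(req["inferenceConfig"]["maxTokens"])
```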

Step 5: Right-Size Knowledge Base Infrastructure (Savings: 20–40% on RAG costs). Evaluate whether OpenSearch Serverless (minimum $345/month) is necessary for your vector store, or whether a more cost-effective alternative like Pinecone, PostgreSQL with pgvector, or Amazon Aurora with vector search provides adequate performance at lower cost. For smaller knowledge bases (under 100,000 documents), the OpenSearch minimum is often the dominant cost — exceeding inference costs.

Step 6: Monitor Agent Token Consumption (Savings: 10–25%). If you use Bedrock Agents, instrument each agent to track the number of model calls per user request. Identify agents with excessive reasoning loops (8+ model calls per request) and optimise tool descriptions, agent instructions, and orchestration logic to reduce chain length. Shorter agent chains mean fewer inference calls per user interaction.

Step 7: Evaluate Provisioned Throughput at Scale (Savings: variable). For workloads running at 70%+ utilisation on on-demand, model the cost of provisioned throughput. Compare the monthly MU cost against your current on-demand spend. Additionally, provisioned throughput eliminates throttling risk, which may justify the commitment even at slightly higher cost if your application has latency SLAs.

Step 8: Leverage AWS Committed Spend (Savings: 15–30%). If your organisation has an AWS Enterprise Discount Program, Savings Plan, or other committed-spend agreement, confirm that Bedrock consumption counts toward your commitment. Route AI workloads through Bedrock rather than direct API alternatives to consume already-committed budget. For organisations with unspent AWS commitments, this is effectively a 100% discount on the incremental cost of AI — because the money was already committed to AWS regardless.

Cost Modelling: What Enterprise Bedrock Actually Costs

The following model illustrates annual Bedrock costs for a mid-size enterprise running a multi-model AI platform with RAG, agent workflows, and a mix of real-time and batch processing.

Assumptions: 500 million tokens per month total consumption. Model mix: 40% Haiku/Nova (low-complexity), 45% Sonnet (mid-tier), 15% Opus (complex reasoning). 35% of volume eligible for batch processing. Prompt caching enabled with 60% cache hit rate on input tokens. Two Knowledge Bases with OpenSearch Serverless. Five Bedrock Agents in production.

Monthly Cost Breakdown:

On-demand inference (65% of volume, post-routing): $4,200. Batch inference (35% of volume, 50% discount): $1,100. Prompt caching savings (net): –$1,800. Knowledge Base infrastructure (2 × OpenSearch + embeddings): $1,200. Agent orchestration overhead (5 agents, ~4 calls/request average): $1,600. Guardrails (applied to all customer-facing endpoints): $400. CloudWatch monitoring: $300. Total monthly Bedrock spend: $7,000.

Annual: $84,000. Of that, direct inference (on-demand plus batch, net of caching savings) is $42,000 (50%) and infrastructure and services (Knowledge Bases, agent orchestration overhead, Guardrails, and monitoring) are $42,000 (50%). The 50/50 split between inference and infrastructure is typical of production Bedrock deployments and is precisely why cost models that only account for per-token pricing miss half the bill.

Without optimisation (all Sonnet, no batch, no caching, no routing): estimated monthly spend $18,500. Annual: $222,000. The optimised architecture saves $138,000 per year — a 62% reduction.
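The breakdown above can be reproduced directly:

```python
# Reproducing the monthly breakdown and annual comparison above.
monthly = {
    "on_demand_inference": 4_200,
    "batch_inference": 1_100,
    "prompt_caching_savings": -1_800,
    "knowledge_bases": 1_200,
    "agent_overhead": 1_600,
    "guardrails": 400,
    "cloudwatch": 300,
}
total = sum(monthly.values())
annual = total * 12
unoptimised_annual = 18_500 * 12
print(total, annual, unoptimised_annual - annual)   # 7000 84000 138000
```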

Client Result

A San Francisco financial institution cut projected Azure OpenAI spend and gained strategic flexibility through independent contract negotiation.

Read the case study →

Need Expert AI Licensing Guidance?

Redress Compliance provides independent AI licensing advisory services — fixed-fee, no vendor affiliations. Our specialists help enterprises negotiate AI vendor contracts, benchmark pricing across providers, and avoid lock-in across OpenAI, AWS Bedrock, Azure AI, and Google Vertex.

Explore AI Licensing Advisory Services →

Cross-Region Inference and Data Residency

Bedrock supports cross-region model inference, allowing requests to be routed to available capacity across AWS regions without additional cross-region charges. Pricing is based on the source region where the request originates, not where the inference executes. This is advantageous for handling traffic spikes but introduces data residency considerations.

For organisations with regulatory requirements around data processing location (GDPR, data sovereignty laws, industry-specific regulations), cross-region inference may route prompts and responses through regions outside your compliance boundary. Enterprise customers should configure Bedrock inference profiles to restrict inference to specific regions, even if this means accepting potential throttling during peak demand rather than routing to non-compliant regions.

Anthropic Claude models on Bedrock also support a US-only inference option at 1.1× standard pricing, ensuring all processing occurs within US data centres. This 10% premium is a compliance cost that should be factored into the total cost model for regulated workloads.

Bedrock vs Google Vertex AI vs Azure AI: Platform Comparison

AWS Bedrock vs Google Vertex AI: Both platforms offer managed access to third-party models (Claude, Llama) alongside first-party models (Nova vs Gemini). Vertex AI’s primary advantage is Gemini model pricing ($1.25/$10 for Gemini 2.5 Pro vs $3/$15 for Sonnet on Bedrock) and native integration with Google Workspace. Bedrock’s advantage is the broader model marketplace, deeper AWS ecosystem integration, and the ability to fund consumption from AWS committed spend. For AWS-centric enterprises, Bedrock typically delivers better total value; for Google Cloud shops, Vertex AI is the natural choice.

AWS Bedrock vs Azure AI (OpenAI): Azure provides exclusive access to OpenAI models (GPT-4o, GPT-4.1, o3) through Azure OpenAI Service, which Bedrock does not offer. If your AI strategy requires OpenAI models, Azure is mandatory for that portion. Bedrock provides access to Anthropic, Meta, Amazon, and other providers that Azure does not host. Many enterprises use both platforms, routing OpenAI workloads through Azure and Claude/Llama workloads through Bedrock. This multi-platform strategy adds operational complexity but avoids single-vendor lock-in for AI model access.

The strategic question is not “which platform is cheapest” but “which platform aligns with your existing cloud commitment and provides access to the models your applications need.” For organisations with $10M+ annual AWS spend, Bedrock consumption from EDP commitments almost always beats the standalone economics of any alternative platform.

The EDP Arbitrage: How AWS Committed Spend Changes the Calculus

Enterprise Discount Programs (EDPs) are the single most important variable in Bedrock pricing economics, yet they are absent from virtually every public pricing guide. An EDP is a multi-year commitment to spend a specified minimum amount with AWS in exchange for a blanket discount (typically 5–15%) across all AWS services. Most enterprises with $5M+ annual AWS spend have some form of EDP or committed-spend agreement.

The critical Bedrock implication: Bedrock consumption counts toward your EDP commitment. This means that every dollar spent on Bedrock inference, Knowledge Bases, Agents, and Guardrails reduces your remaining EDP obligation. For organisations that are on track to underspend their EDP commitment (a common scenario when cloud migrations slow or workloads are optimised), routing AI workloads through Bedrock rather than direct API alternatives effectively converts “wasted” committed spend into productive AI consumption.

Consider a practical scenario: an enterprise with a $20M annual EDP commitment that is forecasting $17M in organic AWS consumption. The $3M gap represents spend that the organisation is contractually obligated to make regardless. By routing AI workloads through Bedrock, the organisation can consume $3M in AI inference from budget that was already committed — making the effective incremental cost of that AI consumption zero relative to going direct to Anthropic, OpenAI, or other providers that would require new budget allocation.

This EDP arbitrage fundamentally changes the vendor selection calculus. An enterprise evaluating Claude through Bedrock versus Claude through Anthropic’s direct API sees identical per-token rates. But if Bedrock consumption fills an EDP gap, the economic argument for Bedrock is overwhelming regardless of any feature or latency differences.

How to Model the EDP Benefit

To quantify the EDP arbitrage for your organisation, you need three numbers: your total annual EDP commitment, your projected organic AWS consumption (excluding AI), and your projected annual AI API spend across all providers. If organic consumption falls short of the EDP commitment, the gap represents the maximum AI spend that can be absorbed by the EDP at zero incremental cost. Any AI spend beyond that gap is priced at standard Bedrock rates minus your EDP discount percentage.

For example: $20M EDP commitment, $17M organic spend, $1.5M projected AI spend. The $3M EDP gap fully absorbs the $1.5M AI budget. Effective AI cost: $0 incremental (already committed). Savings versus direct API: $1.5M. Even if the EDP gap only partially covers AI spend — say $1M gap against $1.5M AI budget — you still save $1M and pay only $500K at Bedrock rates (minus EDP discount).
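The gap arithmetic generalises to a few lines; the figures below are the ones from the example:

```python
# EDP-gap absorption sketch (figures from the example above).
def edp_absorption(commitment: float, organic: float, ai_spend: float):
    gap = max(commitment - organic, 0.0)
    absorbed = min(gap, ai_spend)       # AI spend funded by committed budget
    incremental = ai_spend - absorbed   # new budget actually required
    return absorbed, incremental

print(edp_absorption(20e6, 17e6, 1.5e6))   # (1500000.0, 0.0)
print(edp_absorption(20e6, 19e6, 1.5e6))   # (1000000.0, 500000.0)
```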

This analysis should be the first step in any enterprise AI platform evaluation, before comparing per-token rates, model quality, or feature availability. If the EDP math works, Bedrock wins on economics regardless of the per-token comparison.

📊 Free AI Spend Benchmarking Assessment

How does your AWS Bedrock spend compare to industry benchmarks? Our free assessment analyses your billing mode mix, model selection, and ancillary service costs — and identifies the optimisation moves that cut enterprise AI spend by up to 62%.

Take the Free Assessment →

Governance and Cost Controls: Preventing Bedrock Bill Shock

Enterprise Bedrock deployments require proactive cost governance. Without controls, AI workloads exhibit a “tragedy of the commons” dynamic: multiple teams share an AWS account, each team experiments aggressively with models, and the consolidated bill grows faster than any single team anticipates.

Budget Segmentation by API Key and Tag

Use AWS IAM policies and resource tags to segment Bedrock consumption by team, application, and environment. Assign separate API keys or IAM roles to each application, then track spending per key through AWS Cost Explorer. Tag every Bedrock invocation with application name, team owner, and environment (dev/staging/production). This enables cost attribution at the granularity needed to hold teams accountable for their consumption.

CloudWatch Alarms and Spending Thresholds

Configure CloudWatch alarms on Bedrock usage metrics at 50%, 75%, and 90% of monthly budget. Set token consumption alarms per endpoint to detect runaway workloads — an agent stuck in a reasoning loop can generate millions of tokens in hours. Alarm on both absolute token consumption and rate-of-change: a sudden 10× increase in tokens per request indicates a misconfigured prompt, an injection attack, or an infinite tool-use loop.

Service Control Policies

For multi-account AWS organisations, use Service Control Policies (SCPs) to restrict which models can be invoked from which accounts. Limit development accounts to Haiku and Nova models (preventing developers from accidentally running Opus workloads at $25/MTok output during testing). Reserve Opus and Sonnet access to production accounts with approved budgets. SCPs cannot enforce token-level limits, but they prevent the most common source of bill shock: high-cost models running in low-governance environments.
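A sketch of such an SCP for development accounts; the model ARN patterns are assumptions and should be verified against the actual model IDs available in your regions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyHighCostModelsInDev",
      "Effect": "Deny",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-opus*",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet*"
      ]
    }
  ]
}
```

Attached to the development OU, this denies Opus and Sonnet invocation while leaving Haiku and Nova available.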

Automated Model Routing with Cost Guardrails

Build a routing middleware that classifies incoming requests and enforces cost ceilings per request. If a request’s estimated cost (based on input token count and expected output) exceeds a configurable threshold, the router downgrades to a cheaper model or rejects the request with a cost-limit error. This prevents individual requests from consuming disproportionate budget — particularly important for agent workflows where a single complex query can trigger 10+ model calls at Opus pricing.
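A minimal sketch of the ceiling check, using per-MTok rates from this guide; the token estimates, ceiling value, and downgrade path are illustrative assumptions:

```python
# Per-request cost ceiling sketch. Rates are $/MTok from this guide;
# the ceiling and the downgrade path are illustrative assumptions.

RATES = {"sonnet": (3.00, 15.00), "haiku": (1.00, 5.00)}

def estimate_cost(model: str, input_tokens: int, expected_output: int) -> float:
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + expected_output / 1e6 * out_rate

def route_with_ceiling(input_tokens: int, expected_output: int,
                       ceiling: float = 0.05) -> str:
    """Downgrade to a cheaper model when the estimate breaches the ceiling."""
    if estimate_cost("sonnet", input_tokens, expected_output) <= ceiling:
        return "sonnet"
    if estimate_cost("haiku", input_tokens, expected_output) <= ceiling:
        return "haiku"
    raise ValueError("request exceeds cost ceiling")

print(route_with_ceiling(5_000, 1_000))    # sonnet
print(route_with_ceiling(20_000, 4_000))   # haiku
```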

Related Guides

AI Token Pricing Calculator
Google Vertex AI Pricing Guide 2026
OpenAI API Pricing Guide 2026
AI Procurement Checklist: 20 Questions Before Signing
AI Vendor Selection Framework
AI Vendor Lock-In Risk Assessment

Frequently Asked Questions

Does Bedrock add a markup to third-party model pricing?

For Anthropic Claude models, Bedrock on-demand pricing matches Anthropic’s direct API rates. For Meta Llama models, Bedrock pricing is 10–40% higher than alternative hosting providers. For Amazon Nova models, Bedrock is the only access point, so there is no comparison. The markup (or lack thereof) varies by model provider and should be evaluated per model.

Can I pay for Bedrock with existing AWS credits or EDP commitments?

Yes. Bedrock charges appear on your standard AWS bill and can be applied against AWS credits, EDP commitments, and Savings Plans (subject to your specific agreement terms). This makes Bedrock attractive for organisations with unspent AWS committed spend, as AI consumption can be funded from already-committed budget rather than incremental allocation.

What is the minimum cost for running a fine-tuned model on Bedrock?

Fine-tuned models require provisioned throughput for inference. The minimum is one model unit with no long-term commitment, but hourly rates range from $7 to $50+ per hour depending on the base model. Even the cheapest option (approximately $7/hour) costs over $5,000 per month for 24/7 operation. Fine-tuning projects should budget for ongoing inference costs, not just one-time training costs.

How does cross-region inference affect my bill?

Cross-region inference does not incur additional charges. You are billed at the source region’s rates regardless of where inference actually executes. However, cross-region routing may introduce data residency compliance risks. Configure inference profiles to restrict processing to compliant regions if required by your regulatory framework.

Is batch processing available for all models on Bedrock?

No. Batch inference is available for select foundation models from Anthropic, Meta, Mistral, and Amazon. Not all models or model versions support batch processing. Check the current Bedrock documentation for the supported model list before designing batch workflows, and have a fallback plan for models that may lose batch support in future updates.

How do I get enterprise pricing for Bedrock?

Contact your AWS account team or reach out to AWS enterprise sales. Enterprise Bedrock pricing is negotiated as part of your broader AWS relationship, typically within the context of an Enterprise Discount Program or custom commitment agreement. Organisations spending $50,000+ monthly on Bedrock have the strongest negotiation leverage. Come prepared with 90 days of usage data showing stable consumption patterns, model utilisation metrics, and a clear growth forecast.

Should I use Bedrock or SageMaker for AI inference?

Bedrock is a managed API service — you call a model endpoint and pay per token. SageMaker is an ML platform where you deploy models to infrastructure you manage. For foundation model inference using third-party models (Claude, Llama, Mistral), Bedrock is almost always the right choice because it eliminates infrastructure management overhead. For custom-trained models, large-scale open-source model deployment, or workloads where you need fine-grained control over GPU instance types and scaling behaviour, SageMaker provides more flexibility at potentially lower cost for teams with ML operations expertise. Many enterprises use both: Bedrock for third-party model access and SageMaker for custom model deployment.

What monitoring should I set up on day one of a Bedrock deployment?

At minimum: enable AWS Cost Explorer with daily granularity for Bedrock services; create CloudWatch alarms on total daily token consumption (input and output separately) at 2× your expected baseline; tag every Bedrock invocation with application name and team owner; set up a weekly cost report delivered to the engineering lead and FinOps team. For production deployments, add per-endpoint latency alarms, throttling rate monitoring, and per-request token consumption tracking to detect runaway agent loops or prompt injection attacks that inflate token consumption.

GenAI Licensing Hub This pillar page is part of our GenAI Licensing Knowledge Hub — 25+ expert guides covering AI token pricing, contract risks, data privacy, and enterprise negotiation strategies.