Amazon Bedrock Pricing 2026: Real Cost

How Amazon Bedrock pricing works in 2026: on demand token rates, provisioned throughput, batch discounts, and the routing and commitment decisions that control the bill.

Key takeaways

Production token volumes ran 3 to 5 times higher than pilot forecasts across our 2024 to 2025 reviews.
Output tokens price several times above input tokens on most model families.
Model routing policy cut unit cost more than any rate negotiation.
Batch inference prices at roughly half the on demand rate for latency tolerant work.
Buy provisioned throughput only after 90 days of sustained measured volume.
Bedrock spend retires EDP commitments; fold it into the agreement, not beside it.

How does Amazon Bedrock pricing actually work?

Bedrock bills per token on demand, per model unit with provisioned throughput, and at a discount for batch inference, with each foundation model carrying its own rates. Input and output tokens price separately, and output tokens typically cost several times more, so workload shape decides cost as much as volume.
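Here is a minimal sketch of the per request arithmetic. It assumes hypothetical rates of $0.003 per 1,000 input tokens and $0.015 per 1,000 output tokens. These are placeholders, not published Bedrock prices, and the function name `on_demand_cost` is illustrative. Two requests with the same 3,000 total tokens cost very different amounts depending on which side the tokens fall.

```python
# Sketch: on demand cost of one Bedrock request, input and output priced separately.
# The rates are HYPOTHETICAL placeholders; use the current per model rates from the
# Amazon Bedrock pricing page.


def on_demand_cost(input_tokens: int, output_tokens: int,
                   input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Return the on demand charge, in dollars, for one request."""
    return (input_tokens / 1000) * input_rate_per_1k + (output_tokens / 1000) * output_rate_per_1k


INPUT_RATE = 0.003   # hypothetical $ per 1,000 input tokens
OUTPUT_RATE = 0.015  # hypothetical $ per 1,000 output tokens (5x input)

# Same 3,000 total tokens, two workload shapes.
input_heavy = on_demand_cost(2_500, 500, INPUT_RATE, OUTPUT_RATE)   # e.g. summarization
output_heavy = on_demand_cost(500, 2_500, INPUT_RATE, OUTPUT_RATE)  # e.g. long generation

print(f"input heavy:  ${input_heavy:.4f}")   # $0.0150
print(f"output heavy: ${output_heavy:.4f}")  # $0.0390
```

Under these assumed rates, the output heavy request costs 2.6 times as much for the same token count. That is why token shape gets measured before anything else.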

The Amazon Bedrock product page describes the service scope; the Amazon Bedrock pricing page is the authoritative rate card, with per model rates for on demand, provisioned, and batch modes. Rates differ by model provider and version and change over time, so any internal cost model needs a refresh cadence.

The pricing modes in buyer terms

On demand: pay per input and output token, no commitment, the right default until usage is measured.
Provisioned throughput: committed model units by the hour with term options, for sustained high volume workloads.
Batch inference: asynchronous processing at roughly half the on demand rate where latency does not matter.
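To see what the batch discount is worth, here is a sketch of blended monthly cost when a share of latency tolerant traffic moves to batch. It assumes batch prices at exactly 50% of on demand. That simplifies "roughly half", since the real discount varies by model. The sketch also uses a hypothetical $100,000 monthly on demand equivalent, and `blended_cost` is an illustrative name.

```python
# Sketch: blended monthly cost when a share of traffic moves to batch inference.
# ASSUMPTION: batch prices at 50% of on demand; the real discount is set per model.

BATCH_PRICE_FACTOR = 0.5  # hypothetical: batch rate as a fraction of the on demand rate


def blended_cost(on_demand_equivalent: float, batch_share: float,
                 batch_price_factor: float = BATCH_PRICE_FACTOR) -> float:
    """Monthly cost if `batch_share` of the token volume runs as batch, the rest on demand."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    return on_demand_equivalent * ((1 - batch_share) + batch_share * batch_price_factor)


for share in (0.0, 0.3, 0.6):
    print(f"{share:.0%} batch: ${blended_cost(100_000, share):,.0f}")
```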

What drives the bill beyond tokens

Context length, retrieval payloads, and agent chains multiply token counts invisibly. A RAG application that stuffs long contexts into every prompt can cost ten times as much as a tuned one answering the same questions.
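The sketch below shows how retrieved context inflates one RAG call. It uses hypothetical token counts and placeholder rates of $0.003 and $0.015 per 1,000 input and output tokens. The "stuffed" and "tuned" configurations are invented for illustration. The point is that the prompt carries every retrieved chunk on every question.

```python
# Sketch: retrieved context multiplies input tokens on every RAG call.
# Token counts and rates are HYPOTHETICAL and only illustrate the arithmetic.

INPUT_RATE = 0.003   # hypothetical $ per 1,000 input tokens
OUTPUT_RATE = 0.015  # hypothetical $ per 1,000 output tokens


def rag_call_cost(question_tokens: int, chunks: int, tokens_per_chunk: int,
                  answer_tokens: int) -> float:
    """On demand cost of one call whose prompt holds the question plus all retrieved chunks."""
    input_tokens = question_tokens + chunks * tokens_per_chunk
    return (input_tokens / 1000) * INPUT_RATE + (answer_tokens / 1000) * OUTPUT_RATE


# Stuffed: 30 chunks of 1,000 tokens and a verbose 800 token answer.
stuffed = rag_call_cost(100, 30, 1_000, 800)
# Tuned: 4 chunks of 500 tokens and a 200 token answer to the same question.
tuned = rag_call_cost(100, 4, 500, 200)

print(f"stuffed ${stuffed:.4f} vs tuned ${tuned:.4f}: {stuffed / tuned:.1f}x")
```

With these assumptions, the stuffed call costs about 11 times the tuned one. Nearly all of the gap comes from context that never shows up in the answer.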

Bedrock pricing modes compared

Dimension	On demand	Provisioned throughput	Batch
Billing unit	Per 1,000 tokens	Model units per hour	Per 1,000 tokens
Commitment	None	Hourly to multi month terms	None
Best fit	Variable, unproven workloads	Sustained production volume	Offline processing
Risk	Spend spikes with traffic	Paying for idle capacity	Latency unsuitable for chat
Discount lever	Model choice and routing	Term length on commits	About half of on demand
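The "paying for idle capacity" risk comes down to one comparison. Provisioned throughput bills every hour of the term whether or not it is used, while on demand bills only the tokens consumed. The sketch below uses hypothetical numbers throughout: a $20 hourly model unit rate, placeholder token rates of $0.003 and $0.015 per 1,000 input and output tokens, and 730 billable hours per month. It also assumes the committed model units can actually serve the volume. How many tokens a model unit sustains is model specific and is not modeled here.

```python
# Sketch: compare one month of on demand spend with a provisioned throughput commitment.
# All prices and volumes are HYPOTHETICAL. ASSUMPTION: the committed model units have
# enough capacity for the volume (per model unit throughput is model specific).

HOURS_PER_MONTH = 730  # 8,760 hours a year / 12


def provisioned_monthly(model_units: int, hourly_rate_per_unit: float) -> float:
    """Provisioned throughput: every committed unit bills every hour, used or idle."""
    return model_units * hourly_rate_per_unit * HOURS_PER_MONTH


def on_demand_monthly(input_tokens: float, output_tokens: float,
                      input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """On demand: only consumed tokens bill."""
    return (input_tokens / 1000) * input_rate_per_1k + (output_tokens / 1000) * output_rate_per_1k


def cheaper_mode(input_tokens, output_tokens, model_units, hourly_rate_per_unit,
                 input_rate_per_1k=0.003, output_rate_per_1k=0.015):
    """Return the cheaper mode plus both monthly costs."""
    od = on_demand_monthly(input_tokens, output_tokens, input_rate_per_1k, output_rate_per_1k)
    pt = provisioned_monthly(model_units, hourly_rate_per_unit)
    return ("provisioned" if pt < od else "on demand"), od, pt


for label, inp, out in [("low volume", 2e9, 4e8), ("high volume", 6e9, 1.2e9)]:
    mode, od, pt = cheaper_mode(inp, out, model_units=1, hourly_rate_per_unit=20.0)
    print(f"{label}: on demand ${od:,.0f} vs provisioned ${pt:,.0f} -> {mode}")
```

Break-even is the monthly on demand spend that equals the commitment, $14,600 under these assumptions. The comparison only holds if that volume is sustained across the whole term.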

What does Bedrock cost at enterprise scale?

At scale the bill is model mix times token volume, and both are controllable. The estates that manage Bedrock cost treat model routing as the primary lever: premium models for the tasks that need them, smaller and cheaper families for the bulk of traffic.

Measure token shape first: input to output ratios and context lengths per application, from CloudWatch metrics.
Route by task value: classification and extraction to small models; reasoning to premium models on exception paths.
Batch what can wait: summarization, enrichment, and document pipelines at the batch discount.
Commit only on evidence: provisioned throughput after 90 days of sustained measured volume, never on launch forecasts.
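The routing step can be sketched as a small lookup. Each task type maps to the cheapest tier judged adequate for it, an explicit exception path escalates to premium, and latency tolerant work is flagged for batch. The tier names, task labels, placeholder model IDs, and escalation flag are all hypothetical. Real Bedrock model IDs and the adequacy judgments come from your own evaluation.

```python
# Sketch of a routing policy. Tiers, task labels, model IDs and the escalation rule
# are HYPOTHETICAL; substitute real Bedrock model IDs and your own adequacy tests.

from dataclasses import dataclass

# Placeholder model IDs per tier, cheapest first.
TIER_MODEL = {
    "small": "example-small-model-id",
    "mid": "example-mid-model-id",
    "premium": "example-premium-model-id",
}

# Cheapest tier judged adequate for each task type.
TASK_TIER = {
    "classification": "small",
    "extraction": "small",
    "summarization": "mid",
    "reasoning": "premium",
}


@dataclass(frozen=True)
class Route:
    model_id: str
    mode: str  # "on_demand" or "batch"


def route(task: str, latency_tolerant: bool = False, escalate: bool = False) -> Route:
    """Pick the cheapest adequate model; premium only on an explicit exception path."""
    tier = TASK_TIER.get(task, "small")  # unlisted tasks default to the cheapest tier
    if escalate:  # e.g. the cheaper model's answer failed a validation check
        tier = "premium"
    mode = "batch" if latency_tolerant else "on_demand"
    return Route(model_id=TIER_MODEL[tier], mode=mode)


print(route("extraction"))
print(route("summarization", latency_tolerant=True))
print(route("classification", escalate=True))
```

Defaulting unlisted tasks to the cheapest tier is the design choice here. It makes premium usage something a team has to justify rather than a silent default.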

The commitment decision

Provisioned throughput terms and model unit mechanics are defined in the Bedrock documentation. The decision rule that held in our reviews: commit to the measured floor of a workload that has run for a quarter, and leave the variance on demand.
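The sizing rule can be sketched under a few assumptions. None of them is a Bedrock definition. The "measured floor" is taken as the 10th percentile of daily demand in tokens per minute, each model unit is given a hypothetical capacity of 15,000 tokens per minute, and the 90 days of usage are synthetic.

```python
# Sketch: size a provisioned throughput commitment to the measured floor, leaving the
# variance on demand. ASSUMPTIONS (not Bedrock definitions): "floor" = the 10th
# percentile of daily demand, demand measured in tokens per minute, and a hypothetical
# per model unit capacity. The sample data is synthetic.

import math
from typing import Sequence

MIN_DAYS = 90  # a quarter of measured usage before any commitment


def measured_floor(daily_tpm: Sequence[float], percentile: float = 10.0) -> float:
    """Nearest-rank percentile of daily demand; refuses to answer on under 90 days of data."""
    if len(daily_tpm) < MIN_DAYS:
        raise ValueError(f"need at least {MIN_DAYS} days of measured usage")
    ordered = sorted(daily_tpm)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def units_for_floor(floor_tpm: float, tpm_per_model_unit: float) -> int:
    """Round down: capacity above the floor would sit idle on quiet days."""
    return math.floor(floor_tpm / tpm_per_model_unit)


# Synthetic 90 days: demand cycles between 40,000 and 49,000 tokens per minute.
usage = [40_000 + 1_000 * (day % 10) for day in range(90)]
floor = measured_floor(usage)
units = units_for_floor(floor, tpm_per_model_unit=15_000)  # hypothetical capacity
print(f"floor {floor:,} tokens/min -> commit {units} model units; the rest stays on demand")
```

Rounding down is deliberate. Any traffic above the committed units falls back to on demand rather than leaving paid capacity idle.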

Where the common advice on Bedrock cost is wrong

The standard advice is to negotiate provisioned throughput commitments early to lock capacity and price. We disagree. In roughly 15 of the 20 to 30 reviews Morten Andersen ran in 2024 to 2025, early committed model units sat materially underutilized, and the spend that mattered leaked through model choice and context bloat that no commitment touches. The buyer side move is 90 days of measured on demand usage, a routing policy that defaults traffic to the cheapest adequate model, and only then a provisioned commitment sized to the proven floor, inside the wider AWS agreement where it counts toward committed spend.

Output tokens price several times above input tokens on most model families; workload shape moves the bill before any discount does.

20 to 30: GenAI cost reviews, 2024 to 2025
3 to 5x: production volume vs pilot forecasts
~50%: batch discount vs on demand

Source: Redress Compliance advisory engagement file, 2024 to 2025.

The cheapest Bedrock token is the one a smaller model handled. Routing policy beats rate negotiation, every quarter, in every estate we measured.

How does Bedrock fit your AWS agreement?

Bedrock spend counts toward private pricing commitments, which is where the real discount lives. Inside an EDP or PPA, GenAI growth helps retire the commit, and incremental Bedrock volume strengthens the renewal position rather than sitting as an unmanaged line item.

Fold it into the commit: Bedrock consumption retires EDP obligations like any other service.
Forecast it separately: GenAI growth curves are steeper and less predictable; a blended forecast hides the risk.
Tag and allocate: per application cost allocation is the only way routing policies get enforced.

Allocation and showback

Token metrics flow through Amazon CloudWatch per model and application. A monthly showback per product team is what turns the routing policy from a document into behavior.
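Pulling the token metrics can be sketched with the CloudWatch `get_metric_statistics` call through boto3. It assumes the `AWS/Bedrock` namespace metrics `InputTokenCount` and `OutputTokenCount` keyed by the `ModelId` dimension. Splitting by application is an assumption here: the code only filters by `ModelId`, so applications must map to distinct `ModelId` values (for example, one inference profile per application). Verify this against the Bedrock monitoring documentation. The client is passed in, so the logic can be exercised without AWS credentials.

```python
# Sketch: 90 days of Bedrock token totals for one ModelId from CloudWatch, plus the
# input-to-output ratio. ASSUMES the AWS/Bedrock metrics InputTokenCount and
# OutputTokenCount with a ModelId dimension; per application attribution depends on
# how your applications map to ModelId values (an assumption, not shown here).

from datetime import datetime, timedelta, timezone


def total_tokens(cloudwatch, model_id: str, metric: str, days: int = 90) -> float:
    """Sum one token metric over the window using daily datapoints."""
    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Bedrock",
        MetricName=metric,
        Dimensions=[{"Name": "ModelId", "Value": model_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=86400,  # one datapoint per day
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in resp["Datapoints"])


def token_shape(cloudwatch, model_id: str, days: int = 90) -> dict:
    """Input and output totals and their ratio for one model over the window."""
    inp = total_tokens(cloudwatch, model_id, "InputTokenCount", days)
    out = total_tokens(cloudwatch, model_id, "OutputTokenCount", days)
    return {
        "input_tokens": inp,
        "output_tokens": out,
        "input_to_output": inp / out if out else None,
    }


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   shape = token_shape(boto3.client("cloudwatch"), "<your-model-id>")
```

Running this per `ModelId` each month yields the input to output ratios that a showback report can put in front of each product team.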

What to do next

Pull 90 days of token metrics per application from CloudWatch.
Map input to output ratios and context lengths per workload.
Write a routing policy defaulting to the cheapest adequate model family.
Move latency tolerant pipelines to batch inference.
Size any provisioned throughput to the measured floor, not the forecast.
Fold Bedrock spend into EDP commit retirement and the renewal position.
Refresh the internal rate card quarterly against published pricing.

The AWS practice prices Bedrock inside EDP negotiations, and the software spend health check shows where the wider estate leaks.


Frequently asked questions

How does Amazon Bedrock charge?

Per 1,000 input and output tokens on demand, per committed model unit per hour with provisioned throughput, and at roughly half the on demand rate for batch inference, with rates set per foundation model.

Is provisioned throughput worth it?

Only for sustained, measured production volume. In our 2024 to 2025 reviews, early commitments sat underutilized while on demand would have cost less; commit to the proven floor after 90 days.

What cuts Bedrock cost the most?

Model routing. Defaulting traffic to the cheapest adequate model family and reserving premium models for exception paths cut unit cost more than any discount conversation.

Does Bedrock spend count toward an AWS EDP?

Yes. Bedrock consumption retires private pricing commitments like any other AWS service, which is where enterprise scale discounting actually lives.

Why did our Bedrock bill spike?

Usually context bloat and output heavy workloads: long RAG contexts, verbose completions, and agent chains multiply token counts invisibly. Measure token shape per application first.
