
Amazon Bedrock pricing. What it really costs in 2026.

Per token, provisioned, or batch. The three pricing modes, the routing levers, and when a commitment actually makes sense.

Industry Recognized
500+ Enterprise Clients
$2B+ Under Advisory
11 Vendor Practices
100% Buyer Side Independent

How Amazon Bedrock pricing works in 2026: on demand token rates, provisioned throughput, batch discounts, and the routing and commitment decisions that control the bill.

Key takeaways

  • Production token volumes ran 3 to 5 times above pilot forecasts across our 2024 to 2025 reviews.
  • Output tokens price several times above input tokens on most model families.
  • Model routing policy cut unit cost more than any rate negotiation.
  • Batch inference prices at roughly half the on demand rate for latency tolerant work.
  • Buy provisioned throughput only after 90 days of sustained measured volume.
  • Bedrock spend retires Enterprise Discount Program (EDP) commitments; fold it into the agreement, not beside it.

How does Amazon Bedrock pricing actually work?

Bedrock bills per token on demand, per model unit with provisioned throughput, and at a discount for batch inference, with each foundation model carrying its own rates. Input and output tokens price separately, and output tokens typically cost several times more, so workload shape decides cost as much as volume.
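
To make the input and output split concrete, here is a minimal sketch of the on demand arithmetic. The rates are placeholders chosen only so that output prices at five times input; they are not published Bedrock prices, and the `TokenRates` and `on_demand_cost` names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical on demand rates in USD per 1,000 tokens (not real prices)."""
    input_per_1k: float
    output_per_1k: float


def on_demand_cost(input_tokens: int, output_tokens: int, rates: TokenRates) -> float:
    """Price input and output tokens separately, then sum them."""
    input_cost = (input_tokens / 1000) * rates.input_per_1k
    output_cost = (output_tokens / 1000) * rates.output_per_1k
    return input_cost + output_cost


# Placeholder rates with output at 5x input; take real values from the pricing page.
RATES = TokenRates(input_per_1k=0.003, output_per_1k=0.015)

# Two workloads with the same total volume (10M tokens) but a different shape.
extraction = on_demand_cost(9_000_000, 1_000_000, RATES)  # input heavy
generation = on_demand_cost(4_000_000, 6_000_000, RATES)  # output heavy

print(f"input heavy:  ${extraction:,.2f}")
print(f"output heavy: ${generation:,.2f}")
```

Under these assumed rates the same 10 million tokens cost $42 when the workload is input heavy and $102 when it is output heavy, which is the sense in which shape decides cost as much as volume.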

The service scope sits on the Amazon Bedrock product page; the authoritative rate card is the Amazon Bedrock pricing page, with per model rates for on demand, provisioned, and batch modes. Rates differ by model provider and version, and they move; any internal cost model needs a refresh cadence.

The pricing modes in buyer terms

  • On demand: pay per input and output token, no commitment, the right default until usage is measured.
  • Provisioned throughput: committed model units by the hour with term options, for sustained high volume workloads.
  • Batch inference: asynchronous processing at roughly half the on demand rate where latency does not matter.

What drives the bill beyond tokens

Context length, retrieval payloads, and agent chains multiply token counts invisibly. A RAG application that stuffs long contexts can cost ten times a tuned one answering the same questions.
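
The multiplication can be sketched in a few lines. The request counts and token sizes below are invented for illustration, and the sketch covers input tokens only, assuming the answers (output tokens) are the same length in both designs.

```python
def monthly_input_tokens(
    requests: int,
    question_tokens: int,
    context_tokens: int,
    calls_per_request: int = 1,
) -> int:
    """Input tokens billed per month: every model call resends question plus context."""
    return requests * calls_per_request * (question_tokens + context_tokens)


# Hypothetical RAG application answering the same 100,000 questions a month.
stuffed = monthly_input_tokens(requests=100_000, question_tokens=200, context_tokens=19_800)
tuned = monthly_input_tokens(requests=100_000, question_tokens=200, context_tokens=1_800)

# An agent chain that makes four model calls per request on the tuned context.
agent_chain = monthly_input_tokens(
    requests=100_000, question_tokens=200, context_tokens=1_800, calls_per_request=4
)

ratio = stuffed / tuned
print(f"stuffed: {stuffed:,} tokens, tuned: {tuned:,} tokens, ratio: {ratio:.0f}x")
print(f"agent chain on tuned context: {agent_chain:,} tokens")
```

None of these multipliers show up in a request count; they only appear once tokens per request are measured.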

Bedrock pricing modes compared

Dimension | On demand | Provisioned throughput | Batch
Billing unit | Per 1,000 tokens | Model units per hour | Per 1,000 tokens
Commitment | None | Hourly to multi month terms | None
Best fit | Variable, unproven workloads | Sustained production volume | Offline processing
Risk | Spend spikes with traffic | Paying for idle capacity | Latency unsuitable for chat
Discount lever | Model choice and routing | Term length on commits | About half of on demand
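
A rough way to compare the three modes for one workload is sketched below. Every number is an assumption: the token rates, the hourly price per model unit, and the premise that one model unit has enough throughput for the workload. Real model unit prices and capacities vary by model and term.

```python
HOURS_PER_MONTH = 730  # average hours in a calendar month


def on_demand_monthly(input_tokens: int, output_tokens: int,
                      in_rate_1k: float, out_rate_1k: float) -> float:
    """On demand: pay per 1,000 input and output tokens."""
    return (input_tokens / 1000) * in_rate_1k + (output_tokens / 1000) * out_rate_1k


def batch_monthly(on_demand_cost: float, discount: float = 0.5) -> float:
    """Batch: modeled as a flat discount on the on demand cost (roughly half)."""
    return on_demand_cost * (1 - discount)


def provisioned_monthly(model_units: int, hourly_rate_per_unit: float) -> float:
    """Provisioned: pay for every hour of every unit, used or idle."""
    return model_units * hourly_rate_per_unit * HOURS_PER_MONTH


# Hypothetical workload: 2B input and 300M output tokens a month.
od = on_demand_monthly(2_000_000_000, 300_000_000, in_rate_1k=0.003, out_rate_1k=0.015)
batch = batch_monthly(od)
prov = provisioned_monthly(model_units=1, hourly_rate_per_unit=40.0)

for label, cost in (("on demand", od), ("batch", batch), ("provisioned", prov)):
    print(f"{label:<12} ${cost:,.0f}")
```

With these placeholder figures the single committed unit costs well above the on demand bill, which is the "paying for idle capacity" risk in the table. The comparison only flips once measured volume fills the unit.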

What does Bedrock cost at enterprise scale?

At scale the bill is model mix times token volume, and both are controllable. The estates that manage Bedrock cost treat model routing as the primary lever: premium models for the tasks that need them, smaller and cheaper families for the bulk of traffic.

  1. Measure token shape first: input to output ratios and context lengths per application, from CloudWatch metrics.
  2. Route by task value: classification and extraction to small models; reasoning to premium models on exception paths.
  3. Batch what can wait: summarization, enrichment, and document pipelines at the batch discount.
  4. Commit only on evidence: provisioned throughput after 90 days of sustained measured volume, never on launch forecasts.
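
Steps 2 and 3 can be written down as a small routing function. This is a sketch of one possible policy, not a Bedrock feature: the task labels, the "small" and "premium" tier names, and the `escalate` flag for exception paths are all illustrative and would map to whatever models and signals an estate actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model_tier: str  # hypothetical tier label, not a Bedrock model ID
    mode: str        # "on_demand" or "batch"


# Illustrative task labels taken from the steps above.
PREMIUM_TASKS = {"reasoning"}
BATCHABLE_TASKS = {"summarization", "enrichment", "document_pipeline"}


def route(task_type: str, latency_tolerant: bool = False, escalate: bool = False) -> Route:
    """Default to the cheapest adequate tier; go premium only on exception paths."""
    tier = "premium" if (escalate or task_type in PREMIUM_TASKS) else "small"
    # Batch only work that can wait and belongs to a pipeline style task.
    can_batch = latency_tolerant and task_type in BATCHABLE_TASKS
    return Route(model_tier=tier, mode="batch" if can_batch else "on_demand")


print(route("classification"))
print(route("summarization", latency_tolerant=True))
print(route("extraction", escalate=True))
```

The design choice that matters is the default: anything not explicitly escalated lands on the small tier, so premium usage has to be justified per task rather than assumed.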

The commitment decision

Provisioned throughput terms and model unit mechanics are defined in the Bedrock documentation. The decision rule that held in our reviews: commit to the measured floor of a workload that has run for a quarter, and leave the variance on demand.
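
One way to turn "measured floor" into a number is to take a low percentile of observed throughput over the quarter and commit only to the whole model units that fit under it. The sketch below assumes hourly samples of tokens per minute and a per unit capacity figure; both the 10th percentile choice and the capacity value are assumptions, since unit capacity depends on the model and has to be confirmed with AWS.

```python
import math
import statistics


def measured_floor(tokens_per_min_samples: list[float], percentile: int = 10) -> float:
    """Return the given percentile of observed throughput as the workload floor."""
    cuts = statistics.quantiles(tokens_per_min_samples, n=100, method="inclusive")
    return cuts[percentile - 1]  # cuts[0] is the 1st percentile


def units_to_commit(floor_tokens_per_min: float, unit_capacity_tokens_per_min: float) -> int:
    """Whole model units that fit under the floor; the remainder stays on demand."""
    return math.floor(floor_tokens_per_min / unit_capacity_tokens_per_min)


# Synthetic samples standing in for a quarter of hourly measurements.
samples = [1000.0 * i for i in range(101)]
floor = measured_floor(samples)

# Hypothetical capacity of one model unit, in tokens per minute.
units = units_to_commit(floor, unit_capacity_tokens_per_min=4000.0)
print(f"floor: {floor:,.0f} tokens/min, commit: {units} unit(s)")
```

Rounding down is deliberate: a fractional unit above the floor would be idle for part of every day, and that variance is cheaper left on demand.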

Where the common advice on Bedrock cost is wrong

The standard advice is to negotiate provisioned throughput commitments early to lock capacity and price. We disagree. In roughly 15 of the 20 to 30 reviews Morten Andersen ran in 2024 to 2025, early committed model units sat materially underutilized, and the spend that mattered leaked through model choice and context bloat that no commitment touches. The buyer side move is 90 days of measured on demand usage, a routing policy that defaults traffic to the cheapest adequate model, and only then a provisioned commitment sized to the proven floor, inside the wider AWS agreement where it counts toward committed spend.

Output tokens price several times above input on most model families; workload shape moves the bill before any discount does.
  • 20 to 30: GenAI cost reviews, 2024 to 2025
  • 3 to 5x: production volume vs pilot forecasts
  • ~50%: batch discount vs on demand

Source: Redress Compliance advisory engagement file, 2024 to 2025.

The cheapest Bedrock token is the one a smaller model handled. Routing policy beats rate negotiation, every quarter, in every estate we measured.

How does Bedrock fit your AWS agreement?

Bedrock spend counts toward private pricing commitments, which is where the real discount lives. Inside an EDP or PPA, GenAI growth helps retire the commit, and incremental Bedrock volume strengthens the renewal position rather than sitting as an unmanaged line item.

  • Fold it into the commit: Bedrock consumption retires EDP obligations like any other service.
  • Forecast it separately: GenAI growth curves are steeper and less predictable; a blended forecast hides the risk.
  • Tag and allocate: per application cost allocation is the only way routing policies get enforced.

Allocation and showback

Token metrics flow through Amazon CloudWatch per model and application. A monthly showback per product team is what turns the routing policy from a document into behavior.
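
A minimal sketch of pulling token totals for a showback follows. It assumes the `AWS/Bedrock` namespace, the `InputTokenCount` and `OutputTokenCount` metric names, and the `ModelId` dimension as we understand them; verify these against the current Bedrock monitoring documentation. The client is passed in, so the function works with a `boto3` CloudWatch client or a stub. The helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

TOKEN_METRICS = ("InputTokenCount", "OutputTokenCount")


def token_totals(cloudwatch, model_id: str, days: int = 90) -> dict:
    """Sum daily token counts for one model over the trailing window."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    totals = {}
    for metric in TOKEN_METRICS:
        response = cloudwatch.get_metric_statistics(
            Namespace="AWS/Bedrock",  # assumed namespace, check the docs
            MetricName=metric,
            Dimensions=[{"Name": "ModelId", "Value": model_id}],
            StartTime=start,
            EndTime=end,
            Period=86400,  # one datapoint per day
            Statistics=["Sum"],
        )
        totals[metric] = sum(point["Sum"] for point in response["Datapoints"])
    return totals


def output_to_input_ratio(totals: dict) -> float:
    """Token shape: output tokens per input token (0.0 when there is no input)."""
    input_tokens = totals["InputTokenCount"]
    return totals["OutputTokenCount"] / input_tokens if input_tokens else 0.0
```

With credentials configured, a call would look like `token_totals(boto3.client("cloudwatch"), "<model id>")`. A per application view depends on how workloads are separated in the account, which this sketch does not address.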

What to do next

  1. Pull 90 days of token metrics per application from CloudWatch.
  2. Map input to output ratios and context lengths per workload.
  3. Write a routing policy defaulting to the cheapest adequate model family.
  4. Move latency tolerant pipelines to batch inference.
  5. Size any provisioned throughput to the measured floor, not the forecast.
  6. Fold Bedrock spend into EDP commit retirement and the renewal position.
  7. Refresh the internal rate card quarterly against published pricing.

The AWS practice prices Bedrock inside EDP negotiations, and the software spend health check shows where the wider estate leaks.

Frequently asked questions

How does Amazon Bedrock charge?

Per 1,000 input and output tokens on demand, per committed model unit per hour with provisioned throughput, and at roughly half the on demand rate for batch inference, with rates set per foundation model.

Is provisioned throughput worth it?

Only for sustained, measured production volume. In our 2024 to 2025 reviews, early commitments sat underutilized, and on demand would have cost less. Commit to the proven floor after 90 days.

What cuts Bedrock cost the most?

Model routing. Defaulting traffic to the cheapest adequate model family and reserving premium models for exception paths cut unit cost more than any discount conversation.

Does Bedrock spend count toward an AWS EDP?

Yes. Bedrock consumption retires private pricing commitments like any other AWS service, which is where enterprise scale discounting actually lives.

Why did our Bedrock bill spike?

Usually context bloat and output heavy workloads: long RAG contexts, verbose completions, and agent chains multiply token counts invisibly. Measure token shape per application first.

Bedrock Licensing Guide

The full Bedrock licensing guide from the AWS Practice.

Token economics worksheets, routing policy templates, provisioned throughput sizing models, and the EDP integration sequence.

Used across more than five hundred enterprise engagements. Independent. Buyer side. Built for procurement leaders running the next renewal cycle.
