Per token, provisioned, or batch. The three pricing modes, the routing levers, and when a commitment actually makes sense.
How Amazon Bedrock pricing works in 2026: on demand token rates, provisioned throughput, batch discounts, and the routing and commitment decisions that control the bill.
Bedrock bills per token on demand, per model unit with provisioned throughput, and at a discount for batch inference, with each foundation model carrying its own rates. Input and output tokens price separately, and output tokens typically cost several times more, so workload shape decides cost as much as volume.
The service scope sits on the Amazon Bedrock product page; the authoritative rate card is the Amazon Bedrock pricing page, with per model rates for on demand, provisioned, and batch modes. Rates differ by model provider and version, and they move; any internal cost model needs a refresh cadence.
Context length, retrieval payloads, and agent chains multiply token counts invisibly. A RAG application that stuffs long contexts can cost ten times a tuned one answering the same questions.
Bedrock pricing modes compared
| Dimension | On demand | Provisioned throughput | Batch |
|---|---|---|---|
| Billing unit | Per 1,000 tokens | Model units per hour | Per 1,000 tokens |
| Commitment | None | Hourly to multi month terms | None |
| Best fit | Variable, unproven workloads | Sustained production volume | Offline processing |
| Risk | Spend spikes with traffic | Paying for idle capacity | Latency unsuitable for chat |
| Discount lever | Model choice and routing | Term length on commits | About half of on demand |
At scale the bill is model mix times token volume, and both are controllable. The estates that manage Bedrock cost treat model routing as the primary lever: premium models for the tasks that need them, smaller and cheaper families for the bulk of traffic.
Provisioned throughput terms and model unit mechanics are defined in the Bedrock documentation. The decision rule that held in our reviews: commit to the measured floor of a workload that has run for a quarter, and leave the variance on demand.
The standard advice is to negotiate provisioned throughput commitments early to lock capacity and price. We disagree. In roughly 15 of the 20 to 30 reviews Morten Andersen ran in 2024 to 2025, early committed model units sat materially underutilized, and the spend that mattered leaked through model choice and context bloat that no commitment touches. The buyer side move is 90 days of measured on demand usage, a routing policy that defaults traffic to the cheapest adequate model, and only then a provisioned commitment sized to the proven floor, inside the wider AWS agreement where it counts toward committed spend.
Source: Redress Compliance advisory engagement file, 2024 to 2025.
The cheapest Bedrock token is the one a smaller model handled. Routing policy beats rate negotiation, every quarter, in every estate we measured.
Bedrock spend counts toward private pricing commitments, which is where the real discount lives. Inside an EDP or PPA, GenAI growth helps retire the commit, and incremental Bedrock volume strengthens the renewal position rather than sitting as an unmanaged line item.
Token metrics flow through Amazon CloudWatch per model and application. A monthly showback per product team is what turns the routing policy from a document into behavior.
The AWS practice prices Bedrock inside EDP negotiations, and the software spend health check shows where the wider estate leaks.
Per 1,000 input and output tokens on demand, per committed model unit per hour with provisioned throughput, and at roughly half the on demand rate for batch inference, with rates set per foundation model.
Only for sustained, measured production volume. In our 2024 to 2025 reviews, early commitments sat underutilized while on demand would have cost less; commit to the proven floor after 90 days.
Model routing. Defaulting traffic to the cheapest adequate model family and reserving premium models for exception paths cut unit cost more than any discount conversation.
Yes. Bedrock consumption retires private pricing commitments like any other AWS service, which is where enterprise scale discounting actually lives.
Usually context bloat and output heavy workloads: long RAG contexts, verbose completions, and agent chains multiply token counts invisibly. Measure token shape per application first.
Token economics worksheets, routing policy templates, provisioned throughput sizing models, and the EDP integration sequence.
Used across more than five hundred enterprise engagements. Independent. Buyer side. Built for procurement leaders running the next renewal cycle.