AWS Bedrock Licensing: 2026 Pricing and Levers

AWS Bedrock bills per token on demand, per model unit when provisioned, and through your existing AWS agreement, which makes it the AI spend that is easiest to govern and easiest to ignore.

Key takeaways

Three pricing modes: on demand per token, batch at roughly half the on demand rate, and provisioned throughput per model unit.
The model choice is the price: rates vary by an order of magnitude across model providers and sizes on the same task.
Bedrock spend feeds the EDP: consumption counts toward AWS commit programs, which makes it negotiating currency.
Provisioned is a commitment: model units bill by the hour whether used or not; buy them against measured load only.
Batch is the free discount: half price for any workload that tolerates asynchronous processing.
Routing is the lever: matching each task to the cheapest sufficient model cuts 30 to 60 percent off the bill of an unrouted deployment.

How does AWS Bedrock pricing actually work?

Bedrock charges for model inference per 1,000 input and output tokens on demand, with rates set per model provider and size. There is no platform subscription; the meter is the contract. The full rate card sits on the AWS Bedrock pricing page and changes as providers reprice.

Output tokens typically cost several times input tokens, and long context workloads multiply both sides of the meter. Prompt design is a procurement concern here, not just an engineering one.

On demand: per token, no commitment, the right default while load is unmeasured.
Batch: roughly half the on demand rate for asynchronous jobs.
Provisioned throughput: hourly model units for steady, latency sensitive production load.
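
The per token arithmetic behind the first two modes can be sketched as follows. This is a minimal illustration, not AWS tooling: the `Rate` class, the function names, and the rates are hypothetical placeholders, and the 50 percent batch discount is the rough figure quoted above rather than a value read from the rate card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Hypothetical price per 1,000 tokens, in USD."""
    input_per_1k: float
    output_per_1k: float


def on_demand_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload billed per 1,000 input and output tokens."""
    return (input_tokens / 1000) * rate.input_per_1k \
        + (output_tokens / 1000) * rate.output_per_1k


def batch_cost(rate: Rate, input_tokens: int, output_tokens: int,
               batch_discount: float = 0.5) -> float:
    """Same workload at an assumed batch discount off the on demand rate."""
    return on_demand_cost(rate, input_tokens, output_tokens) * (1 - batch_discount)


# Placeholder rates, not taken from the AWS rate card.
example_rate = Rate(input_per_1k=0.003, output_per_1k=0.015)
cost = on_demand_cost(example_rate, 2_000_000, 500_000)
print(f"on demand: ${cost:.2f}")
print(f"batch:     ${batch_cost(example_rate, 2_000_000, 500_000):.2f}")
```

In this example the output side costs more than the input side despite carrying a quarter of the tokens, which is why prompt and response length belong in the cost model.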

When does provisioned throughput beat on demand?

Provisioned throughput wins when production load is steady, measured, and latency sensitive; it loses everywhere else because model units bill by the hour regardless of use. The breakeven sits where the tokens a model unit actually serves in an hour would cost as much on demand as the unit's hourly price. Below that utilization, on demand is cheaper.

Bedrock pricing modes, buyer view

Mode	Billing basis	Best for	Risk
On demand	Per 1,000 tokens	Variable and unproven load	Cost spikes without caps
Batch	Per token, discounted	Asynchronous processing	None beyond latency
Provisioned	Per model unit hour	Steady production inference	Idle units bill anyway
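
The breakeven between provisioned and on demand can be estimated with a few lines of arithmetic. The sketch below assumes a single blended on demand rate and a known token capacity per model unit hour; both are simplifications, the function names are hypothetical, and the numbers are placeholders rather than AWS prices.

```python
def breakeven_utilization(unit_hourly_price: float,
                          unit_tokens_per_hour: float,
                          on_demand_per_1k: float) -> float:
    """Share of a model unit's hourly capacity that must be used before
    the unit costs the same as buying those tokens on demand."""
    full_capacity_on_demand = unit_tokens_per_hour / 1000 * on_demand_per_1k
    if full_capacity_on_demand <= 0:
        raise ValueError("capacity and rate must be positive")
    return unit_hourly_price / full_capacity_on_demand


def cheaper_mode(measured_utilization: float, breakeven: float) -> str:
    """Pick the cheaper mode for a measured average utilization (0.0 to 1.0)."""
    return "provisioned" if measured_utilization > breakeven else "on demand"


# Placeholder numbers for illustration only.
breakeven = breakeven_utilization(
    unit_hourly_price=20.0,
    unit_tokens_per_hour=10_000_000,
    on_demand_per_1k=0.004,
)
print(f"breakeven utilization: {breakeven:.0%}")
print(cheaper_mode(0.35, breakeven))
```

A breakeven above 1.0 means the unit never pays for itself on price alone at that rate, so any purchase would have to be justified by latency or capacity guarantees instead.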

Measure before you commit

Run new workloads on demand with budget alerts for a full business cycle before buying provisioned capacity. In our engagement file, the estates that bought provisioned capacity first and measured later carried 25 to 40 percent idle provisioned units.
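
One way to put a number on idle capacity is to average hourly utilization samples over the measurement period. This is a minimal sketch under the assumption that you can export utilization per hour as a fraction between 0 and 1; the `idle_share` helper and the sample data are hypothetical.

```python
def idle_share(hourly_utilization: list[float]) -> float:
    """Fraction of paid model unit hours that went unused.

    Each sample is the share of a unit's capacity used in one hour.
    Samples are clipped to the 0.0 to 1.0 range before averaging.
    """
    if not hourly_utilization:
        raise ValueError("need at least one utilization sample")
    clipped = [min(max(u, 0.0), 1.0) for u in hourly_utilization]
    return 1 - sum(clipped) / len(clipped)


# Illustrative samples: busy during the day, quiet overnight.
samples = [1.0, 0.5, 0.5, 0.0]
print(f"idle share: {idle_share(samples):.0%}")
```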

How does Bedrock interact with an AWS EDP?

Bedrock consumption counts toward AWS commit programs, which cuts both ways. It helps retire an existing commit, and it gives AWS a growth story to anchor a larger renewal ask. The Bedrock platform is strategic for AWS, and that makes your AI roadmap negotiating currency.

Inside an EDP: Bedrock spend retires commit at your discounted effective rate.
At renewal: bring measured AI projections, not vendor enthusiasm, into the commit sizing.
Private pricing: sustained Bedrock volume justifies service specific terms beyond the EDP percentage.
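
The commit arithmetic can be sketched as below. It assumes, following the first point above, that Bedrock spend counts against the commit after the EDP discount is applied; check your own agreement for the actual mechanics. The function names and all figures are hypothetical.

```python
def commit_retired(list_price_spend: float, edp_discount: float) -> float:
    """Spend that counts against the commit after the EDP discount (assumed)."""
    if not 0 <= edp_discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return list_price_spend * (1 - edp_discount)


def remaining_commit(annual_commit: float,
                     monthly_list_spend: list[float],
                     edp_discount: float) -> float:
    """Commit still to be retired after the given months of spend."""
    retired = sum(commit_retired(m, edp_discount) for m in monthly_list_spend)
    return max(annual_commit - retired, 0.0)


# Illustrative figures: six months of Bedrock spend against a 1M commit.
left = remaining_commit(1_000_000, [100_000] * 6, edp_discount=0.10)
print(f"commit remaining: ${left:,.0f}")
```

Running this against measured monthly spend, rather than a vendor forecast, gives the projection to bring into commit sizing.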

Where the common advice on Bedrock costs is wrong

The standard advice is to negotiate a bigger AWS discount to bring AI costs down. We disagree. In roughly 8 of the 10 to 14 AWS AI engagements we advised on in 2024 and 2025, model routing moved 30 to 60 percent of the bill, while incremental discount moved single digits. The buyer side move is an internal model routing standard: the smallest sufficient model per task class, enforced in the application layer. No discount percentage survives comparison with not sending the tokens at all.
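
The gap between the two levers shows up in simple arithmetic. The sketch below compares an extra discount against routing on the same volume; the model names, rates, traffic split, and 5 percent discount are hypothetical placeholders chosen only to illustrate the order of magnitude difference in rates described earlier.

```python
def bill(k_tokens_by_model: dict[str, float],
         rate_per_1k: dict[str, float],
         discount: float = 0.0) -> float:
    """Bill for volumes given in thousands of tokens per model."""
    gross = sum(k * rate_per_1k[m] for m, k in k_tokens_by_model.items())
    return gross * (1 - discount)


# Placeholder rates: a large model priced 10x a small one.
rates = {"large": 0.010, "small": 0.001}

baseline = bill({"large": 100_000}, rates)                   # everything on the large model
discounted = bill({"large": 100_000}, rates, discount=0.05)  # extra 5% negotiated
routed = bill({"large": 40_000, "small": 60_000}, rates)     # 60% of traffic routed down

discount_saving = 1 - discounted / baseline
routing_saving = 1 - routed / baseline
print(f"discount saves {discount_saving:.0%}, routing saves {routing_saving:.0%}")
```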

Token meters reward engineering discipline the way power meters reward insulation: the cheapest optimization is always the request you stopped sending.

Figure	What it measures
30 to 60%	Overspend from unrouted frontier model use
25 to 40%	Idle share of prematurely provisioned units
~50%	Batch discount vs on demand rates

Source: Redress Compliance advisory engagement file, 2024 to 2025.

The cheapest token is the one a smaller model handled. Routing policy is the only AI discount that compounds.

Where does AWS document the commitment mechanics?

The commitment terms sit in the provisioned throughput documentation and the Bedrock FAQ, including the term options and their discount steps. Read them before the account team models the purchase; the no commitment option is the one the account team's proposal rarely leads with.

What levers actually cut Bedrock spend?

Five levers, in order of impact: model routing, prompt and context discipline, batch conversion, caching, and only then commercial structure. The first four are engineering policies with procurement consequences; the fifth is where the EDP and private pricing land.

Routing standard: classify tasks and assign the smallest sufficient model per class.
Context discipline: trim prompts and retrieval payloads; long context multiplies the meter.
Batch conversion: move every asynchronous workload to batch pricing.
Caching: deduplicate repeated inference at the application layer.
Commercial structure: EDP integration, budget caps, and private pricing on sustained volume.
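
The routing and caching levers can be sketched in the application layer as follows. This is a minimal illustration under stated assumptions: the task classes, the model identifiers, the `CachedInference` wrapper, and the `fake_invoke` stand-in are all hypothetical, and a real deployment would pass in a function that calls its own inference client.

```python
import hashlib
from typing import Callable

# Hypothetical routing standard: smallest sufficient model per task class.
ROUTING_STANDARD = {
    "classification": "small-model",
    "summarization": "mid-model",
    "complex_reasoning": "large-model",
}


def route(task_class: str) -> str:
    """Return the model assigned to a task class; reject unclassified work."""
    try:
        return ROUTING_STANDARD[task_class]
    except KeyError:
        raise ValueError(f"unclassified task class: {task_class!r}") from None


class CachedInference:
    """Routes each request, then deduplicates identical (model, prompt) pairs."""

    def __init__(self, invoke: Callable[[str, str], str]) -> None:
        self._invoke = invoke
        self._cache: dict[str, str] = {}
        self.calls = 0  # billable calls actually sent

    def run(self, task_class: str, prompt: str) -> str:
        model_id = route(task_class)
        key = hashlib.sha256(f"{model_id}\n{prompt}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._invoke(model_id, prompt)
        return self._cache[key]


def fake_invoke(model_id: str, prompt: str) -> str:
    """Stand-in for a real inference call, used only for this example."""
    return f"{model_id}:{len(prompt)}"


client = CachedInference(fake_invoke)
client.run("classification", "Is this invoice a duplicate?")
client.run("classification", "Is this invoice a duplicate?")  # served from cache
print(client.calls)
```

Rejecting unclassified task classes, rather than defaulting them to the largest model, is a design choice here: it forces every new workload through the routing review before it can spend.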

What to do next

The moves below turn the Bedrock meter into a governed, negotiable cost line.

A sequence you can run this quarter

Tag all Bedrock usage by application and task class in cost allocation reports.
Run a routing review: which task classes can drop to smaller or cheaper models.
Convert every asynchronous workload to batch processing.
Set budget alerts and hard caps per application before scaling anything.
Hold provisioned throughput purchases until a full cycle of measured load exists.
Bring measured AI projections into the next EDP conversation as commit currency.
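
The alert and cap step in the sequence above can also be enforced inside the application, alongside whatever billing alerts you configure in AWS. The sketch below is a hypothetical in-process guard, not an AWS feature: the `BudgetGuard` class, its 80 percent alert threshold, and the dollar figures are assumptions for illustration.

```python
class BudgetGuard:
    """Tracks spend for one application against a monthly cap."""

    def __init__(self, monthly_cap_usd: float, alert_threshold: float = 0.8) -> None:
        self.cap = monthly_cap_usd
        self.alert_at = monthly_cap_usd * alert_threshold
        self.spent = 0.0

    def record(self, cost_usd: float) -> str:
        """Record a request's cost and return 'ok', 'alert', or 'blocked'.

        A request that would exceed the cap is refused and not recorded.
        """
        if self.spent + cost_usd > self.cap:
            return "blocked"
        self.spent += cost_usd
        return "alert" if self.spent >= self.alert_at else "ok"


guard = BudgetGuard(monthly_cap_usd=100.0)
print(guard.record(50.0))  # under the alert threshold
print(guard.record(35.0))  # crosses 80 percent of the cap
print(guard.record(20.0))  # would exceed the cap, so it is refused
```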


Frequently asked questions

How is AWS Bedrock licensed?

Bedrock has no license or subscription; it bills for usage, per 1,000 tokens on demand, discounted for batch, or per model unit hour for provisioned throughput. Costs ride your existing AWS agreement and count toward commit programs.

What does Bedrock cost per token?

Rates vary by an order of magnitude across model providers and sizes, with output tokens costing several times input tokens. The current rate card on the AWS Bedrock pricing page is the only reliable reference because providers reprice frequently.

Does Bedrock usage count toward an AWS EDP?

Yes. Bedrock consumption retires EDP commit like other AWS service spend, which makes measured AI growth projections useful currency in commit negotiations and renewals.

When should you buy provisioned throughput on Bedrock?

Only after a full business cycle of measured production load that is steady and latency sensitive. Model units bill hourly whether used or not, and early buyers in our file carried 25 to 40 percent idle capacity.

How do you stop Bedrock costs from running away?

Set budget alerts and hard caps per application, adopt a routing standard that assigns the smallest sufficient model per task, and move asynchronous work to batch pricing. Routing alone cut 30 to 60 percent in the deployments we benchmarked.

Is Bedrock cheaper than going direct to model providers?

It depends on the model and volume; direct provider contracts can undercut Bedrock at scale, while Bedrock wins on EDP integration and operational simplicity. Price both against your routed workload mix before committing either way.
