Why AWS Holds the Information Advantage on AI Pricing
AWS's AI service portfolio — Amazon Bedrock, SageMaker, Comprehend, Textract, Transcribe, and the growing constellation of AI and ML services — is priced on consumption-based models that make cost forecasting extraordinarily difficult. Per-token inference, per-hour training, per-page processing, per-minute transcription: each service has its own pricing unit, its own rate card, and its own commitment mechanics. For enterprises in the early stages of AI adoption, this complexity is compounded by the absence of historical consumption data, established benchmarks, or mature FinOps practices for AI workloads.
The result is a commercial environment where AWS holds an asymmetric information advantage. AWS understands its own pricing architecture intimately; most enterprises do not. AWS can model the margin structure of every AI service; most enterprises cannot even forecast their consumption accurately. This asymmetry creates an opportunity for enterprises that invest in understanding the pricing mechanics before committing — and a significant cost risk for those that don't.
Five core findings frame this paper. First, AWS AI service pricing is 25 to 50 percent negotiable through Provisioned Throughput commitments and EDP-linked AI agreements — but fewer than 15 percent of enterprises negotiate AI pricing independently. Second, Bedrock Provisioned Throughput creates 40 to 60 percent savings over on-demand inference, but commits you to specific model capacity that becomes stranded if usage patterns or model preferences change. Third, AWS embeds AI spend into EDP commitments in a way that obscures whether AI services are priced competitively on a standalone basis. Fourth, model portability is the single most valuable negotiation lever for AWS AI pricing — the same foundation models run on multiple competing platforms. Fifth, the contractual protections that matter for AI commitments are fundamentally different from those that matter for compute, and most enterprise agreements don't include them.
See how enterprises reduce AWS cloud costs by 30 to 40 percent
Read our AWS advisory case studies to see the commercial strategies that deliver results.
How Amazon Bedrock, SageMaker, and AI Services Are Priced
AWS's AI portfolio spans dozens of services, but the commercial significance for most enterprises concentrates in four categories: foundation model inference (Bedrock), custom model training and hosting (SageMaker), document intelligence (Textract), and speech and language services (Transcribe, Comprehend, Translate). Each has a distinct pricing model that requires separate analysis.
Amazon Bedrock: Foundation Model Inference
Bedrock provides access to foundation models from Anthropic (Claude), Meta (Llama), Amazon (Titan, Nova), Mistral, Cohere, and others through a unified API. Pricing operates on two tracks: on-demand (per-token, per-request, pay-as-you-go) and Provisioned Throughput (reserved model units with committed capacity and dedicated compute). On-demand pricing varies dramatically by model — Claude 3.5 Sonnet input tokens cost approximately $3 per million tokens; Claude 3 Haiku costs approximately $0.25 per million tokens. Output tokens are typically 3 to 5 times the input price. Provisioned Throughput provides dedicated capacity at committed rates, typically 40 to 60 percent below on-demand for sustained usage.
Key negotiation variable: Provisioned Throughput commitment level and term. AWS offers 1-month and 6-month commitments. Longer terms and higher capacity levels unlock deeper discounts — but model portability risk must be factored into any commitment beyond 3 months.
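The on-demand exposure these rates create is simple token arithmetic. A minimal sketch in Python, using the example rates quoted above (the Haiku output rate is an assumed 5x multiple of input; all figures are illustrative and will drift, so verify against the current price list):

```python
# Illustrative on-demand Bedrock cost model. Rates are USD per million
# tokens, taken from the examples in the text -- check current pricing.
RATES = {
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    "claude-3-haiku": {"input": 0.25, "output": 1.25},  # output rate assumed
}

def monthly_on_demand_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly on-demand spend in USD for a given token volume."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Example: 500M input and 100M output tokens per month on Claude 3.5 Sonnet.
print(monthly_on_demand_cost("claude-3-5-sonnet", 500_000_000, 100_000_000))
# 1,500 (input) + 1,500 (output) = 3000.0 USD/month
```

Note how output tokens dominate the bill at scale despite being a fraction of the volume — a reason prompt and response-length efficiency matter commercially, not just technically.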
Amazon SageMaker: ML Training and Hosting
SageMaker pricing is primarily compute-based — per-hour rates for training instances (ml.p4d, ml.p5, ml.trn1) and inference endpoints. Training costs are driven by instance type, training duration, and data volume. Inference endpoint costs are driven by instance type and uptime hours. SageMaker Savings Plans offer 1-year or 3-year committed pricing at discounts of up to 64 percent. For enterprises fine-tuning foundation models or training custom models, SageMaker compute costs can scale rapidly. An ml.p5.48xlarge instance costs roughly $100 per hour, so a 72-hour training run on a single instance costs about $7,200 — and multi-instance distributed training multiplies this linearly.
Key negotiation variable: SageMaker Savings Plans provide committed discount on ML compute. Coordinate with overall Compute Savings Plans to avoid commitment overlap — SageMaker Savings Plans and EC2 Savings Plans are separate instruments with different scope.
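The training-cost arithmetic above generalises to multi-instance runs and Savings Plan discounts. A rough sketch, assuming the approximately $100/hour figure from the text (verify against current SageMaker pricing):

```python
def training_run_cost(hourly_rate: float, hours: float,
                      instance_count: int, savings_plan_discount: float = 0.0) -> float:
    """Cost in USD of a training run; discount is a fraction (0.64 = 64% off)."""
    return hourly_rate * hours * instance_count * (1 - savings_plan_discount)

# The 72-hour single-instance run from the text:
print(training_run_cost(100, 72, 1))               # 7200.0
# The same run distributed across 8 instances, on-demand:
print(training_run_cost(100, 72, 8))               # 57600.0
# ...and with a 3-year SageMaker Savings Plan at the maximum ~64% discount:
print(round(training_run_cost(100, 72, 8, 0.64)))  # 20736
```

Even at the deepest committed discount, an eight-instance run is a five-figure line item — which is why training-job governance (spot usage, checkpointing, right-sized instance selection) matters alongside commitment pricing.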
Document and Language AI: Textract, Comprehend, Translate
AWS's document and language services are priced per processing unit: Textract charges per page analysed ($1.50 per 1,000 pages for basic text detection, with form and table analysis priced substantially higher), Comprehend charges per unit of text analysed ($0.0001 per unit — a unit is 100 characters — for entity recognition), and Translate charges per character ($15 per million characters). Volume tiering provides automatic discounts at higher consumption levels. These services lack the explicit commitment instruments available for Bedrock and SageMaker. Discount leverage comes primarily through EDP-level negotiation and volume tier acceleration — negotiating bespoke pricing tiers for high-volume consumers that significantly improve on published rates.
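These per-unit rates make cost modelling straightforward once volume estimates exist. A hedged sketch using the quoted rates and hypothetical monthly volumes (verify every rate against the current price list before budgeting on it):

```python
# Rates as quoted above, normalised to a single billable unit each.
TEXTRACT_PER_PAGE = 1.50 / 1_000      # $1.50 per 1,000 pages
COMPREHEND_PER_UNIT = 0.0001          # per unit of text analysed
TRANSLATE_PER_CHAR = 15 / 1_000_000   # $15 per million characters

# Hypothetical monthly volumes:
pages, text_units, chars = 2_000_000, 50_000_000, 100_000_000

cost = (pages * TEXTRACT_PER_PAGE
        + text_units * COMPREHEND_PER_UNIT
        + chars * TRANSLATE_PER_CHAR)
print(f"${cost:,.0f}")  # $3,000 + $5,000 + $1,500 = $9,500
```

A model like this, re-run against each published volume tier, is also the starting point for the bespoke tier negotiation described above — it shows exactly where the standard tiers stop paying off.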
AI Infrastructure: Trainium, Inferentia, GPU Instances
AWS's custom AI chips — Trainium (for training) and Inferentia (for inference) — offer 40 to 60 percent better price-performance than equivalent GPU instances for supported workloads. Standard GPU instances (P4d, P5, G5) provide broader framework compatibility at higher per-unit cost. The choice between Trainium and Inferentia versus NVIDIA GPU instances is a strategic decision that affects both price and flexibility. Trainium offers the best economics but requires framework adaptation; NVIDIA instances offer maximum compatibility at premium pricing. High-demand AI instances (P5, Trn1) have constrained availability — negotiated capacity commitments with price guarantees are increasingly important for production AI workloads.
Bedrock Deep Dive: On-Demand, Provisioned Throughput, and Batch Inference — The Full Economics
Bedrock is where most enterprise AI spend concentrates — and where the pricing complexity and negotiation opportunity are greatest. Understanding the three consumption modes and their economics is essential to optimising Bedrock costs.
On-Demand Inference
On-demand Bedrock pricing charges per input token and per output token, with rates varying by model. There are no commitments, no minimum spend, and no volume discounts in the standard on-demand model. This is the simplest pricing but also the most expensive — the flexibility premium is 40 to 60 percent above committed rates for sustained workloads. On-demand is appropriate for experimentation, low-volume production workloads, and use cases with highly unpredictable traffic patterns.
Provisioned Throughput
Provisioned Throughput reserves dedicated model capacity measured in Model Units — each unit provides a guaranteed level of tokens-per-minute throughput. Pricing is per Model Unit per hour, with 1-month and 6-month commitment terms. The per-token economics of Provisioned Throughput are significantly lower than on-demand — typically 40 to 60 percent savings at sustained utilisation. However, the commitment is to capacity, not consumption — if your throughput requirements drop, you pay for unused capacity. The critical calculation is the break-even utilisation — the percentage of provisioned capacity that must be consumed for Provisioned Throughput to be cheaper than on-demand. For most model configurations, break-even is approximately 30 to 40 percent utilisation.
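The break-even point falls directly out of the Model Unit's hourly price and its sustainable throughput. A sketch with hypothetical numbers (neither the $40/hour rate nor the 20M tokens/hour throughput is a published AWS figure; substitute your quoted values):

```python
def break_even_utilisation(mu_hourly_rate: float,
                           mu_tokens_per_hour: float,
                           on_demand_rate_per_token: float) -> float:
    """Fraction of provisioned capacity that must actually be consumed
    before one Model Unit beats paying on-demand for the same tokens."""
    on_demand_cost_at_full_capacity = mu_tokens_per_hour * on_demand_rate_per_token
    return mu_hourly_rate / on_demand_cost_at_full_capacity

# Hypothetical: $40/hour Model Unit, 20M tokens/hour sustained throughput,
# blended on-demand rate of $5 per million tokens:
u = break_even_utilisation(40, 20_000_000, 5 / 1_000_000)
print(f"{u:.0%}")  # 40% -- consistent with the 30-40% range cited above
```

Anything below the break-even utilisation means the commitment is costing more than on-demand would; run this calculation per model configuration before signing.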
Batch Inference
Bedrock Batch Inference processes large volumes of prompts asynchronously at a 50 percent discount from on-demand pricing. Batch jobs don't require real-time response — they process prompts in queue and return results when complete. For use cases that tolerate latency such as document summarisation, content generation, and data extraction, batch inference offers the lowest per-token cost without any capacity commitment.
| Bedrock Mode | Pricing Model | Discount vs. On-Demand | Commitment Required | Best For |
|---|---|---|---|---|
| On-Demand | Per input/output token | Baseline (0%) | None | Experimentation, low or unpredictable volume |
| Provisioned (1-month) | Per Model Unit/hour | 40 to 50% | 1-month term | Sustained production with moderate predictability |
| Provisioned (6-month) | Per Model Unit/hour | 50 to 60% | 6-month term | Sustained production with high predictability |
| Batch Inference | Per token (50% of on-demand) | 50% | None | Latency-tolerant, high-volume batch processing |
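The table's trade-offs can be compared numerically for a given workload. A simplified sketch, taking midpoint discounts from the table (45 percent and 55 percent for the 1-month and 6-month terms are assumed midpoints, and the batch figure applies only to latency-tolerant workloads):

```python
def mode_costs(monthly_on_demand_spend: float, provisioned_utilisation: float) -> dict:
    """Effective monthly cost per Bedrock mode. provisioned_utilisation is the
    fraction of committed capacity actually consumed; idle capacity is still paid for."""
    return {
        "on_demand": monthly_on_demand_spend,
        "provisioned_1m": monthly_on_demand_spend * 0.55 / provisioned_utilisation,
        "provisioned_6m": monthly_on_demand_spend * 0.45 / provisioned_utilisation,
        "batch": monthly_on_demand_spend * 0.50,  # only if latency-tolerant
    }

# $10k/month of on-demand-equivalent consumption at two utilisation levels:
for util in (0.5, 0.95):
    costs = mode_costs(10_000, util)
    print(util, min(costs, key=costs.get))
# At 50% utilisation batch wins; at 95% the 6-month commitment wins.
```

The crossover illustrates the governance point that follows: utilisation, not discount depth, determines which mode is actually cheapest.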
The Model Version Risk
Provisioned Throughput commits you to a specific model and version. When Anthropic releases Claude 4 or Amazon launches Nova v2, your Provisioned Throughput commitment doesn't automatically upgrade — you're locked to the committed version for the remainder of the term. If the newer model offers better performance at equivalent or lower cost, your commitment represents a premium for an outdated model. This risk doesn't exist with on-demand pricing, where you can switch models instantly. Factor model version risk into any Provisioned Throughput commitment beyond 1 month — and negotiate contractual provisions for model migration or commitment conversion when new versions launch.
Download the complete AWS AI & Bedrock Licensing playbook
Full commitment decision framework, EDP integration strategy, and 5 contractual protections in a structured PDF guide.
The AI-Specific Commitment Decision Framework
The commitment decision for AI services is fundamentally different from traditional compute commitments because AI workloads are inherently less predictable, models evolve rapidly, and the pricing architecture is still immature. The framework below addresses these AI-specific dimensions across four gates.
"The biggest AI cost mistake isn't paying on-demand rates — it's committing to Provisioned Throughput for a model you'll switch away from in four months. In AI, flexibility has more value than discount depth."
Negotiating AI Within Your AWS Enterprise Discount Program
For enterprises with an AWS Enterprise Discount Program (EDP), the most impactful AI cost optimisation lever is negotiating AI-specific terms within the EDP framework. AWS's default is to treat AI spend as generic consumption that counts toward EDP attainment — but doesn't receive AI-specific pricing concessions. Changing this default requires deliberate negotiation at the right moment.
How AI Spend Flows Through EDP
Under standard EDP terms, all Bedrock, SageMaker, and AI service spend counts toward the total EDP commitment — the aggregate annual spend threshold that determines your infrastructure discount tier. This means AI consumption helps you maintain or achieve discount tiers, but the AI services themselves are priced at published on-demand rates. The EDP discount applies to eligible infrastructure services, not to AI service per-unit pricing. This structure creates a cross-subsidy dynamic: AI spend inflates total consumption (supporting higher discount tiers on infrastructure), while AI itself is priced at premium, undiscounted rates. The enterprise perceives the overall deal as competitive because infrastructure discounts are strong — but AI pricing is paying for that perception. This dynamic is central to how AWS manages its commercial relationships across cloud spend.
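The dilution effect is easy to quantify. A sketch of the blended discount across the total bill as undiscounted AI spend grows (the 20 percent infrastructure discount is an assumed figure for illustration, not a standard AWS rate):

```python
def blended_discount(infra_spend: float, ai_spend: float, edp_infra_discount: float) -> float:
    """Effective discount across the whole bill when the EDP discount applies
    to infrastructure only and AI services bill at published rates."""
    total = infra_spend + ai_spend
    billed = infra_spend * (1 - edp_infra_discount) + ai_spend
    return 1 - billed / total

# $10M infrastructure at an assumed 20% EDP discount, with growing AI spend:
for ai in (0, 2_000_000, 5_000_000):
    print(f"AI ${ai/1e6:.0f}M -> blended {blended_discount(10_000_000, ai, 0.20):.1%}")
# The headline 20% erodes to 16.7% and then 13.3% as the AI share rises.
```

Tracking this blended figure over time is the simplest way to surface the cross-subsidy in renewal discussions.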
Four AI-Specific EDP Negotiation Strategies
5 AI-Specific Contract Terms Your Agreement Is Missing
AI commitments require contractual protections that don't exist in standard cloud agreements. The pace of AI model evolution, the immaturity of AI pricing, and the uncertainty of enterprise AI consumption patterns create risks that must be addressed contractually — not left to commercial goodwill. These five protections address risks that are unique to AI spend and that traditional IT contracts are not designed to handle.
You commit to Provisioned Throughput for a specific model version. The model provider releases a superior version and deprecates the old one. Your commitment is stranded on a deprecated model — or you must purchase new Provisioned Throughput for the updated version at additional cost, effectively paying twice.
The Protection: Negotiate a model migration clause — when a committed model version is deprecated or superseded, the Provisioned Throughput commitment automatically converts to the successor model at equivalent or better pricing. If no direct successor exists, the commitment converts to flexible credit applicable to any Bedrock model.
Model providers adjust pricing when releasing new versions — newer, better models may cost more per token. If your on-demand AI spend is uncapped, a model version upgrade could increase your costs significantly without changing your consumption volume, and AWS can pass provider pricing changes through with little advance notice.
The Protection: Negotiate price escalation caps on AI services — maximum annual increase limits for per-token on-demand inference rates, and price lock provisions for Provisioned Throughput committed models for the duration of the commitment term. Include most-favoured-nation clauses ensuring your rates remain competitive with comparable-scale customers.
Provisioned Throughput is locked to a specific model. If you discover that a different model is more effective for your use case, you can't transfer the commitment. AI workloads frequently shift between models as teams optimise for cost, latency, and quality trade-offs.
The Protection: Negotiate cross-model flexibility provisions — the ability to reallocate Provisioned Throughput commitment across different Bedrock models with reasonable notice (30 to 60 days). Alternatively, negotiate dollar-based AI commitments rather than model-specific capacity — committing to a monthly spend amount in AI rather than a specific Model Unit count.
Provisioned Throughput guarantees capacity (tokens per minute) but doesn't guarantee consistent latency or availability. During high-demand periods, even provisioned capacity can experience degraded performance. For production AI applications with user-facing latency requirements, performance degradation is a service failure.
The Protection: Negotiate AI-specific SLAs that cover both availability (uptime percentage) and performance (p99 latency targets, throughput consistency). Include service credit provisions for SLA breaches that are proportionate to the committed spend — not the minimal credits in AWS's standard service-level agreements.
AI use cases evolve or are abandoned. A Provisioned Throughput commitment for a use case that is scaled back or cancelled becomes stranded capacity with no exit mechanism. Unlike EC2 Reserved Instances, there is no marketplace for selling unused Bedrock commitments.
The Protection: Negotiate commitment downgrade provisions — the ability to reduce the Provisioned Throughput commitment by 30 to 50 percent with 60 days' notice. Include early termination provisions with defined break fees for commitments longer than 3 months. The flexibility to exit is worth a modest premium on the committed rate.
6-Phase Framework for Negotiating AWS AI Pricing
Negotiating AWS AI pricing requires a different approach than negotiating infrastructure — because the pricing is less mature, the competitive landscape is more fluid, and the enterprise's own consumption patterns are less established. This framework accounts for AI-specific dynamics in a logical sequence that maximises leverage.
7 Priority Actions for AWS AI Cost Governance
1. Resist AWS's push for early AI commitments. Stay on-demand during experimentation and early production. The flexibility premium is justified by the value of being able to change models, scale down, or pivot approaches. Only evaluate commitment economics once you have 90 or more days of stable, measurable production consumption.
2. Batch Inference at 50 percent of on-demand pricing with zero commitment outperforms Provisioned Throughput for any workload that tolerates latency. Audit every AI workload for batch eligibility before sizing Provisioned Throughput commitments. The most common mistake is committing Provisioned Throughput for workloads that should run as batch jobs.
3. Leverage model portability — Claude on Bedrock vs. Vertex AI, Llama on Bedrock vs. Azure vs. GCP — to produce genuine pricing comparisons. This data eliminates the information asymmetry that gives AWS pricing power. Even if you intend to stay on AWS, the benchmark data transforms your negotiation position.
4. Don't negotiate AI pricing in isolation. Embed AI-specific discount tiers, consumption credits, and Provisioned Throughput integration in the EDP framework. The EDP renewal is when AWS's account team has maximum incentive to accommodate AI-specific requests — because they're protecting the total relationship value.
5. Negotiate contractual provisions for model deprecation migration, cross-model commitment flexibility, and price protection against version changes. These AI-specific protections address risks that don't exist in infrastructure commitments and are essential as AI spend scales from experimental to material line items in your IT budget.
6. Until AI consumption patterns stabilise and model selection solidifies, limit Provisioned Throughput commitments to 1-month terms — even though 6-month terms offer deeper discounts. The model version risk and use case evolution risk outweigh the incremental discount for most enterprises in the current phase of AI adoption. Extend commitment terms only as confidence in consumption stability increases.
7. Invest in AI-specific cost governance before AI spend becomes a major line item. Track cost-per-inference by model and use case, compare model economics for equivalent tasks, monitor prompt efficiency (tokens per output unit), and evaluate model routing strategies. The enterprises that build AI FinOps now will negotiate from data-driven positions at every future renewal. Those that don't will be price takers. For hands-on support designing your AI cost governance framework, speak with our Cloud & AI team.
Independent AWS AI Advisory: Zero Vendor Affiliations
Redress Compliance's Cloud & AI Practice provides independent advisory on AWS AI service pricing and commitment strategy. We maintain zero commercial relationships with AWS, any model provider, or any FinOps tooling vendor. Our advisors have been on both sides of these negotiations and understand AWS's internal pricing architecture and commercial playbooks.
- Cross-provider pricing benchmarks for identical models on AWS Bedrock, Azure OpenAI, and Google Vertex AI — producing the competitive data that transforms your negotiation position.
- Data-driven Provisioned Throughput and SageMaker Savings Plan sizing — workload segmentation, break-even analysis, model stability assessment, and layered commitment portfolio design.
- Integration of AI-specific pricing provisions into EDP renewal negotiations — discount tiers, consumption credits, Provisioned Throughput coordination, and infrastructure tier protection.
- AI-specific contractual protection negotiation — model migration clauses, price escalation caps, cross-model flexibility, SLA guarantees, and exit provisions.
- Design and implementation of AI-specific cost governance — cost-per-inference tracking, model economics comparison, prompt efficiency monitoring, and commitment portfolio management.
- Structured evaluation of AI services across AWS, Azure, and GCP — producing the competitive intelligence that informs both provider selection and pricing negotiation.
Get AWS and Cloud Licensing Intelligence Direct
Fortnightly insights on AWS commercial strategy, Bedrock pricing developments, and enterprise cloud cost governance. No spam. Unsubscribe anytime.
Facing an AWS AI Renewal or Commitment Decision?
Our Cloud & AI advisors provide independent analysis of your AWS AI spend, commitment economics, and EDP structure. We sit exclusively on the buyer's side.