The Core Insight

Most enterprises negotiate their AWS Private Pricing Agreement (PPA, formerly EDP) as an infrastructure deal, then layer AI workloads on top and hope the blanket discount covers them. It usually does — technically. But the blanket discount was sized for EC2, S3, and RDS spend patterns, not for Bedrock token consumption that can spike 5× in a single month when a new agent goes into production. The result is either a commitment shortfall (you committed to less than you end up spending on AI and miss better discount tiers) or a commitment overshoot (you over-committed based on AI growth projections that did not materialise). Either way, you lose. This playbook teaches you to negotiate AI spend as a distinct workload class within your broader AWS relationship.

1. Why AWS AI Spend Is Structurally Different

Traditional AWS infrastructure spend is relatively predictable: EC2 instances run on defined schedules, S3 storage grows linearly, RDS databases have stable query patterns. You can forecast next quarter’s bill within 10–15% based on this quarter’s usage. AI spend follows none of these patterns.

Bedrock token consumption is driven by user adoption rates, prompt complexity, model selection, and application design — all of which change rapidly. A single architectural decision (switching from Claude Haiku to Claude Sonnet for a customer-facing chatbot) can increase costs by 3–5× overnight. Deploying a Bedrock Agent that makes five internal model calls per user query creates a 5× cost multiplier that does not appear in the agent’s pricing documentation. Enabling retrieval-augmented generation (RAG) adds embedding costs, knowledge base storage, and retrieval costs on top of the inference costs.

SageMaker spend introduces GPU instance costs that dwarf typical EC2 bills. A single ml.p5.48xlarge instance for fine-tuning or hosting a large model costs over $100 per hour. A team that leaves a training job running over a weekend can generate a five-figure bill before anyone notices.

The volatility and magnitude of AI spend require a different negotiation approach: one that builds in flexibility, caps downside risk, and creates mechanisms for capturing the upside of rapid growth without locking you into commitments you cannot meet.

2. The AWS AI Cost Landscape

Amazon Bedrock

Bedrock is AWS’s managed foundation model service, providing access to models from Anthropic (Claude), Meta (Llama), Mistral, Cohere, Amazon (Nova, Titan), and others. Costs come in multiple layers:

On-demand inference: Pay-per-token for each API call. Rates vary dramatically by model: Claude Sonnet 4.5 costs $3/$15 per million input/output tokens, while Amazon Nova Micro costs a fraction of that. Output tokens are typically 3–5× more expensive than input tokens.
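To see how these per-token rates translate into a monthly bill, the sketch below applies the Claude Sonnet rates quoted above to hypothetical traffic figures (the request volume and token counts are illustrative assumptions, not benchmarks):

```python
# Rough monthly-cost sketch for on-demand Bedrock inference.
# Rates are the Claude Sonnet figures quoted above; traffic numbers
# are hypothetical placeholders -- substitute your own telemetry.

def monthly_inference_cost(requests_per_day, in_tokens_per_req,
                           out_tokens_per_req, in_rate_per_m,
                           out_rate_per_m, days=30):
    """Estimate a monthly on-demand inference bill in dollars."""
    in_tokens = requests_per_day * in_tokens_per_req * days
    out_tokens = requests_per_day * out_tokens_per_req * days
    return (in_tokens / 1e6) * in_rate_per_m + (out_tokens / 1e6) * out_rate_per_m

# Example: a chatbot handling 50,000 requests/day with ~1,500 input
# and ~500 output tokens per request at $3/$15 per million tokens.
cost = monthly_inference_cost(50_000, 1_500, 500, 3.0, 15.0)
print(f"${cost:,.0f}/month")  # $18,000/month ($6,750 input + $11,250 output)
```

Note how output tokens dominate the bill even at a third of the volume, which is why output-heavy workloads (long-form generation) deserve the closest scrutiny.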

Provisioned Throughput: Reserved capacity charged per model unit per hour, starting at approximately $15,000/month minimum for enterprise-grade models. Commitments of 1 or 6 months reduce the hourly rate by up to 50%.

Batch inference: Asynchronous processing at approximately 50% of on-demand rates, available for select models including Anthropic Claude, Meta Llama, and Amazon Nova.

Bedrock Agents, Knowledge Bases, and Guardrails: Additional cost layers for agent orchestration, RAG infrastructure, and content filtering that sit on top of inference costs.

Amazon SageMaker

SageMaker is AWS’s platform for building, training, and deploying machine learning models. For AI specifically, the relevant cost components are: GPU instance costs for model training and fine-tuning (ml.p4d, ml.p5 instances), real-time inference endpoint hosting, and SageMaker JumpStart for deploying open-source foundation models. SageMaker costs are primarily instance-based rather than token-based, making them more predictable but also more expensive for sustained workloads.

Supporting Services

Every AI workload generates costs in supporting AWS services: S3 for data storage and model artifacts, CloudWatch for monitoring and logging, OpenSearch or Kendra for RAG knowledge bases, Lambda and Step Functions for orchestration, and data transfer between services and regions. These supporting costs typically add 15–30% on top of the core inference or training costs and are frequently overlooked in budgeting.

3. Negotiating Amazon Bedrock

Understand the Model Cost Hierarchy

Not all Bedrock models are priced equally, and the cost differential is enormous. For text generation alone, the spread from Amazon Nova Micro (fractions of a cent per thousand tokens) to Claude Sonnet 4.5 ($3/$15 per million tokens) represents a cost variation of over 100×. Your first negotiation lever is not with AWS — it is with your own engineering teams. Mandate model selection policies that default to the cheapest model capable of each task and require explicit justification for premium models.

Use Intelligent Prompt Routing

Bedrock’s Intelligent Prompt Routing feature automatically routes requests between models in the same family based on complexity (e.g., Claude Haiku for simple queries, Claude Sonnet for complex ones). AWS reports this can reduce costs by up to 30% without sacrificing quality. Enable this for every production workload and factor the expected savings into your Bedrock cost projections before committing to spend levels.
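Intelligent Prompt Routing is a managed feature, but its economics can be sketched with a simple heuristic router. Note that the token-length heuristic and model names below are illustrative assumptions, not the feature's actual routing logic:

```python
# Illustrative sketch of complexity-based routing economics -- NOT the
# actual Intelligent Prompt Routing algorithm, which AWS manages.
# The word-count heuristic and model labels are assumptions.

CHEAP_MODEL, PREMIUM_MODEL = "claude-haiku", "claude-sonnet"

def route(prompt: str, complexity_threshold: int = 200) -> str:
    """Send short/simple prompts to the cheap model, the rest to premium."""
    approx_tokens = len(prompt.split())
    return CHEAP_MODEL if approx_tokens < complexity_threshold else PREMIUM_MODEL

def blended_rate(cheap_share: float, cheap_rate: float,
                 premium_rate: float) -> float:
    """Effective per-million-token rate once a share of traffic runs cheap."""
    return cheap_share * cheap_rate + (1 - cheap_share) * premium_rate

# If 60% of traffic can run at a Haiku-class $1/M output rate instead of
# a Sonnet-class $15/M, the blended output rate falls from 15.0 to 6.6:
print(blended_rate(0.6, 1.0, 15.0))  # 6.6
```

The point of the sketch is the second function: before committing to a Bedrock spend level, compute your blended rate under realistic routing, not the premium-only rate.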

Negotiate Provisioned Throughput Terms

For production workloads with predictable traffic, Provisioned Throughput can be more cost-effective than on-demand pricing. However, the commitment is significant (minimum $15,000/month for enterprise models). Negotiate the following: a no-commitment trial period (2–4 weeks) before locking into a monthly or 6-month commitment; the ability to scale model units up or down within the commitment term; and committed pricing that reflects your total Bedrock relationship, not just the provisioned component.

Maximise Batch Processing

Batch inference at approximately 50% of on-demand rates is the single most impactful cost lever for non-real-time workloads. Review every Bedrock workload and classify it as latency-sensitive (must be on-demand) or latency-tolerant (can be batched). Common batch-eligible workloads include document summarisation, data extraction, content classification, and embedding generation. Enterprises that rigorously classify and route workloads typically shift 30–50% of total token volume to batch processing, delivering 15–25% overall Bedrock cost reduction with no quality impact.
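The arithmetic behind that claim is simple enough to sketch (the bill size and batch share below are hypothetical; the ~50% batch discount is the figure quoted above):

```python
# Sketch: savings from shifting latency-tolerant volume to batch
# processing at ~50% of on-demand rates. Inputs are hypothetical.

def batch_savings(total_monthly_cost: float, batch_share: float,
                  batch_discount: float = 0.5) -> float:
    """Dollar savings when batch_share of spend moves to batch pricing."""
    return total_monthly_cost * batch_share * batch_discount

# Shifting 40% of a $100k/month Bedrock bill to batch saves $20k,
# i.e. a 20% overall reduction -- consistent with the 15-25% range above.
print(batch_savings(100_000, 0.40))  # 20000.0
```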

Negotiate Cross-Model Volume Discounts

If your total monthly Bedrock consumption exceeds $10,000, request a volume discount that applies across all models and pricing tiers. AWS does not offer this proactively, but enterprise customers with significant Bedrock spend can negotiate 5–15% additional discounts on top of the blanket PPA/EDP discount. Frame this as a Bedrock-specific commitment: “We commit to $X/month in Bedrock consumption in exchange for an additional Y% discount on all Bedrock inference.”

4. Negotiating SageMaker AI

Control GPU Instance Costs

SageMaker GPU instances are the highest per-hour cost items in most AWS AI budgets. A single ml.p5.48xlarge costs over $100/hour; a 10-instance training cluster running for 48 hours generates a $48,000+ bill. Negotiation levers include: SageMaker Savings Plans (1-year or 3-year commitments at up to 64% discount on compute), spot instances for fault-tolerant training jobs (60–90% discount with interruption risk), and Reserved Instances for long-running inference endpoints.
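A quick sketch makes the scale of these levers concrete. The $100/hour figure is the ml.p5.48xlarge order of magnitude quoted above; the discounts are published upper bounds, not guaranteed rates:

```python
# Sketch of GPU training-cluster economics under different pricing levers.
# Rates and discounts are illustrative upper bounds, not quotes.

def training_run_cost(instances: int, hours: float, rate_per_hour: float,
                      discount: float = 0.0) -> float:
    """Total cost of a training run, optionally discounted."""
    return instances * hours * rate_per_hour * (1 - discount)

on_demand = training_run_cost(10, 48, 100.0)            # $48,000
savings_plan = training_run_cost(10, 48, 100.0, 0.64)   # up to 64% off: $17,280
spot = training_run_cost(10, 48, 100.0, 0.75)           # illustrative 75% spot discount
print(on_demand, savings_plan, spot)  # 48000.0 17280.0 12000.0
```

Spot looks cheapest, but only suits checkpointed, fault-tolerant jobs; the Savings Plan figure is the safer planning number for sustained training.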

Negotiate SageMaker Savings Plans

SageMaker Savings Plans commit to a consistent amount of compute usage (measured in $/hour) in exchange for discounted rates. These plans cover SageMaker ML instances across Studio notebooks, training, inference, and transform. The commitment is to an hourly spend rate, not to specific instance types, giving you flexibility to change instance families as model requirements evolve. For enterprises with steady-state SageMaker workloads, a 1-year Savings Plan at the right commitment level can reduce SageMaker compute costs by 20–30%.

Separate Training From Inference in Your Budget

Training is bursty and unpredictable; inference is (relatively) steady-state. Negotiate them as separate cost classes: use spot instances and short-term commitments for training, and Savings Plans or Reserved Instances for inference endpoints. This prevents the volatility of training costs from distorting your committed-spend forecasts.

Consider Bedrock vs. SageMaker Trade-Offs

For models available on both Bedrock (managed inference) and SageMaker (self-hosted inference), the cost trade-off depends on scale. Bedrock is cheaper for low-to-moderate volumes because you pay only per token with no idle capacity. SageMaker self-hosting becomes cheaper at high volumes because a dedicated endpoint has a fixed hourly cost regardless of throughput. During negotiations, model both scenarios for your projected workloads and choose the channel that minimises total cost at your expected volume.
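The crossover point is easy to compute once you have two numbers: your blended Bedrock per-token rate and the hourly cost of a comparable SageMaker endpoint. Both figures below are hypothetical placeholders:

```python
# Sketch: find the monthly token volume at which a fixed-cost SageMaker
# endpoint undercuts per-token Bedrock pricing. All figures hypothetical.

def sagemaker_monthly_cost(endpoint_rate_per_hour: float,
                           hours: float = 730) -> float:
    """Fixed monthly endpoint cost, regardless of throughput."""
    return endpoint_rate_per_hour * hours

def crossover_tokens_m(bedrock_rate_per_m: float,
                       endpoint_rate_per_hour: float) -> float:
    """Monthly token volume (millions) above which self-hosting is cheaper."""
    return sagemaker_monthly_cost(endpoint_rate_per_hour) / bedrock_rate_per_m

# A $10/hour endpoint (~$7,300/month) vs a blended $5/M-token Bedrock rate:
print(crossover_tokens_m(5.0, 10.0))  # 1460.0 -- ~1.46B tokens/month
```

Below the crossover, Bedrock's pay-per-token model wins; above it, the idle capacity you pay for on SageMaker is more than offset by the per-token savings.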

5. Integrating AI Into Your PPA/EDP

How the PPA/EDP Works

The AWS Private Pricing Agreement (PPA, formerly Enterprise Discount Program/EDP) is a committed-spend agreement where you pledge a minimum annual spend in exchange for a percentage discount across most AWS services. Typical baseline discounts start at 6–9% for a $1 million annual commitment and scale up with higher commitments and longer terms. Three-year agreements commonly secure approximately 15% discounts, compared to roughly 10% for one-year deals at the same spend level.

Key structural points: your commitment is measured on net spend (after Savings Plan and Reserved Instance discounts), not gross on-demand spend. AWS Marketplace purchases can offset up to 25% of your commitment (negotiable to 30–35%). You cannot reduce your annual commitment below the previous year’s level. And participation requires AWS Enterprise Support (3–10% of monthly usage), which is itself a significant cost that must be factored into the total economics.

The AI Integration Challenge

The fundamental challenge is that AI spend is volatile and hard to forecast, while PPA/EDP commitments are fixed and inflexible. If you include aggressive AI growth projections in your PPA commitment and adoption ramps slower than expected, you face a shortfall payment at year-end. If you exclude AI from your commitment to play it safe, you miss the opportunity to aggregate AI spend with infrastructure spend for a higher discount tier.

The Recommended Structure

Structure your PPA with a base commitment built on your proven infrastructure spend (EC2, S3, RDS, networking — the spend you can forecast with confidence) plus a separate AI-specific addendum that covers Bedrock, SageMaker, and related services. The base commitment qualifies for the standard PPA discount. The AI addendum can be structured as: (1) a lower, conservative commitment for the first year with scheduled step-ups tied to demonstrated adoption, (2) a “best efforts” growth target for AI spend that, if met, triggers an additional discount increment, or (3) a combined commitment that blends infrastructure and AI spend but with a contractual true-down provision allowing you to reduce the AI component by up to 20% if adoption underperforms.
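The value of splitting the commitment shows up when AI adoption underperforms. The scenario below is a sketch with hypothetical figures, comparing shortfall exposure under a single blended commitment versus a base commitment plus an AI addendum with a 20% true-down:

```python
# Sketch: shortfall exposure under a blended commitment vs a split
# base + AI addendum with a 20% true-down. All figures hypothetical.

def shortfall(committed: float, actual: float) -> float:
    """Unmet commitment at period end (zero if commitment was met)."""
    return max(0.0, committed - actual)

infra_commit, ai_commit = 5_000_000, 1_000_000
infra_actual, ai_actual = 5_100_000, 600_000  # AI adoption underperforms

# Blended single commitment: shortfall judged on the combined total.
blended = shortfall(infra_commit + ai_commit, infra_actual + ai_actual)

# Split structure: AI component trued down 20% before measurement;
# the healthy infrastructure spend is assessed separately.
split = (shortfall(infra_commit, infra_actual)
         + shortfall(ai_commit * 0.8, ai_actual))

print(blended, split)  # 300000.0 200000.0
```

In this scenario the split structure cuts the shortfall exposure by a third, and it also stops the AI miss from contaminating the discount earned on the reliable infrastructure base.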

Negotiate the Shortfall Mechanics

Standard PPA/EDP terms charge you for any unmet commitment at the end of the annual period. For AI-inclusive commitments, negotiate softer shortfall provisions: a 90-day grace period to make up the shortfall through accelerated adoption, the ability to carry forward a shortfall of up to 15% into the next year’s commitment without penalty, or quarterly true-ups instead of annual ones so you can adjust course before a shortfall becomes material.

Leverage AWS Marketplace for AI SaaS

AWS Marketplace purchases count towards your PPA commitment (up to 25%, negotiable higher). If your organisation uses AI SaaS tools available on the Marketplace — such as Anthropic’s direct offerings, Datadog for AI monitoring, Snowflake for data infrastructure, or specialised AI tools — route these purchases through the Marketplace to accelerate commitment drawdown. This is particularly useful in the first year of an AI ramp when direct Bedrock/SageMaker consumption may be lower than projected.

6. Architecture Decisions That Are Really Pricing Decisions

Many of the most impactful cost decisions in AWS AI are made by engineering teams, not procurement. Ensure your negotiation strategy includes governance over these architectural choices:

Model selection policy: Require that every production workload uses the cheapest model that meets quality requirements. Create an approved model catalogue with cost ratings and mandate justification for premium model selection. The difference between defaulting to Claude Sonnet vs. Amazon Nova Lite for a classification task can be 20–50× in cost.

Agent call depth: Bedrock Agents make multiple internal API calls per user query, and you pay for every call. An agent that chains five model calls per request creates a 5× cost multiplier. Set maximum call-depth limits in your Bedrock Agent configurations and monitor actual call patterns in production.
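The multiplier is easy to miss in a cost model that counts user interactions rather than model invocations. A minimal sketch, with hypothetical per-call token figures:

```python
# Sketch: the cost multiplier hidden in agent call chains. Per-call
# token figures are hypothetical; instrument your agents for real ones.

def agent_query_cost(internal_calls: int, tokens_per_call_m: float,
                     rate_per_m: float) -> float:
    """Cost of one user query when the agent chains internal model calls."""
    return internal_calls * tokens_per_call_m * rate_per_m

naive = agent_query_cost(1, 0.002, 15.0)   # modelled as one call per query
actual = agent_query_cost(5, 0.002, 15.0)  # agent actually chains five calls
print(actual / naive)  # 5.0 -- the multiplier absent from naive cost models
```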

Prompt engineering as cost control: Verbose system prompts, excessive few-shot examples, and unoptimised context windows directly inflate input token costs. A 10,000-token system prompt on every request costs 10× what a 1,000-token prompt costs. Invest in prompt optimisation before scaling any production workload.
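Annualised, the prompt-size effect is large. The request volume and rate below are hypothetical placeholders; the 10,000- vs 1,000-token prompt sizes are the figures from the paragraph above:

```python
# Sketch: annualised cost of a bloated system prompt sent on every
# request. Request volume and rate are hypothetical placeholders.

def system_prompt_cost_per_year(prompt_tokens: int, requests_per_day: int,
                                in_rate_per_m: float) -> float:
    """Yearly input-token cost attributable to the system prompt alone."""
    return prompt_tokens * requests_per_day * 365 / 1e6 * in_rate_per_m

bloated = system_prompt_cost_per_year(10_000, 100_000, 3.0)  # $1,095,000
lean = system_prompt_cost_per_year(1_000, 100_000, 3.0)      # $109,500
print(bloated - lean)  # 985500.0 -- the annual cost of prompt bloat
```

At this scale, a week of prompt-optimisation engineering pays for itself many times over.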

RAG retrieval tuning: Knowledge Base configurations that retrieve too many chunks per query inflate both the retrieval cost and the input token cost (because all retrieved chunks are sent to the model). Tune your retrieval parameters to return the minimum number of chunks that maintain answer quality.

Region selection: Bedrock pricing varies by AWS region. Cross-region data transfer between your data sources and your Bedrock inference region adds network transfer costs. Co-locate your Bedrock workloads with your data sources whenever possible, and verify that the region you select offers competitive pricing for the models you need.

7. Hidden Costs That Inflate Your AI Bill

Enterprise Support Fees

PPA/EDP participation requires AWS Enterprise Support, which costs 3–10% of your monthly spend (on a sliding scale). As your AI spend grows, so does your support fee — automatically. A $200,000/year Bedrock bill can generate $6,000–$20,000 in additional support costs that are rarely included in AI business cases. Negotiate support fee caps or consider whether a lower-tier support plan with third-party support augmentation might be sufficient.
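The sliding scale can be modelled directly. The tier bands below reflect the commonly published Enterprise Support schedule (10% of the first $150k of monthly usage, 7% of the next $350k, 5% of the next $500k, 3% beyond $1M, with a $15k monthly minimum); verify the current schedule in your own agreement before relying on it:

```python
# Sketch of the Enterprise Support sliding scale as commonly published.
# Tier bands and the $15k minimum are assumptions to verify against
# your agreement; they are not quoted from this article's source.

TIERS = [(150_000, 0.10), (350_000, 0.07), (500_000, 0.05),
         (float("inf"), 0.03)]

def enterprise_support_fee(monthly_usage: float) -> float:
    """Monthly support fee under the tiered schedule, with a $15k floor."""
    fee, remaining = 0.0, monthly_usage
    for band, rate in TIERS:
        portion = min(remaining, band)
        fee += portion * rate
        remaining -= portion
        if remaining <= 0:
            break
    return max(fee, 15_000.0)

# An extra $200k/month of AI spend on top of a $1M/month base adds
# $6k/month in support fees at the 3% marginal rate:
print(enterprise_support_fee(1_200_000) - enterprise_support_fee(1_000_000))
```

Because large AI spend lands in the marginal tier, model the marginal rate, not the blended one, when sizing the support-fee impact of an AI ramp.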

Data Transfer

AI workloads frequently involve large data movements: training data to SageMaker, documents to Bedrock Knowledge Bases, embeddings to and from vector stores. Cross-region and internet-egress data transfer charges accumulate quickly. Budget for data transfer at 5–10% of your core AI compute costs as a baseline estimate.

Logging and Monitoring

CloudWatch logs for Bedrock API calls, SageMaker training metrics, and custom dashboards generate their own costs. At high volumes, CloudWatch Logs Insights queries against Bedrock invocation logs can cost more than expected. Set log retention policies and sampling rates from the outset.

Knowledge Base Infrastructure

Bedrock Knowledge Bases use OpenSearch Serverless or other vector stores, which have their own pricing (OpenSearch Compute Units). A knowledge base that seems “free” with Bedrock actually generates a separate OpenSearch bill that can reach $1,000–$5,000/month for enterprise-scale collections.

Idle Provisioned Throughput

Provisioned Throughput model units are charged 24/7 whether you use them or not. Over-provisioning or failing to decommission units after a project ends creates pure waste. Implement automated monitoring for provisioned throughput utilisation and decommission any units running below 30% utilisation.
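The waste is straightforward to quantify from your utilisation monitoring. The $15,000/month unit cost below is the minimum quoted earlier; unit counts and utilisation are hypothetical:

```python
# Sketch: waste from under-utilised Provisioned Throughput units.
# Unit cost is the $15k/month minimum quoted earlier; utilisation
# comes from your own monitoring.

def pt_waste(monthly_unit_cost: float, units: int,
             utilisation: float) -> float:
    """Dollars per month paid for provisioned capacity that goes unused."""
    return monthly_unit_cost * units * (1 - utilisation)

# Two units at $15k/month running at 30% utilisation waste $21k/month:
print(pt_waste(15_000, 2, 0.30))  # 21000.0
```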

8. Critical Contract Terms

Price protection: Lock your Bedrock per-token rates and SageMaker instance pricing for the PPA term. Without this, AWS can change published pricing mid-contract. Frame this as essential for budget predictability.

Model deprecation notice: Require at least 6–12 months’ written notice before AWS deprecates or removes access to any Bedrock model you depend on, with a migration path to an equivalent model at equal or better pricing.

Service-specific SLA: Ensure Bedrock and SageMaker are explicitly covered by uptime SLAs with financial remedies (service credits). Verify that the SLA covers inference availability, not just the control plane.

Data governance: Confirm that your data is not used for model training by any Bedrock model provider. This is the default for most enterprise-grade models on Bedrock, but it should be explicitly stated in your agreement. For regulated industries, negotiate a data processing addendum with specific retention and deletion commitments.

Commitment flexibility: Include a true-down provision that allows you to reduce your AI-specific commitment by 10–20% if adoption underperforms, without affecting the discount on your base infrastructure commitment. This is non-standard but increasingly achievable for large enterprise customers.

Marketplace inclusion threshold: If you use AI-related SaaS via AWS Marketplace, negotiate to increase the Marketplace offset from the standard 25% to 30–35% of your committed spend.

9. Using Competitive Leverage

AWS AI negotiation is more responsive to competitive pressure than almost any other AWS service because the AI market is genuinely multi-vendor. Unlike EC2 (where migration is painful and alternatives are limited), Bedrock workloads can often be ported to Azure OpenAI Service, Google Vertex AI, or Anthropic’s direct API with moderate engineering effort. Use this explicitly.

Multi-model providers on Bedrock are your leverage: The models available on Bedrock (Claude, Llama, Mistral, Nova) are also available through other channels, often at the same or lower pricing. Anthropic’s Claude is available directly, via Google Vertex AI, and via Microsoft Foundry. Meta’s Llama can be self-hosted on any cloud. AWS knows this, which is why Bedrock pricing is typically more negotiable than EC2 or S3 pricing.

Run a parallel evaluation: Before your PPA renewal or initial negotiation, deploy a proof-of-concept on at least one competing platform (Azure OpenAI or Google Vertex AI). Document the results, including pricing, performance, and migration feasibility. Share this evaluation with your AWS account team. Enterprises with demonstrated multi-cloud AI capability consistently achieve 10–20% better Bedrock pricing than those negotiating with AWS alone.

Leverage direct API pricing: For Anthropic Claude specifically, compare Bedrock pricing to Anthropic’s direct API pricing. If the direct API is cheaper for your volume (which it can be, especially for batch workloads), use this as a negotiation point with AWS. AWS adds a margin on third-party model pricing on Bedrock; demonstrating that you can bypass this margin forces AWS to compete on value-added services (security, governance, consolidated billing) rather than raw model pricing.

10. The Eight Most Expensive Mistakes

1. Forecasting AI Spend Using Infrastructure Assumptions

AI consumption is 3–5× more volatile than traditional cloud infrastructure. Applying your standard 10–15% growth assumptions to AI workloads leads to either massive shortfalls or stranded commitments. Build AI forecasts separately using adoption curve modelling, not linear growth projections.

2. Defaulting to Premium Models

When engineering teams choose Claude Sonnet for every task because “it’s the best,” you pay 20–100× more than needed for tasks that Nova Lite or Haiku could handle equally well. Implement model selection governance before scaling any AI deployment.

3. Ignoring Agent Call Multiplication

A single Bedrock Agent query can trigger 3–10 internal model calls. If your cost model assumes one API call per user interaction, your actual bill will be 3–10× your projection. Instrument and monitor agent call depth from day one.

4. Not Using Batch Processing

Every token processed in real-time that could have been batched costs 2× what it should. Classify all workloads by latency requirement and route everything that can tolerate asynchronous processing through Bedrock Batch at 50% discount.

5. Including Aggressive AI Growth in Your PPA Commitment

Accepting AWS’s 20%+ annual growth projections for AI and baking them into your committed spend creates a shortfall trap. Commit conservatively and negotiate options to step up if growth materialises.

6. Forgetting Enterprise Support Scales With AI Spend

Your Enterprise Support fee (3–10% of spend) grows automatically as AI consumption grows. A $500,000 increase in Bedrock spend creates up to $50,000 in additional support fees. Negotiate support fee caps or carve-outs for AI-specific spend.

7. Over-Provisioning Throughput

Provisioned Throughput model units charge 24/7. Provisioning for peak capacity and running at 30% average utilisation means 70% of your provisioned spend is waste. Right-size units, use auto-scaling where available, and decommission units promptly when projects end.

8. Negotiating Without Multi-Vendor Leverage

Bedrock models are available on multiple platforms. Negotiating as if AWS is your only option leaves 10–20% on the table. Maintain active alternatives on at least one competing platform.

11. FAQ

Does my PPA/EDP discount apply to Bedrock?

Yes. Bedrock on-demand inference and Provisioned Throughput charges are generally eligible for your PPA/EDP blanket discount. However, the discount applies on top of the already-published per-token or per-hour rate. Verify with your AWS account team which specific Bedrock services and model providers are included in your PPA scope, as some newer models or features may be excluded.

What discount can I expect on Bedrock specifically?

The PPA blanket discount (typically 6–15% depending on total commitment) applies to Bedrock. On top of this, enterprises with $10,000+/month Bedrock consumption can negotiate an additional 5–15% Bedrock-specific discount. Combined, total discounts of 15–25% off published Bedrock rates are achievable for large-volume customers.

Should I use Bedrock or SageMaker for model inference?

Bedrock for low-to-moderate volumes (pay-per-token with no idle cost) and models you do not need to customise. SageMaker for high-volume inference (fixed endpoint cost regardless of throughput), custom models, or fine-tuned models. Model both scenarios at your projected volume to determine the crossover point. Many enterprises use both: Bedrock for multi-model access and SageMaker for their most heavily used model at scale.

How do I forecast AI spend for PPA negotiations?

Start with a pilot phase (3–6 months) to establish baseline consumption per user and per use case. Multiply by your adoption curve (not 100% from day one). Add 15–30% for supporting services (storage, monitoring, data transfer, knowledge base infrastructure). Use conservative growth assumptions (10–15% quarterly, not 20%+ annual that AWS may project). Build the PPA commitment on the lower bound of your forecast with contractual options to step up.
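The recipe above can be sketched as a small forecasting function. The per-user baseline, adoption numbers, and 20% supporting-services uplift are hypothetical inputs chosen from within the ranges stated above:

```python
# Sketch of the forecasting recipe: pilot-derived per-user baseline,
# an adoption curve, and a supporting-services uplift. Inputs are
# hypothetical; derive yours from a 3-6 month pilot.

def ai_spend_forecast(cost_per_user_month: float,
                      users_by_quarter: list[int],
                      supporting_uplift: float = 0.20) -> list[float]:
    """Quarterly spend forecast from a pilot-derived per-user baseline."""
    return [cost_per_user_month * 3 * users * (1 + supporting_uplift)
            for users in users_by_quarter]

# Pilot shows $40/user/month; adoption ramps 500 -> 2,000 users in a year.
forecast = ai_spend_forecast(40.0, [500, 900, 1_400, 2_000])
print([round(q) for q in forecast], round(sum(forecast)))
# [72000, 129600, 201600, 288000] 691200

# Commit on the lower bound -- e.g. ~80% of this central estimate --
# with contractual step-up options if adoption outpaces the curve.
```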

Can I use AWS credits for Bedrock consumption?

Yes. AWS credits (including Migration Acceleration Program credits and negotiated PPA credits) can typically be applied to Bedrock consumption. Negotiate credits that specifically cover AI services and confirm they have a 12-month validity period. Short-validity credits (90 days) for a service you are still ramping are worth less than they appear.

What happens if I exceed my PPA commitment?

Exceeding your commitment is not penalised — you simply pay on-demand rates (less your PPA discount) for the excess. However, exceeding your commitment means you could have negotiated a higher commitment level and captured a higher discount tier from day one. If you consistently exceed your commitment, renegotiate mid-term or at renewal to capture the higher discount.

Where can I get independent help with AWS AI negotiations?

Redress Compliance provides independent GenAI advisory services on AWS AI contract negotiations, covering Bedrock pricing, SageMaker economics, PPA/EDP structuring, and multi-vendor competitive strategy. We help enterprises avoid the specific traps described in this guide and negotiate contracts that reflect the unique economics of AI workloads, not infrastructure assumptions. Learn more about our GenAI Negotiation Services →