Azure OpenAI Service Commitment: When PTU Pays and When It Does Not
At June 2026 list prices, a Global PTU annual reservation undercuts Pay As You Go only above roughly 85 percent sustained utilization on GPT 5. Most enterprise estates we benchmark sit below 50 percent.
Prepared by Redress Compliance · June 2026 · Representative Azure OpenAI estate scenario (benchmark scenario, not a quote)
Executive Summary
The Azure OpenAI commitment decision is two decisions disguised as one. The first is structural: Pay As You Go token consumption or a Provisioned Throughput Unit (PTU) reservation; the second is contractual: which of eight protective clauses you secure before the commitment signs. Most buyers get sold the first and never raise the second.
The structural math has moved against PTU as a savings story. Global PTU lists at $1 per PTU per hour, about $730 per month, and the annual reservation cuts that to $221 per PTU per month, a 70 percent discount. Yet $260 of Pay As You Go spend buys the same token volume as a fully utilized GPT 5 PTU month, so the annual reservation breaks even only above roughly 85 percent utilization ($221 divided by $260).
Meanwhile the flagship input token list price fell from $30 to $1.25 per million tokens between March 2023 and August 2025, a 96 percent decline. Locking reserved capacity against a falling unit price is a bet that needs contract protection: a price ceiling, model substitution rights, and throughput protection when newer models consume more PTUs per token.
This paper prices the decision on a representative estate, then documents the eight clauses we negotiate: PTU grandfather, model price ceiling, substitution rights, regional capacity protection, indemnity assignment, deprecation notice, data residency, and executive escalation. These moves landed 15 to 25 percent below the opening Microsoft proposal in the Azure OpenAI commitments we benchmarked in 2024 to 2025.
Why the Azure OpenAI Commitment Sits Inside the Microsoft Framework
Azure OpenAI is rarely bought alone. It sits inside the Microsoft Azure Consumption Commitment (MACC) and often inside an Enterprise Agreement that also carries Microsoft 365 and Copilot. That placement is deliberate: both Pay As You Go tokens and PTU reservations decrement the MACC, so every Azure OpenAI dollar accelerates your committed cloud burn.
This is why the account team pushes PTU conversion even when the workload math does not support it. A reservation pulls twelve months of AI spend into the current MACC period, helps the commitment retire on schedule, and anchors the next, larger MACC renewal.
The buyer side move is separation. Negotiate the Azure OpenAI commitment as its own conversation with its own utilization evidence, then decide how it nets against the MACC. Customers who let the two blend accept commitment sizes the standalone numbers would have rejected.
The Commercial Model Decoded
Azure OpenAI prices through two mechanics. Pay As You Go bills per million input and output tokens, with the catalog spanning the GPT 5 family, GPT 4.1, the o series reasoning models, and embedding, image, and audio models. A Batch tier prices asynchronous workloads at roughly half the interactive rate, and cached input tokens bill at a 90 percent discount.
Provisioned Throughput Units reserve model processing capacity at a fixed hourly rate per PTU, regardless of usage. PTUs are billed hourly, with one month and one year Azure reservations providing the term discount. Three deployment types carry different rates and data boundaries.
| Deployment type | Data boundary | Rate posture | GPT 5 minimum |
|---|---|---|---|
| Global Provisioned | Routed across the worldwide Microsoft fleet | Lowest list, $1 per PTU per hour | 15 PTU, increments of 5 |
| Data Zone Provisioned | Confined to the EU or US data zone | Roughly a 10 percent premium over Global | 15 PTU, increments of 5 |
| Regional Provisioned | Pinned to one named Azure region | Roughly double the Global rate | 50 PTU, increments of 50 |
Throughput per PTU is set per model. GPT 5 delivers 4,750 input tokens per minute per PTU, and one output token counts as eight input tokens against utilization, matching the 8 to 1 price ratio. Cached tokens are deducted 100 percent from PTU utilization, so aggressive prompt caching effectively expands reserved capacity for free.
| Model | Input TPM per PTU | Output token weight | Latency target |
|---|---|---|---|
| GPT 5 | 4,750 | 8 input tokens | 99% over 50 tokens per second |
| GPT 5.4 | 2,400 | 8 input tokens | 99% over 50 tokens per second |
| GPT 5.5 | 1,200 | 8 input tokens | 99% over 100 tokens per second |
| GPT 4.1 | 3,000 | 4 input tokens | 99% over 80 tokens per second |
Read that table twice: PTU billing is model independent, but PTU capacity is not. The same reservation that runs GPT 5 at 4,750 tokens per minute per PTU runs GPT 5.5 at 1,200, so a model upgrade can cut delivered throughput by three quarters at the same price. The Model Price Evolution Defense section turns that mechanic into contract language.
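The utilization mechanics above can be sketched in a few lines. This is a minimal illustration using the figures from the table, not Microsoft's metering logic; the function names are ours, and the sketch assumes cached tokens are a subset of the input tokens.

```python
# Input tokens per minute per PTU and output token weights, copied from the table above.
INPUT_TPM_PER_PTU = {"GPT 5": 4750, "GPT 5.4": 2400, "GPT 5.5": 1200, "GPT 4.1": 3000}
OUTPUT_WEIGHT = {"GPT 5": 8, "GPT 5.4": 8, "GPT 5.5": 8, "GPT 4.1": 4}


def weighted_tokens(model, input_tokens, output_tokens, cached_tokens=0):
    """Utilization weighted tokens: cached input is deducted in full and
    each output token counts as OUTPUT_WEIGHT[model] input tokens.
    Assumption: cached_tokens is a subset of input_tokens."""
    uncached_input = max(input_tokens - cached_tokens, 0)
    return uncached_input + OUTPUT_WEIGHT[model] * output_tokens


def delivered_tpm(model, ptus):
    """Weighted tokens per minute that `ptus` PTUs deliver on `model`."""
    return INPUT_TPM_PER_PTU[model] * ptus


if __name__ == "__main__":
    baseline = delivered_tpm("GPT 5", 25)
    for model in INPUT_TPM_PER_PTU:
        tpm = delivered_tpm(model, 25)
        print(f"{model}: {tpm:,} TPM on 25 PTUs ({tpm / baseline:.0%} of GPT 5)")
```

Run against a 25 PTU reservation, the loop shows the same spend delivering about a quarter of the GPT 5 throughput on GPT 5.5.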
The PTU Versus Pay As You Go Decision
Start from one anchor: at GPT 5 list of $1.25 per million input tokens and $10 per million output tokens, Pay As You Go costs $1.25 per million utilization weighted tokens regardless of mix. One fully utilized PTU processes about 208 million such tokens per month. So $260 of Pay As You Go buys a full PTU month.
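Against that, the three PTU billing modes price as follows: hourly billing at $730 per PTU per month, the monthly reservation at $260, and the annual reservation at $221.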
The monthly reservation at $260 exactly equals the Pay As You Go value of a fully utilized PTU, so it breaks even only at 100 percent utilization. The annual reservation at $221 breaks even at roughly 85 percent. Below those lines Pay As You Go is cheaper; above them the reservation wins and adds the latency floor.
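The break even lines follow directly from the anchor figures. The sketch below reproduces the arithmetic; it assumes a 730 hour month, which is what the $1 per hour and $730 per month figures in this paper imply, and the function names are illustrative.

```python
HOURS_PER_MONTH = 730            # assumption: implied by $1/hour = $730/month
GPT5_INPUT_TPM_PER_PTU = 4750    # GPT 5 throughput per PTU
PAYG_PER_M_WEIGHTED = 1.25       # USD per million utilization weighted tokens


def payg_value_of_full_ptu_month():
    """Pay As You Go cost of the tokens one fully utilized PTU processes in a month."""
    m_tokens = GPT5_INPUT_TPM_PER_PTU * 60 * HOURS_PER_MONTH / 1e6  # about 208
    return m_tokens * PAYG_PER_M_WEIGHTED                           # about $260


def break_even_utilization(ptu_monthly_price):
    """Utilization at which a PTU costs the same as Pay As You Go."""
    return ptu_monthly_price / payg_value_of_full_ptu_month()


if __name__ == "__main__":
    for mode, price in [("hourly", 730), ("monthly", 260), ("annual", 221)]:
        print(f"{mode}: break even at {break_even_utilization(price):.0%}")
```

Hourly billing returns a break even well above 100 percent, which is another way of saying it never beats Pay As You Go on cost at GPT 5 list.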
A worked estate, priced four ways
Take a representative insurance carrier running GPT 5 document workloads: 1,200 million input tokens and 150 million output tokens per month, peaking at twice the average rate. Utilization weighted demand is 2,400 million tokens. Sizing for peak requires 25 Global PTUs, which run at 46 percent average utilization (benchmark scenario, not a quote).
| Option | Unit rate | Monthly cost | Versus Pay As You Go |
|---|---|---|---|
| Pay As You Go | $1.25 in / $10.00 out per 1M tokens | $3,000 | Baseline |
| 25 PTU, hourly | $730 per PTU per month | $18,250 | 6.1x the cost |
| 25 PTU, monthly reservation | $260 per PTU per month | $6,500 | 2.2x the cost |
| 25 PTU, annual reservation | $221 per PTU per month | $5,525 | 1.8x the cost |
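The table can be reproduced with the short sketch below, which readers may adapt to their own token volumes. It assumes GPT 5 list prices, the Global minimum of 15 PTUs in increments of 5, and a 730 hour month; `price_estate` is an illustrative name, not a Microsoft tool.

```python
import math

FULL_PTU_MONTH_M_TOKENS = 4750 * 60 * 730 / 1e6   # about 208M weighted tokens
PTU_MONTHLY_PRICE = {"hourly": 730, "monthly reservation": 260, "annual reservation": 221}


def price_estate(input_m, output_m, peak_ratio, increment=5, minimum=15):
    """Price a GPT 5 estate four ways. Token volumes are in millions per month."""
    weighted_m = input_m + 8 * output_m                 # output weighs 8x input
    payg = input_m * 1.25 + output_m * 10.00            # GPT 5 list, USD
    avg_ptus = weighted_m / FULL_PTU_MONTH_M_TOKENS     # PTUs at 100% utilization
    # Size for peak, rounded up to the deployment increment.
    ptus = max(minimum, math.ceil(avg_ptus * peak_ratio / increment) * increment)
    utilization = avg_ptus / ptus
    options = {"Pay As You Go": payg}
    for mode, price in PTU_MONTHLY_PRICE.items():
        options[f"{ptus} PTU, {mode}"] = ptus * price
    return ptus, utilization, options


if __name__ == "__main__":
    ptus, utilization, options = price_estate(1200, 150, peak_ratio=2.0)
    print(f"{ptus} PTUs at {utilization:.0%} average utilization")
    for name, cost in options.items():
        print(f"{name}: ${cost:,.0f} ({cost / options['Pay As You Go']:.1f}x)")
```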
The right structure for most estates is a commitment ladder: an annual reservation sized to the measured utilization floor (the demand present in 85 percent or more of hours), a monthly reservation tranche for seasonal steps, and Pay As You Go spillover for demand above the reservation. Keep that overflow on token billed capacity: Azure automatically bills any PTUs deployed beyond the reserved count at the hourly rate.
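One way to find the utilization floor is to rank hourly demand and read off the level that at least 85 percent of hours meet or exceed. The sketch below is our illustration of that idea under stated assumptions: demand is already expressed in PTU equivalents per hour, and the annual tranche is rounded down to the Global increment of 5 with a 15 PTU minimum.

```python
import math


def utilization_floor(hourly_ptu_demand, coverage=0.85):
    """Highest demand level met or exceeded in at least `coverage` of hours."""
    if not hourly_ptu_demand:
        raise ValueError("no telemetry supplied")
    ranked = sorted(hourly_ptu_demand, reverse=True)
    hours_needed = math.ceil(coverage * len(ranked))
    return ranked[hours_needed - 1]


def annual_tranche(hourly_ptu_demand, increment=5, minimum=15):
    """Annual reservation size: the floor rounded down to the increment.
    Returns 0 when the floor does not reach the deployment minimum."""
    floor = utilization_floor(hourly_ptu_demand)
    tranche = int(floor // increment) * increment
    return tranche if tranche >= minimum else 0


if __name__ == "__main__":
    # Hypothetical one day profile, in PTU equivalents per hour.
    demand = [30] * 3 + [42] * 17 + [55] * 4
    print(utilization_floor(demand), annual_tranche(demand))
```

Rounding down rather than up is deliberate in this sketch: the annual tranche should sit at or below the floor, with the monthly tranche and Pay As You Go absorbing the rest.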
Across the Azure OpenAI commitments we supported in 2024 to 2025, sequencing utilization evidence, deployment type mix, and the clause set landed 15 to 25 percent below Microsoft's opening structure.
In the estates we audited, 30 to 45 percent of deployed PTU capacity ran idle outside business hours because reservations were sized to peak instead of floor. Benchmark ranges: Redress Compliance advisory engagement file, 2024 to 2025.
Model Price Evolution Defense
Every Azure OpenAI commitment is a bet against the price curve. The flagship input token list has fallen relentlessly: GPT 4 launched at $30 per million input tokens in March 2023, GPT 4 Turbo cut that to $10, GPT 4o to $5, the August 2024 GPT 4o update to $2.50, and GPT 5 to $1.25 in August 2025.
Three clauses convert that curve from risk into protection. First, a model price ceiling: negotiated token rates expressed as a discount off the then current public list, so every public price cut flows through automatically. A fixed dollar rate in a falling market is a concession to Microsoft.
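The difference between a fixed dollar rate and a discount off list is easiest to see against the price history cited above. In the sketch below the 10 percent discount and the signing point are hypothetical, chosen only to show the mechanism.

```python
# Flagship input list prices in USD per million tokens, as cited in this section.
LIST_HISTORY = [
    ("GPT 4", 30.00),
    ("GPT 4 Turbo", 10.00),
    ("GPT 4o", 5.00),
    ("GPT 4o, August 2024 update", 2.50),
    ("GPT 5", 1.25),
]


def ceiling_rate(list_price, discount):
    """Negotiated rate expressed as a discount off the then current list."""
    return list_price * (1 - discount)


def decline(first_price, last_price):
    """Fractional price decline between two list prices."""
    return 1 - last_price / first_price


if __name__ == "__main__":
    discount = 0.10                               # hypothetical negotiated discount
    fixed_rate = ceiling_rate(5.00, discount)     # hypothetical: fixed at the GPT 4o list
    for model, list_price in LIST_HISTORY[2:]:
        floating = ceiling_rate(list_price, discount)
        print(f"{model}: fixed ${fixed_rate:.3f} vs ceiling ${floating:.3f}")
    print(f"Decline since launch: {decline(30.00, 1.25):.0%}")
```

Under these assumptions, a rate fixed at the GPT 4o list ends up at four times the ceiling rate once GPT 5 pricing arrives.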
Second, model substitution rights with throughput protection. Because tokens per PTU vary by model, a substitution clause must hold the delivered tokens per minute, not just the PTU count. Moving a 25 PTU GPT 5 deployment to GPT 5.5 without that language cuts delivered throughput by roughly three quarters at identical cost.
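To put a number on throughput protection, the sketch below computes how many PTUs a successor model needs to hold the delivered tokens per minute of an existing deployment. It uses the per model throughput figures from this paper and assumes the Global increment of 5 PTUs; the function name is illustrative.

```python
import math

# Input tokens per minute per PTU, from the throughput table in this paper.
INPUT_TPM_PER_PTU = {"GPT 5": 4750, "GPT 5.4": 2400, "GPT 5.5": 1200, "GPT 4.1": 3000}


def ptus_to_hold_throughput(from_model, to_model, ptus, increment=5):
    """PTUs needed on `to_model` to match the TPM of `ptus` PTUs on `from_model`,
    rounded up to the deployment increment."""
    target_tpm = INPUT_TPM_PER_PTU[from_model] * ptus
    raw_ptus = target_tpm / INPUT_TPM_PER_PTU[to_model]
    return math.ceil(raw_ptus / increment) * increment


if __name__ == "__main__":
    for successor in ("GPT 5.4", "GPT 5.5"):
        needed = ptus_to_hold_throughput("GPT 5", successor, 25)
        print(f"25 PTUs of GPT 5 throughput needs {needed} PTUs on {successor}")
```

The gap between the existing PTU count and the computed figure is the capacity a substitution clause should oblige Microsoft to make good.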
Third, deprecation and retirement notice. Microsoft's published lifecycle policy keeps generally available models available for at least 12 months and commits to at least 60 days notice before retirement. For a regulated workload, negotiate 90 to 180 days in the agreement, plus funded migration assistance when retirement forces a model move inside the reservation term.
Regional Capacity Strategy
The least understood sentence in the PTU documentation is this one: quota does not guarantee capacity, and neither does a reservation. PTU quota caps what you may deploy in a region; the reservation guarantees a discounted price. Neither guarantees that GPT 5 capacity exists in your region on the day you scale.
Microsoft's own guidance is to deploy first, then buy the reservation, precisely because customers have purchased reservations against capacity that was not available. Cancellation credits on reservations are limited, and reservations for Global, Data Zone, and Regional deployment types are not interchangeable with each other.
The buyer side procedure is a regional capacity audit before any commitment signs:
| Audit step | What it establishes | Contract consequence |
|---|---|---|
| Map data residency duties | Which workloads genuinely require Data Zone or Regional, versus default Global | Pushes spend to the cheapest compliant deployment type |
| Test deploy in each target region | Whether capacity actually exists today for your models | Deployment confirmed before any reservation is purchased |
| Confirm quota headroom | That growth will not stall against the regional quota cap | Quota uplift commitment documented in the agreement |
| Scope the reservation | Subscription and resource group scope match the deployments | Discount applies; excess PTUs do not silently bill hourly |
Then negotiate capacity language: a documented capacity escalation path with named engineering contacts, priority access for committed customers during regional constraint, and the right to rescope the reservation to another region without penalty if Microsoft cannot provision within 30 days.
Microsoft Indemnity for Output
The strongest commercial argument for Azure OpenAI over a direct OpenAI contract is the Customer Copyright Commitment in the Microsoft Product Terms. Microsoft commits to defend customers against third party copyright claims over output content and to pay resulting judgments or settlements.
The protection is conditional, and the conditions are where coverage quietly dies. To be covered for Azure OpenAI output, the customer must have implemented all required mitigations documented by Microsoft, including the content filters and safety system configuration. An engineering team that relaxes content filters for throughput can void the indemnity for the whole workload.
The carve outs matter as much. The commitment does not cover claims arising from your input data, your fine tuning training sets, or material you supply through retrieval augmented generation. Your data pipeline remains your liability.
Microsoft Copilot and Azure OpenAI Integration
Estates that run both Microsoft 365 Copilot and Azure OpenAI face a deliberate blending play. Copilot lists at $30 per user per month, $360 per user per year. The account team will present Copilot seats and the Azure OpenAI commitment as one AI envelope, sized together, discounted together.
Keep them apart, because the economics are different in kind. Copilot is a per seat product whose value depends on adoption per user. Azure OpenAI is metered infrastructure whose value depends on workload engineering.
That is not an argument against Copilot, which bundles the Microsoft 365 graph integration no custom build matches cheaply. It is an argument for routing each use case to its cheapest adequate vehicle: broad knowledge worker assistance to Copilot seats under an adoption gated ramp, custom and high volume workloads to Azure OpenAI under the commitment ladder.
The Eight Contract Levers
These are the clauses we negotiate into Azure OpenAI commitments, in the order they typically trade.
| Lever | What it secures | Negotiation anchor |
|---|---|---|
| 1. PTU grandfather | Existing reservation rates and throughput tables persist through renewal | Renewal repricing only with 90 days notice and a capped uplift |
| 2. Model price ceiling | Token rates as a discount off public list, so cuts flow through | The 96 percent flagship price decline since 2023 |
| 3. Substitution rights | Move workloads to successor models at equivalent delivered throughput | Tokens per PTU vary roughly 4x across the GPT 5 family |
| 4. Regional capacity protection | Priority provisioning and penalty free reservation rescope | Quota and reservations carry no capacity guarantee |
| 5. Indemnity assignment | Customer Copyright Commitment extends to your distributed outputs | Mitigation configuration acknowledged at signature |
| 6. Deprecation notice | 90 to 180 days notice plus funded migration on forced model moves | Published floor is 60 days for GA models |
| 7. Data residency posture | Data Zone or Regional terms documented per workload class | Pay the premium only where a duty exists |
| 8. Executive escalation | Named escalation path for capacity, pricing, and indemnity events | Quarterly commercial review with utilization data |
Sequence matters as much as substance. The levers price differently at different points in Microsoft's quarter, and utilization evidence is the currency that buys them.
Build the utilization baseline
Pull token telemetry per deployment. Compute sustained utilization against the 85 percent break even line. Classify workloads as latency critical, interactive, or batch.
Design the commitment ladder
Test deploy in target regions, size the annual reservation to the measured floor, route batch work to the Batch tier, and put the eight clause set to Microsoft.
Trade structure for price
Hold the reservation purchase until deployments are confirmed. Trade commitment term and deployment mix against the price ceiling, substitution, and indemnity clauses. Close at quarter end.
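The baseline step above can be sketched as a per deployment check against the break even line. The deployment names and volumes are hypothetical, and the routing rule for latency critical workloads is our reading of the advice in this paper, not a Microsoft rule.

```python
from dataclasses import dataclass

FULL_PTU_MONTH_M_TOKENS = 4750 * 60 * 730 / 1e6   # GPT 5, about 208M weighted tokens
BREAK_EVEN = 0.85                                  # annual reservation break even


@dataclass
class Deployment:
    name: str
    workload_class: str        # "latency_critical", "interactive", or "batch"
    ptus: int                  # PTUs sized for the workload
    weighted_m_tokens: float   # utilization weighted tokens per month, in millions


def utilization(d):
    """Sustained utilization of the deployment's PTUs over the month."""
    return d.weighted_m_tokens / (d.ptus * FULL_PTU_MONTH_M_TOKENS)


def recommend(d):
    """Route a deployment to a billing vehicle (illustrative rule set)."""
    if d.workload_class == "batch":
        return "Batch tier"
    if utilization(d) >= BREAK_EVEN:
        return "annual reservation"
    if d.workload_class == "latency_critical":
        return "PTU as a performance purchase"
    return "Pay As You Go"


if __name__ == "__main__":
    estate = [
        Deployment("claims-intake", "interactive", 25, 2400),
        Deployment("nightly-summaries", "batch", 15, 900),
    ]
    for d in estate:
        print(f"{d.name}: {utilization(d):.0%} utilized, {recommend(d)}")
```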
Our recommendation: treat PTU as a performance purchase, the clause set as the savings engine, and the reservation signature as the last step, never the first.
- Before any commitment: run the utilization baseline and the regional capacity audit. If sustained utilization is below 85 percent, the annual reservation is costing you money relative to Pay As You Go, however large the quoted discount.
- In the negotiation: secure the price ceiling, substitution rights with throughput protection, and the indemnity assignment before discussing reservation size. Microsoft concedes structure cheaply early in the negotiation and expensively late.
Redress Compliance is a 100 percent buyer side advisory firm with 500+ enterprise clients and more than $2B under advisory across 11 vendor practices. If an Azure OpenAI commitment is on your desk, contact us or visit our Microsoft practice before you sign. We are glad to tie a meaningful part of the fee to delivered value.