Microsoft Azure | Azure OpenAI Commitment White Paper

Azure OpenAI Service Commitment: When PTU Pays and When It Does Not

At June 2026 list, a Global PTU annual reservation undercuts Pay As You Go only above roughly 85 percent sustained utilization on GPT 5. Most enterprise estates we benchmark sit below 50 percent.

Prepared by Redress Compliance · June 2026 · Representative Azure OpenAI estate scenario (benchmark scenario, not a quote)

Executive Summary

The Azure OpenAI commitment decision is two decisions disguised as one. The first is structural: Pay As You Go token consumption or a Provisioned Throughput Unit (PTU) reservation; the second is contractual: which of eight protective clauses you secure before the commitment signs. Most buyers get sold the first and never raise the second.

The structural math has moved against PTU as a savings story. Global PTU lists at $1 per PTU per hour, with the annual reservation cutting that to $221 per PTU per month, a 70 percent discount. Yet $260 of Pay As You Go buys a full GPT 5 PTU month, so the annual reservation breaks even above 85 percent utilization.

Meanwhile the flagship input token list price fell from $30 to $1.25 per million tokens between March 2023 and August 2025, a 96 percent decline. Locking reserved capacity against a falling unit price is a bet that needs contract protection: a price ceiling, model substitution rights, and throughput protection when newer models consume more PTUs per token.

This paper prices the decision on a representative estate, then documents the eight clauses we negotiate: PTU grandfather, model price ceiling, substitution rights, regional capacity protection, indemnity assignment, deprecation notice, data residency, and executive escalation. These moves landed 15 to 25 percent below the opening Microsoft proposal in the Azure OpenAI commitments we benchmarked in 2024 to 2025.

$1/PTU/hr

Global Provisioned hourly list rate, about $730 per PTU per month before any reservation

70%

Discount on the annual PTU reservation versus the hourly rate ($221 versus $730 per month)

85%

Sustained utilization where the annual GPT 5 reservation finally breaks even against Pay As You Go

60 days

Minimum retirement notice Microsoft commits to for generally available Azure OpenAI models

Why the Azure OpenAI Commitment Sits Inside the Microsoft Framework

Azure OpenAI is rarely bought alone. It sits inside the Azure consumption commitment (MACC) and often inside an Enterprise Agreement that also carries Microsoft 365 and Copilot. That placement is deliberate: both Pay As You Go tokens and PTU reservations decrement the MACC, so every Azure OpenAI dollar accelerates your committed cloud burn.

This is why the account team pushes PTU conversion even when the workload math does not support it. A reservation pulls twelve months of AI spend into the current MACC period, helps the commitment retire on schedule, and anchors the next, larger MACC renewal.

The buyer side move is separation. Negotiate the Azure OpenAI commitment as its own conversation with its own utilization evidence, then decide how it nets against the MACC. Customers who let the two blend accept commitment sizes the standalone numbers would have rejected.

The Commercial Model Decoded

Azure OpenAI prices through two mechanics. Pay As You Go bills per million input and output tokens, with the catalog spanning the GPT 5 family, GPT 4.1, the o series reasoning models, and embedding, image, and audio models. A Batch tier prices asynchronous workloads at roughly half the interactive rate, and cached input tokens bill at a 90 percent discount.

Provisioned Throughput Units reserve model processing capacity at a fixed hourly rate per PTU, regardless of usage. PTUs are billed hourly, with one month and one year Azure reservations providing the term discount. Three deployment types carry different rates and data boundaries.

Deployment type	Data boundary	Rate posture	GPT 5 minimum
Global Provisioned	Routed across the worldwide Microsoft fleet	Lowest list, $1 per PTU per hour	15 PTU, increments of 5
Data Zone Provisioned	Confined to the EU or US data zone	Roughly a 10 percent premium over Global	15 PTU, increments of 5
Regional Provisioned	Pinned to one named Azure region	Roughly double the Global rate	50 PTU, increments of 50
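
To put the minimums in dollar terms, the sketch below prices the smallest GPT 5 deployment of each type at the hourly rate. It assumes the approximate premiums in the table above (10 percent for Data Zone, double for Regional) and a 730 hour month. The dictionary layout and function name are illustrative only.

```python
HOURS_PER_MONTH = 730
GLOBAL_HOURLY = 1.00  # USD per PTU per hour, Global list rate from the table

# (rate multiple versus Global, GPT 5 minimum PTUs).
# The multiples are the table's rough figures, not exact list prices.
DEPLOYMENT_TYPES = {
    "Global": (1.0, 15),
    "Data Zone": (1.1, 15),
    "Regional": (2.0, 50),
}

def minimum_monthly_cost(deployment_type):
    """Hourly-billed monthly cost of the smallest allowed deployment, in USD."""
    multiple, min_ptus = DEPLOYMENT_TYPES[deployment_type]
    return round(min_ptus * GLOBAL_HOURLY * multiple * HOURS_PER_MONTH)

for name in DEPLOYMENT_TYPES:
    print(f"{name}: ${minimum_monthly_cost(name):,} per month before any reservation")
```

On these assumptions the Regional entry point is several times the Global one, because the higher rate compounds with the 50 PTU minimum.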

Throughput per PTU is set per model. GPT 5 delivers 4,750 input tokens per minute per PTU, and one output token counts as eight input tokens against utilization, matching the 8 to 1 price ratio. Cached tokens are deducted 100 percent from PTU utilization, so aggressive prompt caching effectively expands reserved capacity for free.
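
A minimal sketch of the utilization weighting described above, assuming the GPT 5 figures in this paper: an 8 to 1 output weight and cached input tokens deducted in full. The function name and the sample traffic are hypothetical, and this is not Microsoft's metering implementation.

```python
GPT5_TPM_PER_PTU = 4_750  # input tokens per minute per PTU, from the text

def weighted_tokens(input_tokens, output_tokens, cached_input_tokens=0, output_weight=8):
    """Utilization-weighted tokens counted against PTU capacity.

    Assumes cached input tokens are deducted in full and each output
    token counts as `output_weight` input tokens.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached input cannot exceed total input")
    return (input_tokens - cached_input_tokens) + output_weight * output_tokens

# Hypothetical minute of traffic: 3,000 input tokens (1,000 cached), 200 output tokens.
demand = weighted_tokens(3_000, 200, cached_input_tokens=1_000)
print(demand, round(demand / GPT5_TPM_PER_PTU, 2))  # weighted tokens, share of one PTU
```

Without the cache hit the same minute would weigh 4,600 tokens, which is why caching behaves like extra reserved capacity.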

Model	Input TPM per PTU	Output token weight	Latency target
GPT 5	4,750	8 input tokens	99% over 50 tokens per second
GPT 5.4	2,400	8 input tokens	99% over 50 tokens per second
GPT 5.5	1,200	8 input tokens	99% over 100 tokens per second
GPT 4.1	3,000	4 input tokens	99% over 80 tokens per second

Read that table twice: PTU billing is model independent, but PTU capacity is not. The same reservation that runs GPT 5 at 4,750 tokens per minute per PTU runs GPT 5.5 at 1,200, so a model upgrade can cut delivered throughput by three quarters at the same price. The Model Price Evolution Defense section turns that mechanic into contract language.
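
The throughput effect of a model change can be sketched from the table above. The figures are the per model values quoted in this paper. The function names are illustrative, and the PTU count returned ignores deployment minimums and increments.

```python
INPUT_TPM_PER_PTU = {
    "GPT 5": 4_750,
    "GPT 5.4": 2_400,
    "GPT 5.5": 1_200,
    "GPT 4.1": 3_000,
}

def delivered_tpm(ptus, model):
    """Input tokens per minute delivered by a deployment of `ptus` PTUs."""
    return ptus * INPUT_TPM_PER_PTU[model]

def ptus_to_hold_throughput(current_ptus, current_model, new_model):
    """PTUs needed on `new_model` to match the current delivered throughput."""
    target = delivered_tpm(current_ptus, current_model)
    per_ptu = INPUT_TPM_PER_PTU[new_model]
    return -(-target // per_ptu)  # integer ceiling division

before = delivered_tpm(25, "GPT 5")
after = delivered_tpm(25, "GPT 5.5")
print(before, after, f"{1 - after / before:.0%} lower")
print(ptus_to_hold_throughput(25, "GPT 5", "GPT 5.5"), "PTUs to hold throughput")
```

The second figure is the one a substitution clause has to address: holding delivered tokens per minute constant takes roughly four times the PTUs.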

The PTU Versus Pay As You Go Decision

Start from one anchor: at GPT 5 list of $1.25 per million input tokens and $10 per million output tokens, Pay As You Go costs $1.25 per million utilization weighted tokens regardless of mix. One fully utilized PTU processes about 208 million such tokens per month (4,750 tokens per minute across a 730 hour month). So $260 of Pay As You Go buys a full PTU month.

Against that, the three PTU billing modes price as follows.

Chart A. Monthly cost per Global PTU by billing mode. Pay As You Go equivalence sits at $260 per fully utilized PTU month.

The monthly reservation at $260 exactly equals the Pay As You Go value of a fully utilized PTU, so it breaks even only at 100 percent utilization. The annual reservation at $221 breaks even at roughly 85 percent. Below those lines Pay As You Go is cheaper; above them the reservation wins and adds the latency floor.
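
The break even lines follow from three inputs stated in this paper: 4,750 weighted tokens per minute per PTU, a 730 hour month, and $1.25 per million weighted tokens. The sketch below reproduces the arithmetic. The variable and function names are illustrative.

```python
HOURS_PER_MONTH = 730
GPT5_TPM_PER_PTU = 4_750
PAYG_PER_MILLION_WEIGHTED = 1.25  # USD, GPT 5 list

# Weighted tokens one fully utilized PTU processes in a month.
tokens_per_ptu_month = GPT5_TPM_PER_PTU * 60 * HOURS_PER_MONTH

# What Pay As You Go would charge for that same volume.
payg_value = tokens_per_ptu_month / 1e6 * PAYG_PER_MILLION_WEIGHTED

def break_even_utilization(ptu_monthly_price):
    """Sustained utilization at which a PTU price equals the Pay As You Go bill."""
    return ptu_monthly_price / payg_value

for label, price in [("hourly", 730), ("monthly reservation", 260), ("annual reservation", 221)]:
    print(f"{label}: {break_even_utilization(price):.0%}")
```

A result above 100 percent, as the hourly rate produces, means that mode cannot beat Pay As You Go on cost at any utilization.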

A worked estate, priced four ways

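Take a representative insurance carrier running GPT 5 document workloads: 1,200 million input tokens and 150 million output tokens per month, peaking at twice the average rate. Utilization weighted demand is 2,400 million tokens (1,200 million input plus 8 times 150 million output). At about 208 million weighted tokens per PTU month, that is roughly 11.5 PTUs on average and 23 at peak, which rounds up to 25 Global PTUs in increments of 5. Those 25 PTUs run at 46 percent average utilization (benchmark scenario, not a quote).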

Option	Unit rate	Monthly cost	Versus Pay As You Go
Pay As You Go	$1.25 in / $10.00 out per 1M tokens	$3,000	Baseline
25 PTU, hourly	$730 per PTU per month	$18,250	6.1x the cost
25 PTU, monthly reservation	$260 per PTU per month	$6,500	2.2x the cost
25 PTU, annual reservation	$221 per PTU per month	$5,525	1.8x the cost

Chart B. The worked estate priced four ways. Numbers match the table above. Benchmark ranges: Redress Compliance advisory engagement file, 2024 to 2025.
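
The worked estate can be reproduced with a short sketch. It assumes the GPT 5 list prices, the Global PTU minimum of 15 with increments of 5, and the per PTU monthly rates used in this paper. The function names are illustrative.

```python
import math

HOURS_PER_MONTH = 730
GPT5_TPM_PER_PTU = 4_750
# Millions of weighted tokens one fully utilized PTU handles per month (about 208).
M_TOKENS_PER_PTU_MONTH = GPT5_TPM_PER_PTU * 60 * HOURS_PER_MONTH / 1e6

def size_estate(input_m, output_m, peak_ratio=2.0, min_ptus=15, step=5):
    """Return (PTUs sized for peak, average utilization) for monthly volumes in millions."""
    weighted_m = input_m + 8 * output_m  # output weighted 8 to 1 for GPT 5
    average_ptus = weighted_m / M_TOKENS_PER_PTU_MONTH
    ptus = max(min_ptus, math.ceil(average_ptus * peak_ratio / step) * step)
    return ptus, average_ptus / ptus

def monthly_costs(input_m, output_m, ptus):
    """Monthly cost in USD under each billing option."""
    return {
        "Pay As You Go": input_m * 1.25 + output_m * 10.00,
        "hourly": ptus * 730,
        "monthly reservation": ptus * 260,
        "annual reservation": ptus * 221,
    }

ptus, utilization = size_estate(1_200, 150)
costs = monthly_costs(1_200, 150, ptus)
print(f"{ptus} PTUs at {utilization:.0%} average utilization")
for option, cost in costs.items():
    print(f"{option}: ${cost:,.0f} ({cost / costs['Pay As You Go']:.1f}x)")
```

Changing `peak_ratio` is the quickest way to see how much of the premium comes from sizing to peak instead of to the floor.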

The right structure for most estates is a commitment ladder: an annual reservation sized to the measured utilization floor (the demand present in 85 percent or more of hours), a monthly reservation tranche for seasonal steps, and Pay As You Go token billing for traffic above the reserved capacity. Keep deployed PTUs within the reserved count, because Azure automatically bills any PTUs deployed beyond the reservation at the hourly rate.
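
One way to size the annual rung of the ladder is sketched below. It treats the floor as the demand level met or exceeded in at least 85 percent of sampled hours, then rounds down to a deployable Global PTU size. The function name, the rounding rule, and the sample data are assumptions for illustration, not a Microsoft sizing method.

```python
def annual_tranche(hourly_ptu_demand, share_pct=85, min_ptus=15, step=5):
    """PTUs to reserve annually from a list of hourly demand samples (in PTUs).

    Returns 0 when the measured floor is below the deployment minimum,
    in which case the workload stays on Pay As You Go.
    """
    if not hourly_ptu_demand:
        raise ValueError("need at least one hourly sample")
    ordered = sorted(hourly_ptu_demand, reverse=True)
    hours_needed = -(-share_pct * len(ordered) // 100)  # integer ceiling
    floor_demand = ordered[hours_needed - 1]
    reservable = int(floor_demand // step) * step  # round down to a deployable size
    return reservable if reservable >= min_ptus else 0

# Hypothetical telemetry: PTU-equivalent demand for 20 sampled hours.
sample = [24, 23, 22, 22, 21, 20, 20, 19, 19, 18,
          18, 18, 17, 17, 17, 16, 16, 9, 6, 4]
print(annual_tranche(sample), "PTUs on the annual reservation")
```

Demand above the returned figure is what the monthly tranche and token billing are meant to absorb.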

Where the common advice is wrong: the standard reseller pitch is to convert to PTU once traffic is steady, citing savings of up to 70 percent. We disagree. That 70 percent compares reservation against hourly PTU, not against Pay As You Go. In the Azure OpenAI estates we benchmarked in 2024 to 2025, most ran below 50 percent sustained utilization, where the annual reservation costs roughly 1.7x the token bill or more. Buy PTU for the latency floor on customer facing workloads. Buy it for cost only above roughly 85 percent utilization. Route asynchronous work to Batch at half rate instead.

15 to 25%

Landed below the opening Microsoft proposal

Across the Azure OpenAI commitments we supported in 2024 to 2025, sequencing utilization evidence, deployment type mix, and the clause set landed 15 to 25 percent below Microsoft's opening structure.

30 to 45%

Of reserved PTU capacity sat idle

In the estates we audited, 30 to 45 percent of deployed PTU capacity ran idle outside business hours because reservations were sized to peak instead of floor. Benchmark ranges: Redress Compliance advisory engagement file, 2024 to 2025.

Model Price Evolution Defense

Every Azure OpenAI commitment is a bet against the price curve. The flagship input token list has fallen relentlessly: GPT 4 launched at $30 per million input tokens in March 2023, GPT 4 Turbo cut that to $10, GPT 4o to $5, the August 2024 GPT 4o update to $2.50, and GPT 5 to $1.25 in August 2025.

Chart C. Flagship input token list price at each model launch, March 2023 to August 2025.

Three clauses convert that curve from risk into protection. First, a model price ceiling: negotiated token rates expressed as a discount off the then current public list, so every public price cut flows through automatically. A fixed dollar rate in a falling market is a concession to Microsoft.
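
The difference between a fixed dollar rate and a discount off the current list can be illustrated with the flagship prices quoted in this section. The 10 percent discount and the signing date are hypothetical, chosen only to show the mechanic.

```python
# Flagship input token list price, USD per million tokens, as cited in this paper.
FLAGSHIP_INPUT_LIST = [
    ("GPT 4, March 2023", 30.00),
    ("GPT 4 Turbo", 10.00),
    ("GPT 4o", 5.00),
    ("GPT 4o, August 2024 update", 2.50),
    ("GPT 5, August 2025", 1.25),
]
DISCOUNT = 0.10  # hypothetical negotiated discount

def discounted(list_price, discount=DISCOUNT):
    """Negotiated rate expressed as a discount off a given list price."""
    return round(list_price * (1 - discount), 4)

list_at_signature = FLAGSHIP_INPUT_LIST[3][1]  # assume signing at the August 2024 list
latest_list = FLAGSHIP_INPUT_LIST[-1][1]

fixed_rate = discounted(list_at_signature)  # frozen in dollars at signature
floating_rate = discounted(latest_list)     # tracks the then current list
total_decline = 1 - latest_list / FLAGSHIP_INPUT_LIST[0][1]

print(f"fixed ${fixed_rate} vs floating ${floating_rate} vs public list ${latest_list}")
print(f"list decline since launch: {total_decline:.0%}")
```

In this illustration the fixed rate ends up above the public list after a single price cut, while the discount off list keeps its margin.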

Second, model substitution rights with throughput protection. Because tokens per PTU vary by model, a substitution clause must hold the delivered tokens per minute, not just the PTU count. Moving a 25 PTU GPT 5 deployment to GPT 5.5 without that language cuts delivered throughput by roughly three quarters at identical cost.

Third, deprecation and retirement notice. Microsoft's published lifecycle policy guarantees generally available models for at least 12 months and commits to at least 60 days notice before retirement. For a regulated workload, negotiate 90 to 180 days in the agreement, plus funded migration assistance when retirement forces a model move inside the reservation term.

Regional Capacity Strategy

The least understood sentence in the PTU documentation is this one: quota does not guarantee capacity, and neither does a reservation. PTU quota caps what you may deploy in a region; the reservation guarantees a discounted price. Neither guarantees that GPT 5 capacity exists in your region on the day you scale.

Microsoft's own guidance is to deploy first, then buy the reservation, precisely because customers have purchased reservations against capacity that was not available. Cancellation credits on reservations are limited, and reservations for Global, Data Zone, and Regional deployment types are not interchangeable with each other.

The buyer side procedure is a regional capacity audit before any commitment signs:

Audit step	What it establishes	Contract consequence
Map data residency duties	Which workloads genuinely require Data Zone or Regional, versus default Global	Pushes spend to the cheapest compliant deployment type
Test deploy in each target region	Whether capacity actually exists today for your models	Deployment confirmed before any reservation is purchased
Confirm quota headroom	That growth will not stall against the regional quota cap	Quota uplift commitment documented in the agreement
Scope the reservation	Subscription and resource group scope match the deployments	Discount applies; excess PTUs do not silently bill hourly

Then negotiate capacity language: a documented capacity escalation path with named engineering contacts, priority access for committed customers during regional constraint, and the right to rescope the reservation to another region without penalty if Microsoft cannot provision within 30 days.

Microsoft Indemnity for Output

The strongest commercial argument for Azure OpenAI over a direct OpenAI contract is the Customer Copyright Commitment in the Microsoft Product Terms. Microsoft commits to defend customers against third party copyright claims over output content and to pay resulting judgments or settlements.

The protection is conditional, and the conditions are where coverage quietly dies. To be covered for Azure OpenAI output, the customer must have implemented all required mitigations documented by Microsoft, including the content filters and safety system configuration. An engineering team that relaxes content filters for throughput can void the indemnity for the whole workload.

The carve outs matter as much. The commitment does not cover claims arising from your input data, your fine tuning training sets, or material you supply through retrieval augmented generation. Your data pipeline remains your liability.

Three negotiated improvements: a contractual acknowledgment listing your deployed mitigation configuration as compliant at signature, assignment language extending the indemnity benefit to outputs you distribute to your own customers, and audit cooperation duties so a future claim does not stall on evidence Microsoft holds.

Microsoft Copilot and Azure OpenAI Integration

Estates that run both Microsoft 365 Copilot and Azure OpenAI face a deliberate blending play. Copilot lists at $30 per user per month, $360 per user per year. The account team will present Copilot seats and the Azure OpenAI commitment as one AI envelope, sized together, discounted together.

Keep them apart, because the economics are different in kind. Copilot is a per seat product whose value depends on adoption per user. Azure OpenAI is metered infrastructure whose value depends on workload engineering.

The per user math: a heavy internal assistant user consuming 2 million input and 200 thousand output GPT 5 tokens per month costs about $4.50 on Pay As You Go, versus $30 for a Copilot seat. Blended envelope discounts that require both products to grow are how both get oversized.
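
The per user comparison is simple enough to check directly. The sketch uses the GPT 5 list prices and the Copilot seat price quoted in this paper, and it counts token cost only: the build and hosting cost of a custom assistant is outside its scope.

```python
GPT5_INPUT_PER_M = 1.25    # USD per million input tokens
GPT5_OUTPUT_PER_M = 10.00  # USD per million output tokens
COPILOT_SEAT_MONTHLY = 30.00

def payg_user_cost(input_m, output_m):
    """Monthly Pay As You Go token cost for one user, volumes in millions of tokens."""
    return input_m * GPT5_INPUT_PER_M + output_m * GPT5_OUTPUT_PER_M

heavy_user = payg_user_cost(2.0, 0.2)
print(f"${heavy_user:.2f} in tokens vs ${COPILOT_SEAT_MONTHLY:.2f} per seat "
      f"({COPILOT_SEAT_MONTHLY / heavy_user:.1f}x)")
```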

That is not an argument against Copilot, which bundles the Microsoft 365 graph integration no custom build matches cheaply. It is an argument for routing each use case to its cheapest adequate vehicle: broad knowledge worker assistance to Copilot seats under an adoption gated ramp, custom and high volume workloads to Azure OpenAI under the commitment ladder.

The Eight Contract Levers

These are the clauses we negotiate into Azure OpenAI commitments, in the order they typically trade.

Lever	What it secures	Negotiation anchor
1. PTU grandfather	Existing reservation rates and throughput tables persist through renewal	Renewal repricing only with 90 days notice and a capped uplift
2. Model price ceiling	Token rates as a discount off public list, so cuts flow through	The 96 percent flagship price decline since 2023
3. Substitution rights	Move workloads to successor models at equivalent delivered throughput	Tokens per PTU vary about 4x across the GPT 5 family
4. Regional capacity protection	Priority provisioning and penalty free reservation rescope	Quota and reservations carry no capacity guarantee
5. Indemnity assignment	Customer Copyright Commitment extends to your distributed outputs	Mitigation configuration acknowledged at signature
6. Deprecation notice	90 to 180 days notice plus funded migration on forced model moves	Published floor is 60 days for GA models
7. Data residency posture	Data Zone or Regional terms documented per workload class	Pay the premium only where a duty exists
8. Executive escalation	Named escalation path for capacity, pricing, and indemnity events	Quarterly commercial review with utilization data

Sequence matters as much as substance. The levers price differently at different points in Microsoft's quarter, and utilization evidence is the currency that buys them.

Weeks 1 to 2 · Measure

Build the utilization baseline

Pull token telemetry per deployment. Compute sustained utilization against the 85 percent break even line. Classify workloads as latency critical, interactive, or batch.

Weeks 3 to 6 · Structure

Design the commitment ladder

Test deploy in target regions, size the annual reservation to the measured floor, route batch work to the Batch tier, and table the eight clause set with Microsoft.

Weeks 7 to 12 · Close

Trade structure for price

Hold the reservation purchase until deployments are confirmed. Trade commitment term and deployment mix against the price ceiling, substitution, and indemnity clauses. Close at quarter end.

Our recommendation: treat PTU as a performance purchase, the clause set as the savings engine, and the reservation signature as the last step, never the first.

Before any commitment: run the utilization baseline and the regional capacity audit. If sustained utilization is below 85 percent, the annual reservation is costing you money relative to Pay As You Go, however large the quoted discount.
In the negotiation: secure the price ceiling, substitution rights with throughput protection, and the indemnity assignment before discussing reservation size. Structure concedes cheaply early and expensively late.

Redress Compliance is a 100 percent buyer side advisory firm with 500+ enterprise clients and more than $2B under advisory across 11 vendor practices. If an Azure OpenAI commitment is on your desk, contact us or visit our Microsoft practice before you sign. We are glad to tie a meaningful part of the fee to delivered value.

Prepared by Redress Compliance · redresscompliance.com
