Azure OpenAI SLA and support. Uptime is not the whole story.

The 99.9 percent SLA covers availability only. Latency, retirement risk, and support severity are bought separately, and they are negotiable.

500+ Enterprise Clients
$2B+ Under Advisory
11 Vendor Practices
100% Buyer Side Independent

Azure OpenAI ships a 99.9 percent availability SLA that says nothing about latency, throughput, or model quality. The terms that protect a production AI workload live in provisioned throughput, support tiers, and the contract you negotiate around the SLA.

Key takeaways

  • The SLA covers availability only: 99.9 percent uptime, with credits as the sole remedy. Latency and quality are out of scope.
  • Credits are claimed, not paid: you must file within the claim window with your own evidence; nobody pays automatically.
  • Latency protection is bought: provisioned throughput units, not the SLA, are how you buy predictable performance.
  • Support is a separate purchase: the SLA is not a support plan; severity response times come from your Azure support tier.
  • Model retirements are your risk: deployment lifecycles force migrations the SLA never compensates.
  • PTU commitments are negotiable: reservation pricing and term flexibility moved 15 to 25 percent in deals we advised.

What does the Azure OpenAI SLA actually cover?

The Azure OpenAI service carries a 99.9 percent availability SLA, and availability is the entire scope. The online services SLA terms define the metric, the credit ladder, and the claim mechanics.

Latency, throughput, token generation speed, and model output quality sit outside the SLA. For a production AI feature, those are usually the failure modes that matter.
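To put the 99.9 percent figure in concrete terms, the arithmetic below converts an availability percentage into a downtime budget. It is a minimal sketch: the function name is ours, the period lengths are illustrative, and how downtime is actually measured is defined by Microsoft's SLA terms, not by this calculation.

```python
def allowed_downtime_minutes(sla_percent: float, days_in_period: int = 30) -> float:
    """Minutes of downtime a period can absorb before the SLA percentage is missed."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = days_in_period * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    # Period lengths are illustrative; check the SLA terms for the real measurement period.
    for days in (28, 30, 31):
        budget = allowed_downtime_minutes(99.9, days)
        print(f"{days}-day period at 99.9 percent: {budget:.1f} minutes of downtime")
```

A 30-day period at 99.9 percent leaves roughly 43 minutes of permitted downtime. Every slow or throttled minute outside that definition counts as "available".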

What the SLA covers versus what production needs

Risk | In the SLA? | Where protection actually comes from
Service unavailable | Yes, 99.9 percent | Credit claim after breach
Slow responses, latency spikes | No | Provisioned throughput units
Throttling at high load | No | PTU reservation sizing
Model quality regression | No | Deployment pinning and eval gates
Model retirement | No | Migration planning, contract notice terms

How do SLA credits actually pay out?

They pay as service credits against future spend, only after you file a claim with evidence inside the claim window. Build the monitoring evidence trail before the incident, not after.
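One way to build that evidence trail is to keep timestamped results from your own synthetic probes and summarize them per month. The sketch below assumes a hypothetical `Probe` record and treats the share of successful probes as a proxy for uptime. The SLA terms define the official calculation, so use this as supporting evidence for a claim rather than as the contractual metric.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Probe:
    timestamp: datetime
    ok: bool  # True when the synthetic request got a successful response


def uptime_percent(probes):
    """Share of successful probes as a percentage (our own proxy metric)."""
    if not probes:
        raise ValueError("no probes recorded")
    return 100 * sum(p.ok for p in probes) / len(probes)


def outage_windows(probes):
    """Group consecutive failed probes into (first_failure, last_failure, count)."""
    windows = []
    start = last = None
    count = 0
    for probe in sorted(probes, key=lambda p: p.timestamp):
        if not probe.ok:
            if start is None:
                start, count = probe.timestamp, 0
            last = probe.timestamp
            count += 1
        elif start is not None:
            windows.append((start, last, count))
            start = None
    if start is not None:
        windows.append((start, last, count))
    return windows


def sample_probes():
    """Hypothetical data: one probe per minute with two separate outages."""
    t0 = datetime(2025, 1, 1)
    failed = {100, 101, 102, 500}
    return [Probe(t0 + timedelta(minutes=i), i not in failed) for i in range(1000)]


if __name__ == "__main__":
    probes = sample_probes()
    print(f"uptime: {uptime_percent(probes):.2f} percent")
    for start, end, count in outage_windows(probes):
        print(f"outage {start:%Y-%m-%d %H:%M} to {end:%H:%M}, {count} failed probes")
```

The outage windows, with timestamps, are the part worth attaching to a claim. The percentage alone says little about when the breach happened.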

How do you buy predictable performance with PTUs?

Provisioned throughput is the latency instrument. Provisioned throughput units reserve model processing capacity with consistent latency, where standard pay as you go shares capacity and absorbs the noisy neighbor problem.

  • Size on measured tokens: base PTU counts on observed tokens per minute from a pilot, not on launch forecasts.
  • Blend the modes: PTU for the latency sensitive core, pay as you go for batch and overflow.
  • Reservations reprice it: monthly and yearly PTU reservations discount heavily against hourly PTU rates.
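The sizing and blending steps above can be sketched as a small calculation over per-minute token counts from a pilot. The throughput per PTU used here (`tpm_per_ptu`) is a hypothetical input: real throughput per unit depends on the model and on the input and output token mix, and purchase minimums or increments may apply, so treat the result as a starting point for a capacity conversation.

```python
import math


def size_ptu(tpm_samples, tpm_per_ptu, coverage_percent=95):
    """Size a PTU reservation to cover a percentile of observed tokens per minute.

    Returns (ptu_count, overflow_share), where overflow_share is the fraction of
    sampled minutes whose demand exceeds the reservation and would fall to pay as you go.
    """
    if not tpm_samples:
        raise ValueError("need at least one tokens-per-minute sample")
    if tpm_per_ptu <= 0:
        raise ValueError("tpm_per_ptu must be positive")
    ordered = sorted(tpm_samples)
    # Nearest-rank percentile of observed demand.
    rank = max(1, math.ceil(coverage_percent * len(ordered) / 100))
    target_tpm = ordered[rank - 1]
    ptu_count = math.ceil(target_tpm / tpm_per_ptu)
    capacity = ptu_count * tpm_per_ptu
    overflow_minutes = sum(1 for sample in ordered if sample > capacity)
    return ptu_count, overflow_minutes / len(ordered)


if __name__ == "__main__":
    # Hypothetical pilot: steady load of 1,000 TPM with bursts to 5,000 TPM.
    pilot = [1000] * 80 + [5000] * 20
    for pct in (50, 99):
        ptus, overflow = size_ptu(pilot, tpm_per_ptu=500, coverage_percent=pct)
        print(f"cover p{pct}: {ptus} PTUs, {overflow:.0%} of minutes overflow")
```

Sizing to the median keeps the reservation small and sends bursts to pay as you go. Sizing to the 99th percentile removes overflow but reserves capacity that sits idle most of the time.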

What PTU utilization should you target?

Target above 60 percent sustained utilization before adding capacity. The 35 to 55 percent utilization we found in our reviews means roughly 45 to 65 percent of reserved AI capacity sat idle as paid headroom.
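As a rough check against that floor, the sketch below computes utilization from average consumed tokens per minute and estimates how many units would bring the same load to the 60 percent target. The per-PTU throughput and the sample figures are hypothetical, and a real review would look at sustained load over weeks, not a single average.

```python
import math


def utilization_percent(avg_used_tpm, reserved_ptus, tpm_per_ptu):
    """Average consumed throughput as a percentage of reserved throughput."""
    if reserved_ptus <= 0 or tpm_per_ptu <= 0:
        raise ValueError("reserved_ptus and tpm_per_ptu must be positive")
    return 100 * avg_used_tpm / (reserved_ptus * tpm_per_ptu)


def right_sized_ptus(avg_used_tpm, tpm_per_ptu, target_percent=60):
    """Smallest PTU count that keeps the same load at or below the target utilization."""
    if tpm_per_ptu <= 0 or not 0 < target_percent <= 100:
        raise ValueError("invalid tpm_per_ptu or target_percent")
    return math.ceil(avg_used_tpm * 100 / (tpm_per_ptu * target_percent))


if __name__ == "__main__":
    # Hypothetical reservation: 100 PTUs at 1,000 TPM each, averaging 45,000 TPM of use.
    used, reserved, per_ptu = 45_000, 100, 1_000
    print(f"utilization: {utilization_percent(used, reserved, per_ptu):.0f} percent")
    print(f"PTUs needed at a 60 percent target: {right_sized_ptus(used, per_ptu)}")
```

In this illustrative case the same load fits in 75 units instead of 100, which is the kind of gap worth raising before a reservation renews.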

Which Azure support plan does an AI workload need?

The SLA is not support. Severity response times, escalation paths, and engineering access come from your Azure support plan, purchased separately or wrapped into a Unified agreement.

  • Developer tier: business hours only; wrong for anything in production.
  • Standard and Professional Direct: 1 hour critical case response; the floor for production AI features.
  • Unified: negotiated enterprise wide; fold the AI workload into the Unified scope discussion explicitly.

Does Copilot or M365 support cover Azure OpenAI?

No. Azure OpenAI is an Azure service under Azure support terms. A Microsoft 365 support relationship does not carry severity commitments for your Azure AI deployment.

What should you negotiate around the SLA?

The SLA itself rarely moves, but the commercial frame around it does. In our 2024 to 2025 reviews, PTU reservation pricing, term flexibility, and migration support moved 15 to 25 percent of committed AI spend.

  1. Negotiate PTU reservation rates against your committed Azure growth, not as a standalone line.
  2. Cap the commitment term while model economics are falling; shorter reservations beat long locks.
  3. Put model retirement notice and migration assistance into the agreement text.
  4. Tie AI spend into MACC drawdown so it earns your existing discount structure.
  5. Rehearse the SLA credit claim path and assign an owner before go live.

Should AI spend sit inside the MACC?

Yes. Azure OpenAI consumption draws down a Microsoft Azure Consumption Commitment like any Azure service, which makes the AI line part of your negotiated discount fabric instead of a side purchase.

Where the common advice on the Azure OpenAI SLA is wrong

The standard advice is to scrutinize the SLA percentage and negotiate it upward. We disagree. In roughly 12 to 18 Azure OpenAI commitment reviews Fredrik Filipsson ran in 2024 to 2025, not one production incident that caused business damage was an availability breach; they were latency and throughput degradations the SLA does not cover at any percentage. The buyer side move is to spend the negotiation capital on PTU reservation pricing, term flexibility, and retirement notice terms, and to treat the 99.9 number as marketing furniture. A nine never paid for a slow checkout.

P95 latency is the number a customer feels, and it appears nowhere in the availability SLA that dominates most contract reviews.
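For teams that do not already track it, P95 can be computed directly from request latency samples. The sketch below uses the nearest-rank method on hypothetical data. Monitoring tools may interpolate differently, so small differences from a dashboard figure are expected.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct percent at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds: most requests fast, one in ten very slow.
    latencies_ms = [200] * 90 + [4000] * 10
    mean_ms = sum(latencies_ms) / len(latencies_ms)
    print(f"mean: {mean_ms:.0f} ms")
    print(f"p50:  {percentile(latencies_ms, 50)} ms")
    print(f"p95:  {percentile(latencies_ms, 95)} ms")
```

The median looks healthy and the mean looks tolerable, while the P95 shows the four second responses that one customer in ten actually experiences.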

What the engagement data shows

Three cuts of our advisory engagement file frame the size of the opportunity.

  • 12 to 18: Azure OpenAI commitment reviews, 2024 to 2025
  • 35 to 55%: observed PTU utilization vs capacity paid
  • 15 to 25%: spend moved by negotiating around the SLA

Source: Redress Compliance advisory engagement file, 2024 to 2025.

What to do next

Six moves turn this analysis into a lower invoice on the next renewal.

A sequence you can run this quarter

  1. Inventory which AI workloads are latency sensitive and which tolerate batch behavior.
  2. Measure real tokens per minute in a pilot before sizing any PTU reservation.
  3. Upgrade Azure support to at least a 1 hour critical response tier for production AI.
  4. Route Azure OpenAI consumption through the MACC and your discount structure.
  5. Add model retirement notice and migration terms to the agreement.
  6. Assign an owner for SLA evidence capture and the credit claim path.

Frequently asked questions

What SLA does Azure OpenAI offer?

A 99.9 percent availability SLA with service credits as the remedy. It covers whether the service responds, not how fast it responds or how well the model performs, so latency and quality risk need separate instruments.

Does the Azure OpenAI SLA cover latency or throughput?

No. Latency, throttling, and token generation speed are out of SLA scope at any tier. Provisioned throughput units are the mechanism for predictable performance, and they are bought, not promised.

How do I claim Azure OpenAI SLA credits?

File a claim through Azure support within the claim window defined in the online services SLA terms, with your own monitoring evidence of the breach. Credits offset future spend; they are never paid proactively.

Are PTU reservations negotiable?

Yes. Monthly and yearly PTU reservations discount steeply against hourly rates, and reservation pricing moved 15 to 25 percent in commitment reviews we advised in 2024 to 2025 when tied to broader Azure growth.
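The trade-off between hourly and reserved PTU pricing reduces to a break-even in hours of use per month. The rates below are placeholders chosen only to make the arithmetic visible. They are not Microsoft list prices, so substitute the figures from your own quote.

```python
HOURS_IN_30_DAY_MONTH = 720


def breakeven_hours(hourly_rate_per_ptu, monthly_reservation_per_ptu):
    """Hours of use per month above which the monthly reservation is cheaper."""
    if hourly_rate_per_ptu <= 0:
        raise ValueError("hourly_rate_per_ptu must be positive")
    return monthly_reservation_per_ptu / hourly_rate_per_ptu


def cheaper_option(hours_needed, hourly_rate_per_ptu, monthly_reservation_per_ptu):
    """Return 'reservation' or 'hourly' for the expected hours of use in a month."""
    hourly_cost = hours_needed * hourly_rate_per_ptu
    return "reservation" if monthly_reservation_per_ptu < hourly_cost else "hourly"


if __name__ == "__main__":
    # Placeholder rates per PTU, for illustration only.
    hourly_rate, monthly_reservation = 2.0, 520.0
    hours = breakeven_hours(hourly_rate, monthly_reservation)
    share = hours / HOURS_IN_30_DAY_MONTH
    print(f"break-even: {hours:.0f} hours per month ({share:.0%} of a 30-day month)")
    for needed in (100, 720):
        print(f"{needed} hours needed: {cheaper_option(needed, hourly_rate, monthly_reservation)}")
```

A workload that runs around the clock sits far past the break-even, which is why the reservation rate, not the hourly rate, is the number to negotiate.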

Does Azure OpenAI spend count toward a MACC?

Yes. Azure OpenAI consumption draws down a Microsoft Azure Consumption Commitment like any Azure service, so route the AI line through the MACC to earn your negotiated discount structure.

What happens when Microsoft retires a model version?

Deployments on retired versions are forced to migrate on Microsoft's lifecycle schedule, and the SLA pays nothing for the migration work. Negotiate notice periods and migration assistance into the agreement before committing.
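Until notice terms are in the agreement, a simple tracker can flag deployments whose retirement date is inside your migration lead time. The deployment names, dates, and the 90 day lead time below are hypothetical. Populate the dates from Microsoft's published lifecycle information for the models you actually run.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deployment:
    name: str
    model_version: str
    retirement_date: date  # hypothetical here; maintain from published lifecycle dates


def migrations_due(deployments, today, lead_time_days=90):
    """Return (name, days_remaining) for deployments retiring within the lead time, soonest first."""
    due = []
    for deployment in deployments:
        days_remaining = (deployment.retirement_date - today).days
        if days_remaining <= lead_time_days:
            due.append((deployment.name, days_remaining))
    return sorted(due, key=lambda item: item[1])


if __name__ == "__main__":
    # Illustrative inventory only.
    inventory = [
        Deployment("checkout-assistant", "model-a-v1", date(2025, 2, 1)),
        Deployment("batch-summarizer", "model-b-v2", date(2025, 12, 1)),
    ]
    for name, days in migrations_due(inventory, today=date(2025, 1, 1)):
        print(f"{name}: {days} days until retirement, start migration and evals now")
```

A negative `days_remaining` means the retirement date has already passed, so those entries sort to the top of the list.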

The SLA tells you when Microsoft owes you an apology. The PTU reservation tells you when your customers get an answer. Fund the second.

Fredrik Filipsson
Co Founder and Group CEO. Ex Oracle, IBM, SAP.