A buyer-side walkthrough of the 2026 Azure OpenAI Service SLA: what the credit covers, what it excludes, and the five negotiable levers that strengthen service levels at renewal.
The Azure OpenAI Service SLA reads as 99.9 percent uptime, but coverage gaps in content filtering, model latency, and provisioned throughput leave most enterprise buyers exposed in 2026.
This article is for procurement leaders, FinOps owners, and platform architects renewing a Microsoft Enterprise Agreement that includes Azure OpenAI Service in 2026. Read it alongside the Microsoft Copilot enterprise licensing pillar, the Azure OpenAI enterprise pricing guide, and the Microsoft Practice.
The published Microsoft Online Services SLA for Azure OpenAI Service is a 99.9 percent monthly uptime promise. The credit you receive when Microsoft misses the number is small. The exclusions matter more than the headline.
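To put the headline number in concrete terms, here is a minimal sketch that converts a monthly uptime percentage into a downtime budget. The 30-day month and the function name `allowed_downtime_minutes` are assumptions for illustration. The SLA text defines how downtime is actually measured.

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime a month can absorb before the uptime target is missed.

    Assumes a 30-day month by default; this is an illustration, not the
    measurement method defined in the SLA.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Rounded for display: 99.9 percent leaves roughly 43 minutes a month.
print(round(allowed_downtime_minutes(99.9), 1))   # 43.2
print(round(allowed_downtime_minutes(99.95), 1))  # 21.6
```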
Microsoft commits to availability of the inference endpoint. The SLA does not commit to model output quality, latency, or content filter responsiveness. A model that returns a 429 Too Many Requests on most calls still counts as available in Microsoft's metric.
Latency degradation was the most common production failure mode we saw across 2024 and 2025. None of those incidents triggered a credit under the public SLA.
Credits are tiered. Monthly uptime below 99.9 percent returns a credit of ten percent of the monthly fee for the affected resource. Below 99 percent, the credit is twenty five percent. Below 95 percent, it is one hundred percent.
For a customer spending one hundred thousand dollars a month on Azure OpenAI Service, a tier one outage returns ten thousand dollars. The downstream revenue impact of a failed support chatbot or claims triage workflow is often much larger.
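The tiers and the worked example above can be sketched as a small lookup. The function names are hypothetical, and the thresholds are the ones quoted in this article. Check them against the current SLA text before relying on them.

```python
def credit_percent(uptime_percent: float) -> int:
    """Service credit tier for a measured monthly uptime, per the tiers quoted above."""
    if uptime_percent < 95.0:
        return 100
    if uptime_percent < 99.0:
        return 25
    if uptime_percent < 99.9:
        return 10
    return 0  # target met, no credit


def credit_amount(monthly_fee: float, uptime_percent: float) -> float:
    """Credit in dollars for the affected resource."""
    return monthly_fee * credit_percent(uptime_percent) / 100


# The article's example: $100,000 a month and a first-tier breach.
print(credit_amount(100_000, 99.5))  # 10000.0
```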
The exclusion list is long. Content filter unavailability, model version deprecation, tokenizer changes, preview features, and any incident caused by your own code are all excluded. The customer has thirty days from the incident to file a claim, and the burden of proof sits on the customer.
Azure OpenAI Service in 2026 has three commercial models: pay as you go on shared capacity, Provisioned Throughput Units (PTU) on dedicated capacity, and PTU Managed monthly. Each has a different SLA posture.
PTU customers buy dedicated throughput for a flat hourly or monthly rate. The 99.9 percent uptime metric still applies, but the credit base is the PTU fee, not the per token cost. PTU outages are rarer because the capacity is reserved, but when they happen the credit value is larger.
Pay as you go workloads compete for shared capacity. During peak demand we have seen consistent 429 throttling that did not breach the 99.9 percent uptime metric because the endpoint was technically available. The customer paid full price for a degraded service with no credit recourse.
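The gap between "available" and "usable" is easy to see in your own request logs. The sketch below is an illustration under an assumption: it treats any response other than a 5xx as "endpoint available", which mirrors the behavior described above but is not Microsoft's published calculation. The function name and the sample traffic mix are hypothetical.

```python
from collections import Counter


def availability_views(status_codes: list[int]) -> dict[str, float]:
    """Compare an endpoint-availability view with a usable-response view.

    Assumption: a throttled 429 counts as 'available' and only 5xx
    responses count against uptime.
    """
    if not status_codes:
        raise ValueError("no requests to evaluate")
    counts = Counter(status_codes)
    total = len(status_codes)
    server_errors = sum(n for code, n in counts.items() if 500 <= code <= 599)
    successes = sum(n for code, n in counts.items() if 200 <= code <= 299)
    return {
        "endpoint_available_pct": 100 * (total - server_errors) / total,
        "usable_pct": 100 * successes / total,
        "throttled_pct": 100 * counts[429] / total,
    }


# Hypothetical peak-hour sample: heavy throttling, almost no server errors.
sample = [200] * 400 + [429] * 599 + [503] * 1
print(availability_views(sample))
```

On this sample the endpoint-availability view sits at the 99.9 percent line while only 40 percent of calls returned a usable response.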
PTU Managed bundles capacity, support, and a higher tier of monitoring under a single monthly commitment. It does not change the public SLA number, but customers on PTU Managed report faster incident response. Negotiate the response time into the order form, not the public SLA.
Azure OpenAI SLA: posture by deployment model, 2026
| Deployment model | Uptime promise | Credit base | Hidden risk |
|---|---|---|---|
| Pay as you go | 99.9 percent regional | Consumed tokens | Throttling without breach |
| PTU hourly | 99.9 percent per deployment | PTU hourly fee | Capacity reassignment |
| PTU monthly | 99.9 percent regional | Monthly commitment | Single region exposure |
| PTU Managed | 99.9 percent regional | Monthly commitment | Roadmap deprecation |
| Negotiated MCA term | Up to 99.95 percent | Total Azure OpenAI spend | Renewal cycle timing |
The common advice on Azure OpenAI SLAs goes wrong in assuming that 99.9 percent uptime is the whole story. Latency, capacity, and content filter availability are the failures that hurt production workloads, and none of them is in the public SLA.
The 99.9 percent SLA is regional. If East US 2 goes down, customers in that region get a credit. Customers running active-passive failover to Sweden Central or West Europe absorb the failover cost themselves.
The Azure status history shows several multi-hour Azure OpenAI incidents in East US, East US 2, and Sweden Central across 2024 and 2025. Sweden Central was capacity constrained for most of Q3 2025. Customers who built their deployments only in Sweden Central paid for capacity they could not always access.
Active-active across two regions roughly doubles your PTU spend. Active-passive with warm capacity in a second region adds 30 to 50 percent. Few customers budget for this in the original deal. Microsoft does not co-fund it.
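A rough budgeting sketch for the resilience options above. The multipliers are the article's rules of thumb (about 2x for active-active, 30 to 50 percent extra for active-passive). The function name and the $100,000 baseline are hypothetical.

```python
def multi_region_budget(single_region_monthly: float) -> dict[str, float]:
    """Estimated monthly PTU spend per resilience pattern.

    Uses the rule-of-thumb uplifts quoted in the text, not quoted prices.
    """
    def with_uplift(percent: int) -> float:
        return single_region_monthly * (100 + percent) / 100

    return {
        "single_region": with_uplift(0),
        "active_passive_low": with_uplift(30),
        "active_passive_high": with_uplift(50),
        "active_active": with_uplift(100),
    }


# Hypothetical baseline of $100,000 a month in one region.
for pattern, cost in multi_region_budget(100_000).items():
    print(f"{pattern}: ${cost:,.0f}")
```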
EU customers running in West Europe or Sweden Central often cannot fail over to US regions for data residency reasons. The SLA treats the EU region as a single point of failure, and the credit is the customer's only recourse. Build for this constraint at design time.
The public SLA is a floor. Microsoft will negotiate stronger commitments inside the order form or a Microsoft Customer Agreement amendment for customers with enough leverage. The leverage threshold is roughly two hundred fifty thousand dollars per year in committed Azure OpenAI spend.
Five terms move with leverage: stronger uptime tiers, latency floors, capacity guarantees, named technical account managers, and faster credit issuance. Each one matters more than the headline percentage.
Most credit claims fail because the customer does not collect the right telemetry. You need synthetic probes from outside the Azure region, request and response logging with timestamps, and 429 and 5xx counts segmented by deployment. Without this you cannot prove the breach.
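As an illustration of the last item, here is a minimal sketch that counts 429 and 5xx responses per deployment from timestamped request logs. The `RequestLog` shape and the deployment names are assumptions. Adapt them to whatever your gateway or logging pipeline actually records.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RequestLog:
    """Hypothetical log record: one row per inference request."""
    timestamp: datetime
    deployment: str
    status: int


def error_counts_by_deployment(logs: list[RequestLog]) -> dict[str, dict[str, int]]:
    """Count total, 429, and 5xx responses for each deployment."""
    summary: dict[str, dict[str, int]] = defaultdict(
        lambda: {"total": 0, "429": 0, "5xx": 0}
    )
    for entry in logs:
        row = summary[entry.deployment]
        row["total"] += 1
        if entry.status == 429:
            row["429"] += 1
        elif 500 <= entry.status <= 599:
            row["5xx"] += 1
    return dict(summary)


now = datetime.now(timezone.utc)
logs = [
    RequestLog(now, "chat-prod", 200),
    RequestLog(now, "chat-prod", 429),
    RequestLog(now, "chat-prod", 503),
    RequestLog(now, "triage-prod", 200),
]
print(error_counts_by_deployment(logs))
```

Keeping the raw timestamped rows alongside the counts matters, because the counts alone do not show when the breach window started or ended.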
Raise the SLA request twelve months before renewal, not in the last quarter. Microsoft will not amend an active EA for a single SLA term, but they will package it into the renewal if the request lands early enough to reach the regional CFO sign-off.
The SLA does not cover latency. The Azure OpenAI SLA covers endpoint availability only. Latency, throughput degradation, and content filter delays are excluded. Negotiate a p95 token latency floor into the order form if your workload depends on it.
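If you negotiate a p95 latency term, you need to measure p95 the same way every month. The sketch below uses the nearest-rank method, which is one common definition and an assumption here, since the order form should state the exact method. The function names, the sample latencies, and the 800 ms target are hypothetical.

```python
def p95(latencies_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Nearest rank: ceil(0.95 * n), computed with integers to avoid float drift.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_target(latencies_ms: list[float], target_ms: float) -> bool:
    """True when the measured p95 is at or under the negotiated target."""
    return p95(latencies_ms) <= target_ms


# Hypothetical samples: mostly fast responses with a slow tail.
samples = [220.0] * 90 + [650.0] * 6 + [1900.0] * 4
print(p95(samples))                          # 650.0
print(meets_latency_target(samples, 800.0))  # True
```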
Credits are issued in the billing cycle after Microsoft validates the claim. Most credit claims take 30 to 60 days, and the customer must file within 30 days of the incident.
The SLA does not cover the content filter. Content filter unavailability is explicitly excluded. If your workload cannot run without the filter, your effective uptime is the lower of the model uptime and the filter uptime.
Microsoft will negotiate a stronger SLA, but only for customers with material committed Azure OpenAI spend. The leverage threshold is roughly two hundred fifty thousand dollars per year. Stronger tiers go up to 99.95 percent in our experience.
PTU gives reserved capacity, which reduces throttling. The published uptime number is the same. Negotiate latency and capacity commitments separately for PTU workloads.
To support a credit claim, collect synthetic probes from outside the Azure region, full request and response logs with timestamps, and 429 and 5xx counts segmented by deployment. Without this telemetry, credit claims usually fail.
Preview model deployments are excluded from the SLA. Production workloads should never run on preview model versions unless you accept zero credit recourse.
The SLA is separate from price. A weaker negotiated price often comes with the standard SLA. A higher price can be paired with a stronger SLA if you bring the lever into the renewal cycle early.
The Azure OpenAI public SLA is a floor, not a ceiling. Customers with material PTU spend can negotiate latency floors and capacity guarantees that the headline 99.9 percent does not deliver.