How Azure OpenAI Pricing Works
Azure OpenAI Service pricing is fundamentally different from traditional software licensing. Instead of per-user or per-server fees, you pay for compute consumption measured in tokens. A token is roughly four characters of English text, and pricing is quoted per million tokens, separately for input (prompt) and output (completion), with output tokens typically costing 2 to 4 times more than input tokens. This token-based model means your costs scale directly with usage volume and the specific model you deploy.
As of early 2026, GPT-4o is the most widely deployed model for enterprise workloads, with input pricing at approximately $2.50 per 1M tokens and output at $10 per 1M tokens for the standard tier. GPT-4o mini offers a lower-cost alternative at roughly $0.15 per 1M input tokens and $0.60 per 1M output tokens, suitable for classification, extraction, and simpler generation tasks. The model landscape evolves rapidly, and Microsoft introduces new models and pricing tiers quarterly. Enterprises that lock into a single model without planning for migration often overpay as newer, cheaper models deliver equivalent quality. A structured Microsoft advisory engagement helps organisations build model selection frameworks that adapt to pricing changes.
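The per-request arithmetic is straightforward. A minimal sketch using the illustrative per-1M rates quoted above; real rates vary by region and tier, so check the current Azure price list before relying on these figures:

```python
# Illustrative per-1M-token rates from the figures above; placeholders,
# not authoritative Azure prices.
PRICING = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},  # USD per 1M tokens
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request under standard pay-per-token pricing."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# A 2,000-token prompt with a 500-token completion costs $0.01 on GPT-4o
# and $0.0006 on GPT-4o mini at these rates.
```

The same request is roughly 16 times cheaper on the mini model, which is why the model-routing strategies discussed later in this article matter so much.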
Standard vs Provisioned Throughput
Azure OpenAI offers two consumption models: standard (pay-per-token, shared capacity) and provisioned throughput (reserved capacity with guaranteed performance). Standard deployment is simpler to start with and works well for development, testing, and low-to-medium volume production workloads. Provisioned Throughput Units (PTUs) guarantee a specific tokens-per-minute capacity and are priced at a fixed monthly rate regardless of actual usage.
The break-even between standard and provisioned depends on your utilisation rate. A PTU deployment that runs at less than 50 percent utilisation is almost always more expensive than standard pay-per-token. Above 70 percent utilisation, PTUs typically save 30 to 50 percent compared to standard pricing. The decision also depends on latency requirements: PTUs provide consistent, low-latency responses without the throttling that occurs under standard tier during peak demand. For customer-facing applications where response time matters, PTUs are often worth the premium even at lower utilisation rates.
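One way to make the break-even concrete is to compare the fixed PTU bill against what the same token volume would cost on standard pricing at a given utilisation. The inputs below are placeholders you would fill from your own PTU quote and blended standard rate, not published Azure figures:

```python
def ptu_saving(
    ptu_monthly_cost: float,      # fixed monthly PTU bill (USD) -- from your quote
    capacity_tpm: int,            # guaranteed tokens-per-minute of the deployment
    utilisation: float,           # observed average utilisation, 0.0 to 1.0
    standard_rate_per_1m: float,  # blended standard price per 1M tokens (USD)
) -> float:
    """Positive result = PTUs save money versus standard at this utilisation;
    negative result = you are overpaying for reserved capacity."""
    minutes_per_month = 60 * 24 * 30
    tokens_consumed = capacity_tpm * utilisation * minutes_per_month
    standard_cost = tokens_consumed / 1_000_000 * standard_rate_per_1m
    return standard_cost - ptu_monthly_cost
```

Running this across a range of utilisation values shows why the 50 and 70 percent thresholds above matter: the standard-equivalent cost scales linearly with utilisation while the PTU bill is flat.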
We advise clients to start every new use case on standard pricing, instrument their applications to capture token volumes and latency metrics for 30 to 60 days, then evaluate whether a provisioned deployment is justified. This data-driven approach prevents the common mistake of over-provisioning PTUs based on theoretical peak demand that never materialises. The token volume data also feeds into your Azure FinOps framework for accurate forecasting.
Data Residency, Compliance, and Why Azure Wins the Enterprise Deal
The primary reason enterprises choose Azure OpenAI over the direct OpenAI API is data governance. Azure OpenAI guarantees that your prompts and completions are not used to train models, are processed within your selected Azure region, and are subject to your existing Microsoft Enterprise Agreement data processing terms. For regulated industries (financial services, healthcare, government), these guarantees are non-negotiable requirements that the direct OpenAI API cannot match.
Azure OpenAI also integrates with Azure's identity, networking, and security stack. You can deploy the service within a Virtual Network, restrict access through Private Endpoints, authenticate users via Microsoft Entra ID (formerly Azure Active Directory), and log all API calls through Azure Monitor. These integrations reduce the security and compliance effort required to deploy AI in production. For organisations already operating on Azure, the marginal cost of securing Azure OpenAI is significantly lower than building equivalent controls around a third-party API.
Token Cost Optimisation Strategies
Token costs are the largest variable expense in any Azure OpenAI deployment, and the most impactful optimisation strategies focus on reducing unnecessary token consumption. Prompt engineering is the first lever: a well-designed prompt that eliminates verbose instructions and redundant context can reduce input token consumption by 30 to 50 percent without degrading output quality. System prompts that are cached between calls (a feature Azure OpenAI added in late 2024) reduce input costs further by amortising the system prompt tokens across multiple requests.
Model routing is the second lever: not every request needs GPT-4o. Build a routing layer that sends simple classification or extraction tasks to GPT-4o mini (95 percent cheaper) and reserves GPT-4o for complex reasoning, creative generation, and multi-step analysis. Organisations that implement model routing typically reduce their blended cost per request by 40 to 60 percent. We have seen a European bank reduce its monthly Azure OpenAI spend from $180,000 to $72,000 simply by routing 70 percent of its document summarisation workload from GPT-4o to GPT-4o mini with no measurable quality degradation.
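A minimal routing layer can be as simple as a task-type lookup. The task labels and tiering below are illustrative choices you would tune and validate per workload with quality benchmarks; this is an application-side pattern, not an Azure feature:

```python
# Task types assumed simple enough for the small model -- validate this
# mapping against your own quality benchmarks before relying on it.
SMALL_MODEL_TASKS = {"classification", "extraction", "summarisation"}

def select_model(task_type: str) -> str:
    """Route well-bounded tasks to GPT-4o mini; reserve GPT-4o for complex
    reasoning, creative generation, and multi-step analysis."""
    return "gpt-4o-mini" if task_type in SMALL_MODEL_TASKS else "gpt-4o"
```

In production this lookup usually sits behind a classifier or explicit task tags in the request, so the routing decision is auditable and the mapping can evolve as model pricing changes.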
Batch processing is the third lever. Azure OpenAI's batch API offers 50 percent discounts on token pricing for workloads that can tolerate 24-hour turnaround times. Batch is ideal for offline analysis, report generation, and data enrichment pipelines that do not require real-time responses. If even 30 percent of your workload is batch-eligible, the savings on that portion are substantial.
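To size the opportunity, a quick estimate under the 50 percent batch discount described above:

```python
def batch_saving(monthly_token_spend: float,
                 batch_eligible_share: float,
                 batch_discount: float = 0.50) -> float:
    """Monthly USD saved by moving the batch-eligible share of token spend
    to the batch API at the stated discount."""
    return monthly_token_spend * batch_eligible_share * batch_discount

# 30 percent of a $100,000 monthly token bill moved to batch saves $15,000.
```

Even modest batch eligibility compounds with the other levers: a request that has already been routed to a cheaper model and then batched carries both discounts.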
Azure OpenAI and Your MACC
Azure OpenAI consumption counts toward your Microsoft Azure Consumption Commitment (MACC) agreement. This is important for two reasons. First, if you have existing MACC commitments with unused capacity, directing AI workloads to Azure OpenAI helps you consume that commitment rather than letting it expire unused. Second, when negotiating a new MACC, include projected Azure OpenAI consumption in your commitment to secure better overall discount rates. Microsoft offers MACC discount tiers that improve as committed spend increases, and AI workloads can push you into a higher discount tier that benefits your entire Azure estate.
Be cautious about overcommitting based on AI projections. AI adoption curves are notoriously difficult to forecast, and we have seen organisations commit to $2M in annual Azure OpenAI spend that materialised at only $400,000 in the first year. Structure your MACC with AI spend as an upside scenario rather than a baseline commitment. Negotiate the right to redirect unused AI-allocated capacity to other Azure services (compute, storage, networking) without penalty.
Governance Framework for Enterprise AI
Deploying Azure OpenAI without a governance framework is like deploying a database without access controls. Every enterprise needs a documented AI governance policy that covers: which business units are authorised to deploy AI models, which data classifications are permitted as inputs (never send PII or regulated data to a model without data masking), who approves new AI use cases, how model outputs are validated before being used in business decisions, and how costs are allocated across departments.
Contract-level governance is equally important. Your Enterprise Agreement should include explicit terms around data processing, model training exclusions, and service level commitments for Azure OpenAI. Microsoft's standard DPA covers AI services, but enterprise buyers should negotiate supplementary terms that address AI-specific risks: output accuracy disclaimers, liability for AI-generated content, and audit rights for model behaviour.
Copilot Studio adds another governance dimension: when business users build AI agents using low-code tools, the potential for ungoverned AI deployment increases exponentially. Integrate Copilot Studio governance into your Power Platform CoE to ensure that every AI agent is registered, documented, and subject to the same approval processes as other enterprise applications.
Building Your Azure OpenAI Cost Model
A robust cost model for Azure OpenAI requires four inputs: average tokens per request (input and output), requests per day by use case, model selection per use case, and standard vs provisioned deployment per use case. Multiply these variables to project monthly spend, then add a 20 to 30 percent buffer for the growth that inevitably comes as business teams discover new AI applications. Review the model monthly against actual consumption and adjust projections quarterly.
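The four inputs multiply out directly. A sketch of the projection, with hypothetical volumes and the illustrative GPT-4o standard rates from earlier in this article as placeholders:

```python
def project_monthly_spend(use_cases: list[dict],
                          growth_buffer: float = 0.25) -> float:
    """Multiply the four model inputs per use case, sum across use cases,
    and add a growth buffer (20-30 percent; 25 percent here)."""
    total = 0.0
    for uc in use_cases:
        per_request = (uc["input_tokens"] * uc["input_rate_per_1m"]
                       + uc["output_tokens"] * uc["output_rate_per_1m"]) / 1_000_000
        total += per_request * uc["requests_per_day"] * 30
    return total * (1 + growth_buffer)

# Hypothetical use case: 10,000 requests/day, 1,500 input and 400 output
# tokens per request, at $2.50 / $10.00 per 1M tokens (standard GPT-4o).
spend = project_monthly_spend([{
    "input_tokens": 1_500, "output_tokens": 400,
    "input_rate_per_1m": 2.50, "output_rate_per_1m": 10.00,
    "requests_per_day": 10_000,
}])
# → roughly $2,906 per month including the buffer
```

The monthly review against actual consumption then becomes a comparison of `spend` against the billed figure, with per-use-case variances flagged for investigation.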
For organisations spending more than $50,000 per month on Azure OpenAI, a dedicated AI FinOps function is justified. This function monitors daily spend, identifies anomalies (a runaway application that generates excessive token volumes can produce a six-figure bill in days), and continuously optimises model selection and prompt engineering. Our Microsoft advisory practice includes AI cost modelling and governance design in every Azure OpenAI engagement. The median client saves 35 percent on projected AI spend through the combination of model routing, prompt optimisation, and provisioned throughput right-sizing we implement.