
Microsoft Azure OpenAI Service. Enterprise pricing guide.

Same OpenAI models as ChatGPT and the OpenAI API, but inside the customer's Azure tenant with EA commercial terms. GPT-4o at $2.50 input and $10 output per million tokens. PTU breakeven at 50 to 70 percent utilization. PTU annual reservations convert to MACC drawdown and count toward the EA commitment. 11 buyer side moves.


Azure OpenAI Service is the Microsoft commercial wrapper around the OpenAI model family. It offers the same GPT-4o, o1, o3, embeddings, DALL-E 3, Whisper, and TTS models that OpenAI sells direct, but consumed under Azure commercial terms inside the customer's Azure tenant. The pricing model has three axes: token economics by model, throughput strategy (Standard pay as you go versus Provisioned Throughput Units), and commitment structure (PAYG, PTU monthly, PTU annual rolled into MACC drawdown). This guide covers the published token rates, the PTU breakeven math, the Standard versus Provisioned decision framework, the Microsoft EA and MACC roll up mechanics, and the 11 move buyer side playbook that delivers 25 to 40 percent savings against the unoptimized Azure OpenAI baseline. Read the related Microsoft services practice, the GenAI vendors practice, and the Microsoft EA renewal playbook.

What Azure OpenAI Service actually is

Azure OpenAI runs the same OpenAI model family as ChatGPT Enterprise and the OpenAI API, with three operational differences that matter to enterprise customers.

  1. Tenant boundary. The compute and data planes sit in the customer's Azure tenant, which means Azure regional data residency, Microsoft Entra ID (formerly Azure Active Directory) integration, and the broader Azure security envelope apply.
  2. Commercial wrapper. Billing flows through the Microsoft EA or Microsoft Customer Agreement, which means Azure OpenAI consumption counts toward MACC commitments, can be optimized against MACC, and rolls up to the EA renewal cycle.
  3. Governance stack. Azure governance tools including Azure Monitor, Microsoft Purview, Azure Policy, and Azure Cost Management apply, which materially simplifies enterprise governance compared to OpenAI direct.

The buyer side question on every Azure OpenAI deployment is whether to run Azure OpenAI or OpenAI direct. The answer depends on data residency requirements, existing Microsoft commercial position, and governance posture. Most regulated enterprises default to Azure OpenAI for data residency and governance reasons. Most technology companies without regulated workloads default to OpenAI direct for lower friction model access. Read the related CIO playbook for negotiating OpenAI contracts.

Token economics: published rates by model

Azure OpenAI token pricing, January 2026

Model            Input per 1M tokens    Output per 1M tokens    Cached input per 1M tokens
GPT-4o           $2.50                  $10.00                  $1.25
GPT-4o mini      $0.15                  $0.60                   $0.075
o1               $15.00                 $60.00                  $7.50
o1 mini          $3.00                  $12.00                  $1.50
o3               $20.00                 $80.00                  $10.00
GPT-3.5 Turbo    $0.50                  $1.50                   N/A

Source: Microsoft Azure OpenAI Service pricing page, January 2026. Batch API delivers 50 percent discount against Standard rates with 24 hour completion window. Cached input pricing applies to repeated prefix tokens within 5 minute windows.
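
As a rough illustration of how the published rates turn into a bill, the sketch below prices a workload from its token counts. The rates are copied from the table above; the model keys, the `Rate` class, and the `token_cost` function are hypothetical names for this example. The source does not say whether the Batch discount stacks with cached input pricing, so the sketch assumes it does not and rejects that combination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rate:
    """USD per 1M tokens, copied from the January 2026 table."""
    input_rate: float
    output_rate: float
    cached_rate: Optional[float]  # None where the table lists N/A


RATES = {
    "gpt-4o": Rate(2.50, 10.00, 1.25),
    "gpt-4o-mini": Rate(0.15, 0.60, 0.075),
    "o1": Rate(15.00, 60.00, 7.50),
    "o1-mini": Rate(3.00, 12.00, 1.50),
    "o3": Rate(20.00, 80.00, 10.00),
    "gpt-3.5-turbo": Rate(0.50, 1.50, None),
}

BATCH_DISCOUNT = 0.50  # 50 percent off Standard rates, per the source note


def token_cost(model, input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Return the USD cost of a workload.

    cached_tokens is the portion of input_tokens billed at the cached rate.
    """
    rate = RATES[model]
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cached_tokens and rate.cached_rate is None:
        raise ValueError(f"{model} has no cached input rate in the table")
    if batch and cached_tokens:
        # Assumption: stacking is not modelled because the source is silent on it.
        raise ValueError("Batch and cached pricing are not combined in this sketch")
    cached_rate = rate.cached_rate or 0.0
    cost = (
        (input_tokens - cached_tokens) * rate.input_rate
        + cached_tokens * cached_rate
        + output_tokens * rate.output_rate
    ) / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost
```

For example, `token_cost("gpt-4o", 1_000_000, 1_000_000)` prices one million input and one million output tokens at $12.50, and the same call with `batch=True` halves it.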

4 token control levers compound:

  1. Prompt caching. Delivers 50 percent discount on repeated prefix tokens, which is significant for RAG workloads with stable system prompts.
  2. Batch API. Delivers 50 percent discount on async workloads but adds up to 24 hours of latency, making it suitable for overnight processing rather than interactive use cases.
  3. Model selection routing. GPT-4o mini for classification and routing tasks, GPT-4o for content generation, o1 family only for deep reasoning workflows. Mixed deployment typically delivers 60 to 75 percent reduction against all GPT-4o or all o1 deployment.
  4. Context window engineering. Shorter prompts and tighter context retrieval reduce input token consumption proportionally.
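
The routing lever can be sanity checked with a small blended cost model. The sketch below is illustrative only: the traffic shares are hypothetical inputs, not measured figures, and it assumes each model's share applies equally to input and output tokens. Rates are copied from the pricing table in this guide.

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
RATES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
}


def blended_cost(shares, input_millions, output_millions):
    """Monthly USD cost when traffic is split across models.

    shares maps a model name to its fraction of total traffic. The same
    fraction is applied to input and output tokens (a simplification).
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    total = 0.0
    for model, share in shares.items():
        input_rate, output_rate = RATES[model]
        total += share * (input_millions * input_rate + output_millions * output_rate)
    return total


def routing_reduction(shares, baseline_model, input_millions, output_millions):
    """Fractional saving of the mixed deployment versus a single-model baseline."""
    baseline = blended_cost({baseline_model: 1.0}, input_millions, output_millions)
    mixed = blended_cost(shares, input_millions, output_millions)
    return 1 - mixed / baseline
```

With a hypothetical split of 70 percent GPT-4o mini and 30 percent GPT-4o over 100M input and 20M output tokens, `routing_reduction({"gpt-4o-mini": 0.7, "gpt-4o": 0.3}, "gpt-4o", 100, 20)` comes out at roughly 0.66, which sits inside the 60 to 75 percent range quoted above. Real results depend on the actual traffic mix.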

Provisioned Throughput Units (PTU)

PTU reframes Azure OpenAI from consumption based pricing to dedicated capacity. A PTU represents a defined throughput floor (tokens per minute, varying by model). The customer commits to a PTU count, pays a fixed monthly fee per PTU, and consumes within the provisioned capacity without per token billing. PTU is sold across three deployment types: Provisioned Managed (single region, no failover), Provisioned Global (multi region failover for resilience), and Data Zone Provisioned (regional data residency commitments for regulated workloads).

Minimum commitments vary by model: typically 50 PTU floor for GPT-4o, 25 PTU floor for GPT-4o mini, with the floor adjusted by Azure region availability. Annual reservations deliver approximately 30 percent discount versus monthly PTU. PTU annual reservations roll into Microsoft Azure Consumption Commitment (MACC) drawdown, which makes them count toward the broader EA commercial position. The PTU governance discipline is to size PTU at 10 to 20 percent above measured peak throughput, with Provisioned Global handling spillover at PAYG rates.
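
The sizing rule can be sketched as a short calculation. This is a minimal sketch under stated assumptions: `tpm_per_ptu` (tokens per minute delivered by one PTU) and `monthly_price_per_ptu` are hypothetical placeholders, because the guide gives no figures for either and both vary by model. It also ignores any purchase increments Microsoft may apply.

```python
def size_ptu(peak_tpm, tpm_per_ptu, headroom_pct=15, model_floor=50):
    """PTU count covering measured peak plus headroom, never below the floor.

    peak_tpm: measured peak tokens per minute (integer)
    tpm_per_ptu: hypothetical tokens per minute per PTU (integer)
    headroom_pct: 10 to 20 per the sizing discipline above
    model_floor: minimum commitment, e.g. 50 for GPT-4o, 25 for GPT-4o mini
    """
    if peak_tpm < 0 or tpm_per_ptu <= 0:
        raise ValueError("peak_tpm must be >= 0 and tpm_per_ptu must be > 0")
    numerator = peak_tpm * (100 + headroom_pct)
    denominator = 100 * tpm_per_ptu
    needed = -(-numerator // denominator)  # integer ceiling division
    return max(model_floor, needed)


def annual_reservation_monthly_cost(ptu_count, monthly_price_per_ptu, annual_discount=0.30):
    """Monthly equivalent cost under an annual reservation.

    annual_discount defaults to the approximate 30 percent cited above.
    """
    return ptu_count * monthly_price_per_ptu * (1 - annual_discount)
```

With a hypothetical 2,500 tokens per minute per PTU, a measured peak of 200,000 tokens per minute and 15 percent headroom gives `size_ptu(200_000, 2_500)` equal to 92 PTU, while a peak of 50,000 falls back to the 50 PTU floor.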

Standard versus Provisioned breakeven

The Standard versus Provisioned decision turns on utilization economics. Standard wins below the breakeven point. Provisioned wins above. The breakeven typically sits at 50 to 70 percent utilization of the provisioned capacity, meaning the point at which the same traffic billed at Standard rates would cost as much as the PTU monthly fee. Below 50 percent utilization, the Provisioned commitment is wasted capacity. Above 70 percent utilization, the Provisioned commitment delivers material savings against equivalent Standard consumption and eliminates token rate exposure to model price changes during the term.
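
One way to make the breakeven concrete is to compare the fixed PTU fee with what the same traffic would cost at Standard rates. The sketch below assumes a simple linear model: `standard_cost_at_full_capacity` is a hypothetical input, the Standard bill the workload would generate if it used 100 percent of the provisioned throughput for the month.

```python
def breakeven_utilization(ptu_monthly_cost, standard_cost_at_full_capacity):
    """Utilization at which Standard billing equals the fixed PTU fee."""
    if standard_cost_at_full_capacity <= 0:
        raise ValueError("standard_cost_at_full_capacity must be positive")
    return ptu_monthly_cost / standard_cost_at_full_capacity


def cheaper_option(utilization, ptu_monthly_cost, standard_cost_at_full_capacity):
    """Return 'provisioned' or 'standard' for a measured utilization (0.0 to 1.0)."""
    standard_cost = utilization * standard_cost_at_full_capacity
    return "provisioned" if standard_cost > ptu_monthly_cost else "standard"
```

If a PTU block costs $60,000 per month and would carry $100,000 of Standard traffic at full utilization, the breakeven is 60 percent: a workload measured at 40 percent stays on Standard, and one at 80 percent moves to Provisioned.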

The practical implementation is hybrid: Provisioned for the production workload baseline, Standard for development, testing, and burst capacity beyond Provisioned headroom. Workloads that need Provisioned Global resilience add separate Provisioned capacity in a second region for failover. The Standard versus Provisioned model should be refreshed quarterly against actual measured throughput.
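
The hybrid split can be estimated from measured throughput samples. The sketch below is an assumption-laden simplification: it takes a list of tokens-per-minute samples (hypothetical data, one per interval of equal length), caps each at the provisioned capacity, and treats the excess as Standard overflow.

```python
def split_hybrid(samples_tpm, provisioned_tpm):
    """Split measured throughput into Provisioned and Standard overflow.

    Returns (provisioned_tokens, overflow_tokens, provisioned_utilization),
    where token totals are in the same per-interval units as the samples.
    """
    if not samples_tpm or provisioned_tpm <= 0:
        raise ValueError("need at least one sample and a positive capacity")
    provisioned = sum(min(s, provisioned_tpm) for s in samples_tpm)
    overflow = sum(max(0, s - provisioned_tpm) for s in samples_tpm)
    utilization = provisioned / (provisioned_tpm * len(samples_tpm))
    return provisioned, overflow, utilization
```

Rerunning this against each quarter's samples shows whether utilization of the provisioned baseline has drifted below the breakeven band or whether overflow has grown enough to justify more PTU.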

Azure OpenAI inside the Microsoft EA

Azure OpenAI consumption rolls up into Azure spend, which rolls into MACC drawdown, which rolls into the Microsoft EA. Four control points matter.

  1. MACC absorption. Azure OpenAI counts toward MACC commitment, which means a customer with an existing $5M annual MACC commitment can absorb Azure OpenAI consumption without incremental commitment.
  2. Renewal window. The EA renewal is the right negotiation window for Azure OpenAI pricing concessions; mid term Azure OpenAI commercial discussions deliver materially less leverage.
  3. PTU into MACC drawdown. PTU annual reservations convert to MACC drawdown, which makes Provisioned commitment more attractive at the EA level than at the standalone Azure OpenAI level.
  4. Copilot integration. Microsoft 365 Copilot consumes Azure OpenAI under the hood; a customer running both Copilot and direct Azure OpenAI should manage them as a single commercial position.
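
The MACC absorption point reduces to simple arithmetic. The sketch below assumes all of the spend passed in is MACC eligible, which should be confirmed against the actual agreement; the function name and the example figures are hypothetical.

```python
def macc_position(annual_commitment, other_azure_spend, azure_openai_spend):
    """Summarize how Azure OpenAI spend lands against a MACC commitment (USD)."""
    total = other_azure_spend + azure_openai_spend
    return {
        "drawdown": min(total, annual_commitment),         # spend that burns down the commitment
        "shortfall": max(0, annual_commitment - total),    # commitment still unconsumed
        "overage": max(0, total - annual_commitment),      # spend beyond the commitment
    }
```

For a $5M annual MACC with $3.8M of other Azure spend, $900K of Azure OpenAI consumption is absorbed in full and still leaves $300K of commitment unconsumed.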

Read the related Microsoft EA renewal playbook and the Microsoft Azure MACC negotiation.

Azure OpenAI governance

5 governance control points apply at scale:

  1. Model approval. A defined internal committee approves which models are available to which workloads. Not every workload needs o1 or o3 access.
  2. Data residency. Enforced through Azure region selection and Data Zone Provisioned deployments where regulatory requirements demand it.
  3. Content filtering. Applied through the Azure OpenAI content filter framework, which covers hate, violence, self harm, and sexual content with configurable severity thresholds.
  4. Audit logging. Integrated with Azure Monitor for technical logs and Microsoft Purview for compliance audit trails.
  5. Cost allocation. Applied through the 6 tag policy described in the Azure FinOps cost governance framework.
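
The model approval control can be expressed as a simple allowlist. This is a minimal sketch, not an Azure Policy definition: the workload names and the `APPROVED_MODELS` mapping are hypothetical, and in practice the check would sit in whatever gateway or deployment pipeline fronts the service.

```python
# Hypothetical allowlist maintained by the internal approval committee.
APPROVED_MODELS = {
    "support-triage": {"gpt-4o-mini"},
    "marketing-content": {"gpt-4o-mini", "gpt-4o"},
    "actuarial-reasoning": {"gpt-4o", "o1"},
}


def is_approved(workload, model):
    """True only if the committee has approved this model for this workload."""
    return model in APPROVED_MODELS.get(workload, set())
```

Unknown workloads are denied by default, which matches the principle that not every workload needs o1 or o3 access.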

11 move buyer side playbook

  1. Measure actual peak throughput from production traffic, not forecast. Forecasts inflate by 40 to 80 percent on AI workloads.
  2. Model the Standard vs Provisioned breakeven against measured utilization. Re run quarterly.
  3. Size PTU at 10 to 20 percent above peak, not at peak. Provisioned Global absorbs spillover at PAYG rates.
  4. Enable prompt caching on RAG workloads. 50 percent discount on repeated prefix tokens.
  5. Use Batch API for async workloads. 50 percent discount with 24 hour latency.
  6. Build the model selection routing layer. GPT-4o mini for routing, GPT-4o for content, o1 family for reasoning only.
  7. Convert PTU annual reservations into MACC drawdown. Roll Azure OpenAI commitment into the broader EA position.
  8. Negotiate Azure OpenAI pricing inside the EA renewal cycle. Mid term discussions deliver materially less leverage.
  9. Manage Microsoft 365 Copilot and Azure OpenAI as a single commercial position. Both consume the same underlying OpenAI models.
  10. Apply Data Zone Provisioned for regulated workloads. Regional data residency commitment from Microsoft.
  11. Tag and allocate Azure OpenAI consumption to business units. Apply chargeback to drive behavioral change on prompt engineering and model selection.
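
Move 11 can be sketched as a grouping step over tagged usage records. The record shape below (`tags` and `cost_usd` keys) is a hypothetical simplification for illustration, not the Azure Cost Management export schema.

```python
from collections import defaultdict


def chargeback(records, tag_key="business_unit"):
    """Sum cost per business unit; records without the tag go to 'untagged'."""
    totals = defaultdict(float)
    for record in records:
        unit = record.get("tags", {}).get(tag_key, "untagged")
        totals[unit] += record["cost_usd"]
    return dict(totals)


# Hypothetical usage records for one billing period.
example_records = [
    {"tags": {"business_unit": "claims"}, "cost_usd": 1200.0},
    {"tags": {"business_unit": "underwriting"}, "cost_usd": 800.0},
    {"tags": {"business_unit": "claims"}, "cost_usd": 300.0},
    {"tags": {}, "cost_usd": 50.0},
]
```

Keeping an explicit `untagged` bucket makes tagging gaps visible, so they can be closed before chargeback is enforced.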

How we engage

  • Azure OpenAI assessment. 6 week scoping covering actual throughput measurement, Standard versus Provisioned breakeven, PTU sizing model, prompt caching opportunity, and Batch API workload identification. GenAI Vendors Practice.
  • Microsoft EA renewal with Azure OpenAI integration. 9 month managed renewal sequence with Azure OpenAI negotiated as part of the broader EA position. Renewal Program.
  • Vendor Shield for Microsoft and GenAI. Continuous advisory across the Microsoft estate and the GenAI vendor mix including OpenAI, Anthropic, Google. Vendor Shield.
  • GenAI commercial benchmarking. Azure OpenAI pricing benchmarked against OpenAI direct, Anthropic Claude Enterprise, Google Gemini Enterprise. Benchmarking Practice.

We were running GPT-4o across all use cases at $80K monthly Standard burn. Redress measured throughput, built the model routing layer (4o mini for classification, 4o for content, o1 for the actuarial reasoning workflow), enabled prompt caching on the RAG layer, and committed 80 PTU annual at the 65 percent breakeven point. 31 percent reduction, $25K monthly out of the bill.

Chief Information Officer
Global financial services group