How is Azure OpenAI Service priced?

Azure OpenAI Service is priced on token consumption per model (per 1,000 or per million input and output tokens) on pay as you go, or through Provisioned Throughput Units (PTUs) that reserve capacity for a fixed hourly or monthly rate. PTUs suit steady high volume; pay as you go suits variable demand. The model choice and throughput mode drive the bill.

What are Provisioned Throughput Units (PTUs) in Azure OpenAI?

PTUs are reserved capacity blocks that guarantee throughput and latency for a model at a fixed price, replacing per token billing for committed workloads. They lower unit cost at steady high volume but waste money if utilization is low. Size PTUs to measured demand, not to peak forecasts.

How does Azure OpenAI fit into an EA or MACC?

Azure OpenAI consumption draws down your Microsoft Azure Consumption Commitment (MACC) and can be negotiated within the Enterprise Agreement. Folding AI spend into the MACC can earn better rates but risks an inflated commit. Keep AI demand forecasts conservative when sizing the commitment.

How do you control Azure OpenAI cost?

Control Azure OpenAI cost by routing routine tasks to cheaper models, capping token usage per workload, monitoring with Azure Cost Management, and choosing PTUs only where utilization justifies them. Most overspend comes from over capable model selection and uncapped usage. Governance recovers more than rate negotiation.

When should you negotiate Azure OpenAI pricing?

Negotiate Azure OpenAI pricing alongside your EA or MACC renewal, after a usage baseline of 60 to 90 days. That window lets you forecast token and PTU demand credibly. Committing to AI volume before usage is proven locks in the vendor's growth assumptions.

Azure OpenAI Pricing 2026: $2.50 Per 1M Token Math

Azure OpenAI pricing is a token, throughput, and commitment framework.

Key takeaways

GPT-4o on Azure lists at $2.50 per 1M input tokens and $10 per 1M output tokens on the Standard consumption meter. Model choice moves cost more than any discount.
Provisioned Throughput (PTU) beats Standard only above a utilization breakeven. Most estates buy it too early; Microsoft's own PTU guidance assumes steady load.
Compare against OpenAI direct list pricing before every commitment. The Azure premium buys MACC drawdown and enterprise terms, not cheaper tokens.
MACC credits burn on Azure OpenAI. That makes the service a negotiation asset at EA and MACC renewal, not just a line item.

Azure OpenAI Service is the Microsoft commercial wrapper around the OpenAI model family. It offers the same GPT-4o, o1, o3, embeddings, DALL·E 3, Whisper, and TTS models that OpenAI sells direct, consumed under Azure commercial terms inside the customer's Azure tenant.

The pricing model has three axes: token economics by model, throughput strategy (Standard pay as you go versus Provisioned Throughput Units), and commitment structure (PAYG, PTU monthly, PTU annual rolled into MACC drawdown).

This guide covers the published token rates, the PTU breakeven math, the Standard versus Provisioned decision framework, the Microsoft EA and MACC roll up mechanics, and the 11 move buyer side playbook that delivers 25 to 40 percent savings against the unoptimized Azure OpenAI baseline.

Quick answer

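Azure OpenAI Service enterprise pricing in brief: GPT-4o lists at $2.50 input and $10 output per 1M tokens on the Standard meter. The rest of the decision comes down to the PTU breakeven, Standard versus Provisioned, MACC drawdown, and the 11 buyer side moves.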

What Azure OpenAI Service actually is

Azure OpenAI runs the same OpenAI model family as ChatGPT Enterprise and the OpenAI API, with three operational differences that matter to enterprise customers.

Tenant boundary. The compute and data planes sit in the customer's Azure tenant, which means Azure regional data residency, Microsoft Entra ID (formerly Azure Active Directory) integration, and the broader Azure security envelope apply.
Commercial wrapper. Billing flows through the Microsoft EA or Microsoft Customer Agreement, which means Azure OpenAI consumption counts toward MACC commitments, can be optimized against MACC, and rolls up to the EA renewal cycle.
Governance stack. Azure governance tools including Azure Monitor, Microsoft Purview, Azure Policy, and Azure Cost Management apply, which materially simplifies enterprise governance compared to OpenAI direct.

The buyer side question on every Azure OpenAI deployment is whether to run Azure OpenAI or OpenAI direct. The answer depends on data residency requirements, existing Microsoft commercial position, and governance posture. Most regulated enterprises default to Azure OpenAI for data residency and governance reasons.

Most technology companies without regulated workloads default to OpenAI direct for lower friction model access.

Token economics: published rates by model

Azure OpenAI token pricing, January 2026

Model	Input per 1M tokens	Output per 1M tokens	Cached input per 1M tokens
GPT-4o	$2.50	$10.00	$1.25
GPT-4o mini	$0.15	$0.60	$0.075
o1	$15.00	$60.00	$7.50
o1 mini	$3.00	$12.00	$1.50
o3	$20.00	$80.00	$10.00
GPT-3.5 Turbo	$0.50	$1.50	N/A

Source: Microsoft Azure OpenAI Service pricing page, January 2026. The Batch API delivers a 50 percent discount against Standard rates with a 24 hour completion window. Cached input pricing applies to repeated prefix tokens within 5 minute windows.
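The table translates into a bill with simple arithmetic. The sketch below is illustrative only: the rates are copied from the table above, while the function name `standard_cost`, the model keys, and the assumption that the Batch discount applies to the whole bill are ours and should be checked against an actual invoice.

```python
# USD per 1M tokens, copied from the January 2026 table above.
# "cached" is None where the table shows N/A.
RATES = {
    "gpt-4o":        {"input": 2.50,  "output": 10.00, "cached": 1.25},
    "gpt-4o-mini":   {"input": 0.15,  "output": 0.60,  "cached": 0.075},
    "o1":            {"input": 15.00, "output": 60.00, "cached": 7.50},
    "o1-mini":       {"input": 3.00,  "output": 12.00, "cached": 1.50},
    "o3":            {"input": 20.00, "output": 80.00, "cached": 10.00},
    "gpt-3.5-turbo": {"input": 0.50,  "output": 1.50,  "cached": None},
}

BATCH_DISCOUNT = 0.50  # Batch API discount against Standard rates, per the source note


def standard_cost(model, input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Return the USD cost of a workload on the Standard meter.

    cached_tokens is the portion of input_tokens billed at the cached rate.
    Assumption: the Batch discount applies to the whole bill.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rate = RATES[model]
    # Models without a cached rate bill every input token at the full rate.
    cached_rate = rate["cached"] if rate["cached"] is not None else rate["input"]
    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * rate["input"]
        + cached_tokens * cached_rate
        + output_tokens * rate["output"]
    ) / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


# Illustrative monthly volume: 1B input tokens and 200M output tokens on GPT-4o.
print(standard_cost("gpt-4o", 1_000_000_000, 200_000_000))  # 4500.0
```

Running the same volume through a different model key shows how far model choice moves the bill before any discount is applied.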

Four token control levers compound:

Prompt caching. Delivers a 50 percent discount on repeated prefix tokens, which is significant for RAG workloads with stable system prompts.
Batch API. Delivers a 50 percent discount on async workloads but adds up to 24 hours of latency, making it suitable for overnight processing, not interactive use cases.
Model selection routing. GPT-4o mini for classification and routing tasks, GPT-4o for content generation, o1 family only for deep reasoning workflows. A mixed deployment typically delivers a 60 to 75 percent cost reduction against an all GPT-4o or all o1 deployment.
Context window engineering. Shorter prompts and tighter context retrieval reduce input token consumption proportionally.
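The routing lever is the easiest to model. The following sketch compares an all GPT-4o deployment with a routed mix; the 70/30 traffic split, the token volumes, and the function names are hypothetical, and only the rates come from the table in this guide.

```python
# USD per 1M tokens as (input, output), from the rate table in this guide.
RATES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
}


def workload_cost(mix, input_tokens, output_tokens):
    """USD cost when token volume is split across models by share.

    mix maps model name -> share of total tokens; shares must sum to 1.
    Assumption: every model sees the same input/output token split.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    total = 0.0
    for model, share in mix.items():
        in_rate, out_rate = RATES[model]
        total += share * (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return total


def routing_reduction(baseline_mix, routed_mix, input_tokens, output_tokens):
    """Fractional cost reduction of the routed mix against the baseline mix."""
    baseline = workload_cost(baseline_mix, input_tokens, output_tokens)
    routed = workload_cost(routed_mix, input_tokens, output_tokens)
    return 1 - routed / baseline


# Hypothetical split: 70 percent of tokens to GPT-4o mini, 30 percent to GPT-4o.
saving = routing_reduction(
    {"gpt-4o": 1.0},
    {"gpt-4o-mini": 0.70, "gpt-4o": 0.30},
    input_tokens=1_000_000_000,
    output_tokens=200_000_000,
)
print(f"{saving:.1%}")  # roughly two thirds under these assumed shares
```

The result depends almost entirely on how much traffic the cheap model can safely absorb, which is why the routing layer should be driven by measured task quality rather than a target share.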

Provisioned Throughput Units (PTU)

PTU reframes Azure OpenAI from consumption based pricing to dedicated capacity. A PTU represents a defined throughput floor (tokens per minute, varying by model). The customer commits to a PTU count, pays a fixed monthly fee per PTU, and consumes within the provisioned capacity without per token billing.

PTU is sold across three deployment types: Provisioned Managed (single region, no failover), Provisioned Global (multi region failover for resilience), and Data Zone Provisioned (regional data residency commitments for regulated workloads).

Minimum commitments vary by model: typically a 50 PTU floor for GPT-4o and a 25 PTU floor for GPT-4o mini, with the floor adjusted by Azure region availability. Annual reservations deliver an approximately 30 percent discount versus monthly PTU.

PTU annual reservations roll into Microsoft Azure Consumption Commitment (MACC) drawdown, which makes them count toward the broader EA commercial position. The PTU governance discipline is to size PTU at 10 to 20 percent above measured peak throughput, with a Standard deployment handling spillover at PAYG rates.
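The sizing rule above reduces to a ceiling calculation. This is a minimal sketch under stated assumptions: `tpm_per_ptu` (tokens per minute delivered by one PTU) is a hypothetical input that varies by model and must come from Microsoft's capacity tooling, and the 50 PTU floor is the typical GPT-4o figure quoted in this guide.

```python
def size_ptu(peak_tpm, tpm_per_ptu, headroom_pct=15, min_ptu=50):
    """Return the PTU count for a measured peak plus headroom.

    peak_tpm      measured peak tokens per minute from production traffic
    tpm_per_ptu   tokens per minute per PTU (hypothetical; model dependent)
    headroom_pct  integer headroom above peak, 10 to 20 per the guide
    min_ptu       minimum purchasable commitment for the model
    """
    if peak_tpm <= 0 or tpm_per_ptu <= 0:
        raise ValueError("throughput inputs must be positive")
    # Integer ceiling division avoids floating point rounding at the boundary.
    needed = -(-peak_tpm * (100 + headroom_pct) // (100 * tpm_per_ptu))
    return max(needed, min_ptu)


# Hypothetical workload: 150,000 tokens per minute at peak, 2,500 per PTU.
print(size_ptu(150_000, 2_500, headroom_pct=15))  # 69
# A small workload is pushed up to the commitment floor.
print(size_ptu(20_000, 2_500, headroom_pct=15))   # 50
```

The second call is the warning sign: when the floor, not the workload, sets the PTU count, the commitment is probably premature.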

Standard versus Provisioned breakeven

The Standard versus Provisioned decision turns on utilization economics. Standard wins below the breakeven point. Provisioned wins above. The breakeven typically sits at 50 to 70 percent utilization of the provisioned capacity, the point at which the equivalent Standard token charges equal the PTU monthly cost.

Below 50 percent utilization, the Provisioned commitment is wasted capacity. Above 70 percent utilization, the Provisioned commitment delivers material savings against equivalent Standard consumption and eliminates token rate exposure to model price changes during the term.
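The breakeven can be estimated per PTU. In the sketch below every numeric input is a placeholder: the $260 monthly PTU price, the 2,500 tokens per minute per PTU, and the $4.00 blended Standard rate are hypothetical values chosen for illustration, not published Microsoft figures.

```python
MINUTES_PER_MONTH = 60 * 24 * 30  # 43,200, assuming a 30 day month


def breakeven_utilization(ptu_monthly_price, tpm_per_ptu, blended_rate_per_1m):
    """Share of provisioned capacity that must be used for PTU to match Standard.

    ptu_monthly_price    USD per PTU per month (hypothetical input)
    tpm_per_ptu          tokens per minute one PTU sustains (hypothetical input)
    blended_rate_per_1m  Standard USD per 1M tokens, weighted across the
                         workload's input and output tokens
    """
    tokens_at_full_load = tpm_per_ptu * MINUTES_PER_MONTH
    standard_cost_at_full_load = tokens_at_full_load / 1_000_000 * blended_rate_per_1m
    return ptu_monthly_price / standard_cost_at_full_load


def cheaper_option(measured_utilization, breakeven):
    """Pick the cheaper meter for a measured utilization (0 to 1)."""
    return "Provisioned" if measured_utilization > breakeven else "Standard"


breakeven = breakeven_utilization(260.0, 2_500, 4.00)
print(f"{breakeven:.0%}")              # about 60% with these placeholder inputs
print(cheaper_option(0.30, breakeven))  # Standard
print(cheaper_option(0.80, breakeven))  # Provisioned
```

Because the PTU count cancels out, the answer depends only on the price per PTU, the throughput per PTU, and the blended token rate, which is why the model should be re run whenever any of the three changes.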

The practical implementation is hybrid: Provisioned for the production workload baseline, Standard for development, testing, and burst capacity beyond Provisioned headroom. Resilience workloads on Provisioned Global add separate Provisioned capacity in a second region for failover. Refresh the Standard versus Provisioned model quarterly against actual measured throughput.

Azure OpenAI inside the Microsoft EA

Azure OpenAI consumption rolls up into Azure spend, which rolls into MACC drawdown, which rolls into the Microsoft EA. Four control points matter.

MACC absorption. Azure OpenAI counts toward MACC commitment, which means a customer with an existing $5M annual MACC commitment can absorb Azure OpenAI consumption without incremental commitment.
Renewal window. The EA renewal is the right negotiation window for Azure OpenAI pricing concessions; mid term Azure OpenAI commercial discussions deliver materially less leverage.
PTU into MACC drawdown. PTU annual reservations convert to MACC drawdown, which makes Provisioned commitment more attractive at the EA level than at the standalone Azure OpenAI level.
Copilot integration. Microsoft 365 Copilot consumes Azure OpenAI under the hood; a customer running both Copilot and direct Azure OpenAI should manage them as a single commercial position.
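The MACC absorption point can be checked with a few lines of arithmetic. This sketch uses the $5M annual commitment from the example above; the other spend figures and the function name `macc_position` are hypothetical.

```python
def macc_position(annual_commit, other_azure_spend, azure_openai_spend):
    """Summarize how Azure OpenAI spend interacts with a MACC commitment.

    All amounts are annual USD. "absorbed" is the Azure OpenAI spend that
    fills commitment the rest of the estate would have left unused.
    """
    total_spend = other_azure_spend + azure_openai_spend
    unused_before_ai = max(annual_commit - other_azure_spend, 0)
    return {
        "absorbed": min(azure_openai_spend, unused_before_ai),
        "shortfall": max(annual_commit - total_spend, 0),
        "above_commit": max(total_spend - annual_commit, 0),
    }


# Hypothetical estate: $4.2M of other Azure spend, $600K of Azure OpenAI.
print(macc_position(5_000_000, 4_200_000, 600_000))
# {'absorbed': 600000, 'shortfall': 200000, 'above_commit': 0}
```

A non zero shortfall means Azure OpenAI growth can still be absorbed; a non zero `above_commit` means the AI spend is incremental and belongs in the next commitment negotiation.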

Azure OpenAI governance

Five governance control points apply at scale:

Model approval. A defined internal committee approves which models are available to which workloads. Not every workload needs o1 or o3 access.
Data residency. Enforced through Azure region selection and Data Zone Provisioned deployments where regulatory requirements demand it.
Content filtering. The Azure OpenAI content filter framework covers hate, violence, self harm, and sexual content, with configurable severity thresholds.
Audit logging. Integrated with Azure Monitor for technical logs and Microsoft Purview for compliance audit trails.
Cost allocation. Applied through the 6 tag policy described in the Azure FinOps cost governance framework.

11 move buyer side playbook

Measure actual peak throughput from production traffic, not forecast. Forecasts typically overstate AI workload demand by 40 to 80 percent.
Model the Standard vs Provisioned breakeven against measured utilization. Re run quarterly.
Size PTU at 10 to 20 percent above peak, not at peak. A Standard deployment absorbs spillover at PAYG rates.
Enable prompt caching on RAG workloads. 50 percent discount on repeated prefix tokens.
Use Batch API for async workloads. 50 percent discount with 24 hour latency.
Build the model selection routing layer. GPT-4o mini for routing, GPT-4o for content, o1 family for reasoning only.
Convert PTU annual reservations into MACC drawdown. Roll Azure OpenAI commitment into the broader EA position.
Negotiate Azure OpenAI pricing inside the EA renewal cycle. Mid term discussions deliver materially less leverage.
Manage Microsoft 365 Copilot and Azure OpenAI as a single commercial position. Both consume the same underlying OpenAI models.
Apply Data Zone Provisioned for regulated workloads. Regional data residency commitment from Microsoft.
Tag and allocate Azure OpenAI consumption to business units. Apply chargeback to drive behavioral change on prompt engineering and model selection.
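The chargeback step in the last move can be prototyped before any tooling is bought. The sketch below assumes usage has already been exported as records carrying a cost and a tag dictionary; the record shape and the `business_unit` tag name are hypothetical, not an Azure Cost Management schema.

```python
from collections import defaultdict


def chargeback(usage_records, tag_key="business_unit", untagged_label="untagged"):
    """Sum cost per business unit from exported usage records.

    Each record is a dict with "cost_usd" and an optional "tags" dict.
    Records without the tag are pooled under untagged_label so that
    gaps in tagging stay visible instead of disappearing.
    """
    totals = defaultdict(float)
    for record in usage_records:
        unit = record.get("tags", {}).get(tag_key) or untagged_label
        totals[unit] += record["cost_usd"]
    return dict(totals)


# Hypothetical export rows.
records = [
    {"cost_usd": 1200.0, "tags": {"business_unit": "claims"}},
    {"cost_usd": 300.0, "tags": {"business_unit": "marketing"}},
    {"cost_usd": 450.5, "tags": {"business_unit": "claims"}},
    {"cost_usd": 80.0, "tags": {}},
]
print(chargeback(records))
# {'claims': 1650.5, 'marketing': 300.0, 'untagged': 80.0}
```

Keeping an explicit untagged bucket gives the governance committee a number to drive toward zero.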

How we engage

Azure OpenAI assessment. 6 week scoping covering actual throughput measurement, Standard versus Provisioned breakeven, PTU sizing model, prompt caching opportunity, and Batch API workload identification. GenAI Vendors Practice.
Microsoft EA renewal with Azure OpenAI integration. 9 month managed renewal sequence with Azure OpenAI negotiated as part of the broader EA position. Renewal Program.
Vendor Shield for Microsoft and GenAI. Continuous advisory across the Microsoft estate and the GenAI vendor mix including OpenAI, Anthropic, Google. Vendor Shield.
GenAI commercial benchmarking. Azure OpenAI pricing benchmarked against OpenAI direct, Anthropic Claude Enterprise, Google Gemini Enterprise. Benchmarking Practice.

Deep Library

Standard (consumption). Pay per token with zero floor. Suits spiky or unproven workloads. GPT-4o at $2.50 in / $10 out per 1M tokens.

Provisioned (PTU). Reserved throughput with a monthly floor. Suits steady production load only. Wins above the utilization breakeven.

The Standard versus PTU decision is a utilization bet. Below the breakeven, reserved capacity is idle money; above it, consumption pricing is a tax on success.

$2.50 per 1M GPT-4o input tokens, Standard

PTU minimum commitment, GPT-4o

20 to 40 percent typical PTU utilization by month three

Source: Redress Compliance advisory engagement file, 2024 to 2025.

Where the common advice on Azure OpenAI pricing is wrong

The standard guidance says to lock in PTU capacity early because Provisioned is the enterprise grade option and consumption is for pilots. We disagree, and the engagement data is why. Reserved throughput bought before a workload has three months of measured production traffic ran at 20 to 40 percent utilization in most estates we reviewed. The buyer paid a committed monthly floor for capacity that consumption pricing would have covered at a fraction of the cost. The right sequence is almost always Standard first, then measure, then convert only the proven steady state load to PTU at renewal, when the commitment also becomes a MACC and EA negotiation asset.
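The cost of low utilization is easy to quantify. In this sketch the $10,000 monthly commitment is a hypothetical figure; the only input taken from the engagement data is the 20 to 40 percent utilization range.

```python
def idle_commitment(monthly_commit_usd, utilization):
    """Return (idle spend in USD, effective price multiplier) for a PTU floor.

    utilization is the share of provisioned capacity actually used (0 to 1].
    The multiplier is how much more each used unit of throughput costs
    compared with running the same reservation fully utilized.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    idle_usd = monthly_commit_usd * (1 - utilization)
    multiplier = 1 / utilization
    return idle_usd, multiplier


# Hypothetical $10,000 monthly PTU floor at 25 percent utilization.
print(idle_commitment(10_000, 0.25))  # (7500.0, 4.0)

# Across the observed range the multiplier runs from roughly 5 down to 2.5.
for u in (0.20, 0.40):
    idle, mult = idle_commitment(10_000, u)
    print(f"{u:.0%}: idle ${idle:,.0f}, {mult:.1f}x effective price")
```

At these utilization levels most of the monthly floor pays for capacity that is never used, which is the arithmetic behind the Standard first sequence.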
