Azure · OpenAI Service

Azure OpenAI is a new line. It already has audit shape.

Azure OpenAI Service is the fastest growing line in most enterprise Azure consumption profiles. The pricing mechanic is unfamiliar. The capacity reservation mechanic does not behave like Reserved Instances. The model selection decisions carry per token cost differences of an order of magnitude. The contract language is in flux because Microsoft is iterating the offering quarter by quarter. Most enterprises are paying meaningfully more than the workload requires because the platform team is procuring at pay as you go rates and the licensing team is not in the conversation. Azure OpenAI is the line where the largest contract concessions of 2026 are being signed.

The pricing mechanic

How Azure OpenAI actually prices.

Azure OpenAI bills per token for input and output against the selected model. Different models carry different per token rates separated by an order of magnitude. The same workload can land at one cost on a flagship model and one fifteenth of that cost on a smaller model. The model selection decision is the largest single lever on the Azure OpenAI line.

Model 01

Pay as you go

The pay as you go tier

Pay as you go bills per million input tokens and per million output tokens by model. Output tokens cost meaningfully more than input tokens on every model in the catalog. The pay as you go tier is the right answer for variable workloads, early stage pilots, and applications without predictable traffic patterns.

Best for. Pilots, intermittent traffic, applications under active development.
Watch. The pay as you go rate is the highest unit cost. Sustained workloads belong on Provisioned Throughput.
Lever. Model selection is the single largest variable. Right model for the task drops cost by an order of magnitude.
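
The billing arithmetic can be sketched in a few lines. This is a minimal illustration, not a pricing tool: the rates are placeholders chosen to show the shape of the calculation rather than published Azure prices, and the names `Rate`, `RATES`, and `payg_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price per million tokens, in an arbitrary currency unit."""
    input_per_m: float
    output_per_m: float


# Hypothetical rates for illustration only; substitute the current rate card.
RATES = {
    "flagship": Rate(input_per_m=15.0, output_per_m=60.0),
    "small": Rate(input_per_m=1.0, output_per_m=4.0),
}


def payg_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Pay as you go cost: input and output tokens are billed separately."""
    rate = RATES[model]
    input_cost = (input_tokens / 1_000_000) * rate.input_per_m
    output_cost = (output_tokens / 1_000_000) * rate.output_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Same monthly workload, two model choices.
    tokens_in, tokens_out = 400_000_000, 100_000_000
    for model in RATES:
        print(model, payg_cost(model, tokens_in, tokens_out))
```

With these placeholder rates the same workload costs 12,000 units on the flagship entry and 800 on the small one, which is the one fifteenth spread described earlier.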

Model 02

PTU

Provisioned Throughput Units

PTUs reserve dedicated capacity for a chosen model. The buyer commits monthly or annually to a unit count. Each PTU delivers a defined throughput at a flat rate. The per token economics are materially better than pay as you go for sustained traffic. PTUs are the right answer for production workloads with predictable demand patterns.

Best for. Production applications with consistent throughput requirements, latency sensitive use cases.
Watch. PTUs reserve a specific model. Model selection lock in is real and the catalog moves quickly.
Lever. Annual PTU commitments produce meaningful discount over monthly. The commit decision needs forecast confidence.
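
A rough break-even check shows when a flat rate PTU beats pay as you go. This is a sketch under stated assumptions: the PTU price, the tokens a PTU can serve per month, and the blended pay as you go rate are all inputs the caller must supply from their own agreement and measurements, and the function name is hypothetical.

```python
def breakeven_utilization(
    ptu_monthly_price: float,
    ptu_tokens_per_month_at_full_load: float,
    payg_blended_price_per_m: float,
) -> float:
    """Fraction of a PTU's capacity that must be used before it beats pay as you go.

    All three inputs are assumptions supplied by the caller, not published figures.
    """
    # What the same token volume would cost on pay as you go at full PTU load.
    payg_cost_at_full_load = (
        ptu_tokens_per_month_at_full_load / 1_000_000
    ) * payg_blended_price_per_m
    return ptu_monthly_price / payg_cost_at_full_load
```

With a hypothetical PTU at 2,000 per month, 500 million tokens of monthly capacity, and a blended pay as you go rate of 10 per million tokens, the break-even sits at 40 percent utilization. Sustained traffic above that level favors the PTU; traffic below it favors pay as you go.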

The model selection

Why model selection is the largest lever.

The Azure OpenAI catalog includes a tier of flagship reasoning models at premium prices and a tier of smaller models at one tenth to one twentieth the cost. Most enterprise workloads run on the most expensive model in the catalog because the platform team defaulted there during the pilot phase and never revisited the decision. Right sizing the model selection per workload is the cleanest cost optimization on the line.

Tier 01

Flagship

Flagship models

Flagship reasoning models for complex multi step tasks. The right answer where the output quality determines the outcome and the cost per call is acceptable in context. Wrong answer for high volume bulk operations.

Tier 02

Mid tier

Mid tier models

Mid tier models for general purpose tasks where the workload does not require flagship reasoning. The cost per token runs one third to one fifth of flagship pricing. Most enterprise workloads belong here once the use case is properly characterized.

Tier 03

Small

Small models

Small models for high volume bulk operations such as classification, extraction, and routing. The cost per token runs one tenth to one twentieth of flagship pricing. The output quality is sufficient for many embedded use cases.
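
The tier ratios above translate directly into a relative cost estimate for a workload mix. The sketch below takes the ranges from the text (flagship as the baseline of 1.0) and returns a low and high bound. The token shares passed in are hypothetical, as are the names `TIER_COST_RANGE` and `relative_cost_range`.

```python
# Relative per token cost by tier, flagship = 1.0. The ranges come from the
# text; the workload shares passed in by the caller are hypothetical.
TIER_COST_RANGE = {
    "flagship": (1.0, 1.0),
    "mid": (1 / 5, 1 / 3),
    "small": (1 / 20, 1 / 10),
}


def relative_cost_range(token_share: dict[str, float]) -> tuple[float, float]:
    """Return (low, high) cost relative to running every token on flagship."""
    if abs(sum(token_share.values()) - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1.0")
    low = sum(share * TIER_COST_RANGE[tier][0] for tier, share in token_share.items())
    high = sum(share * TIER_COST_RANGE[tier][1] for tier, share in token_share.items())
    return low, high
```

For an assumed split of 20 percent flagship, 50 percent mid tier, and 30 percent small, the result is roughly 0.32 to 0.40 of the all flagship baseline.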

The PTU strategy

How PTU commitments should be sized.

PTUs reserve capacity for a specific model. The reservation is meaningful because Azure OpenAI capacity in popular models is constrained in many regions. PTU access is both an economic and an availability lever. The sizing decision must balance the discount economics against the model lock in risk in a catalog that evolves quarterly.

Lever 01

Right sizing

Sizing the PTU footprint

The PTU count should match the sustained throughput floor of the workload, not the peak. Peaks remain on pay as you go at the burst rate. The PTU layer captures the discount on the predictable baseline and pay as you go absorbs the variability above. The layering logic mirrors the RI plus Savings Plan approach on standard compute, even though the PTU reservation itself behaves differently from a Reserved Instance.

The sizing decision interlocks with the broader EA renewal and MACC negotiation. PTU commitments draw down the MACC at the contracted rate.
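
The layered structure can be sketched as follows. This is a simplified model with hypothetical inputs: it treats the minimum observed hourly demand as the floor (a low percentile may be a better choice on noisy data), assumes a fixed token throughput per PTU hour, and uses a single blended pay as you go rate. The function name and parameters are illustrative only.

```python
import math


def layered_cost(
    hourly_demand: list[float],
    tokens_per_ptu_hour: float,
    ptu_hourly_price: float,
    payg_price_per_m: float,
) -> tuple[int, float]:
    """Size PTUs to the demand floor and bill the overflow at pay as you go.

    hourly_demand is tokens per hour. Returns (ptu_count, total_cost).
    """
    floor = min(hourly_demand)
    # Round down so reserved capacity never exceeds the sustained floor.
    ptu_count = math.floor(floor / tokens_per_ptu_hour)
    reserved_capacity = ptu_count * tokens_per_ptu_hour
    # PTUs are paid for every hour, used or not.
    ptu_cost = ptu_count * ptu_hourly_price * len(hourly_demand)
    # Anything above the reserved capacity spills to pay as you go.
    overflow_tokens = sum(max(0.0, d - reserved_capacity) for d in hourly_demand)
    payg_cost = (overflow_tokens / 1_000_000) * payg_price_per_m
    return ptu_count, ptu_cost + payg_cost
```

For a hypothetical demand of 3, 5, 9, and 4 million tokens over four hours, with 1 million tokens per PTU hour, a PTU hourly price of 4, and a pay as you go price of 10 per million tokens, the sketch reserves 3 PTUs and totals 138, against 210 on pay as you go alone.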

Lever 02

Term selection

Monthly versus annual PTU

Annual PTU commitments produce a meaningful discount over monthly. The discount is real, and its price is lock in to the selected model. The right answer depends on confidence in the model choice across a twelve month horizon. Workloads on flagship models with twelve month confidence in that choice belong on annual. Workloads where the model selection may shift within twelve months belong on monthly.

The contract language matters. Negotiate the right to convert annual PTU between models in the catalog mid term as Microsoft releases new model generations. Without the language the buyer is locked to the model on the day of commitment.
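
One simplified way to frame the term decision is to ask how likely a model switch must be before monthly beats annual. The sketch below assumes the worst case described above: no conversion rights, so after a switch the remainder of the annual commitment is stranded and replacement capacity is bought at the monthly rate. The rates, the switch month, and the function name are all hypothetical.

```python
def breakeven_switch_probability(
    monthly_rate: float,
    annual_rate_per_month: float,
    switch_month: int,
) -> float:
    """Probability of a model switch above which monthly beats annual.

    Assumes no conversion rights: after the switch, the rest of the annual
    commitment is stranded and replacement capacity is bought at monthly_rate.
    A result above 1.0 means annual wins regardless of switch risk.
    """
    if not 0 <= switch_month < 12:
        raise ValueError("switch_month must be in 0..11")
    # Discount captured over twelve months if the model choice holds.
    saving_if_no_switch = 12 * (monthly_rate - annual_rate_per_month)
    # Extra spend on replacement capacity if the model changes mid term.
    loss_if_switch = (12 - switch_month) * monthly_rate
    return saving_if_no_switch / loss_if_switch
```

With a hypothetical monthly rate of 100, an annual rate equivalent to 80 per month, and a switch at month six, the break-even probability is 0.4. Conversion rights remove the stranded cost, which is why the contract language carries so much weight.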

The contract surface

Where Azure OpenAI moves at negotiation.

Azure OpenAI is on the negotiation table at every enterprise renewal touching Azure in 2026. Microsoft is willing to commit on PTU discounting, MACC eligibility expansion for OpenAI consumption, model conversion rights mid term, and data residency language. The buyer who arrives prepared on the line captures concessions that the unprepared buyer simply leaves on the table.

Lever 01

PTU discount

PTU pricing concessions

Microsoft is actively willing to concede on PTU pricing for enterprise commitments. Annual PTU commitments sized into the MACC envelope produce discounts beyond the published rate card. The negotiation requires a defensible forecast and an explicit framing of the PTU commitment as part of the broader Azure commitment posture.

Lever 02

Conversion rights

Model conversion rights

Model conversion rights are the most important contract language on the Azure OpenAI line: the right to convert annual PTU commitments between models in the catalog mid term as Microsoft releases new generations. Without the language the buyer is locked. With the language the PTU commitment stays durable through a model evolution that will continue at the current pace for the foreseeable future. The conversion right should also extend to regional capacity reallocation so the buyer can move PTU footprint as Microsoft expands availability in priority regions through the term.

The advisory work

What we deliver on Azure OpenAI.

The Azure OpenAI engagement is a workload model selection audit, a PTU versus pay as you go decision per workload, a PTU sizing analysis, MACC integration, and contract language that protects the position through the rapid model catalog evolution.

Deliverable 01

The model selection audit

We inventory the existing Azure OpenAI workloads. We reconcile the model selection per workload against the actual task requirements. We identify the workloads sitting on flagship models that should run on mid tier or small models. The output is a model selection optimization that frequently cuts the Azure OpenAI line by half without changing the user experience.

Deliverable 02

The PTU portfolio and contract

We model the PTU layer against the sustained throughput floor per workload and design the layered PTU plus pay as you go structure. We integrate the PTU commitment into the MACC sizing. We negotiate the PTU pricing concessions and the model conversion rights language that protects the position through the catalog evolution. The output is an Azure OpenAI line that prices the consumption defensibly through the term.

Engage the practice

Negotiate Azure OpenAI while Microsoft is willing to concede.

The Azure OpenAI diagnostic surfaces the model selection optimization, the PTU sizing decision, the MACC integration, and the contract language that protects the position through model evolution. The result is a meaningfully cheaper line and a contract that survives the next two model generations.