Cost Optimization · Spot Capacity

Spot capacity is up to ninety percent cheaper. The catch is in the eviction notice.

Azure spot virtual machines sell surplus capacity at up to ninety percent off pay as you go, with one condition. Azure can reclaim the capacity at thirty seconds notice when it needs the hardware back, or when the market price rises above your bid. For the right workloads this is the cheapest compute in the catalog. For the wrong ones it is a reliability incident waiting to happen. The strategy is not whether to use spot. It is building the architecture that makes interruption a non event, then placing only the workloads that survive eviction onto it.

The economics

The discount is real. So is the recall.

Spot pricing is set by available surplus capacity in each region and VM family. Discounts run from roughly sixty percent up to ninety percent against the pay as you go rate, and the discount moves with regional supply. Two settings govern the deal: the eviction policy and the maximum price you are willing to pay.

Mechanic 01
Eviction policy

Capacity recall and price

Azure evicts a spot VM for one of two reasons: it needs the capacity back for pay as you go demand, or the market price rises above your configured maximum. You choose whether eviction deallocates the machine for later restart or deletes it outright.

  • Deallocate. The machine stops and its disks persist, so it can restart when capacity returns. You keep paying for disk storage while it is stopped.
  • Delete. The machine and its resources are removed. Right for stateless pool members managed by an orchestrator.

Mechanic 02
Maximum price

The bid ceiling

You set the maximum hourly price you will pay. Set it to -1 and you pay the current spot price and are only evicted on capacity recall. Set a fixed cap and you are also evicted whenever the market exceeds it, trading more interruption for a predictable ceiling.

  • Price of -1. Pay the floating spot rate, capacity eviction only. The common default.
  • Fixed cap. Predictable maximum, more frequent eviction in tight regions.
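
The two mechanics can be sketched as a small model. This is an illustration under stated assumptions, not a deployment script. The dictionary keys follow the ARM template property names as we understand them (priority, evictionPolicy, billingProfile.maxPrice) and should be checked against the current schema. The function names spot_profile and eviction_reason are hypothetical.

```python
from typing import Optional

PAY_CURRENT_PRICE = -1  # the "no cap" setting described above


def spot_profile(eviction_policy: str, max_price: float = PAY_CURRENT_PRICE) -> dict:
    """Build the spot-related settings of a VM definition as a plain dict.

    Key names are assumed to mirror the ARM template properties.
    """
    if eviction_policy not in ("Deallocate", "Delete"):
        raise ValueError("eviction_policy must be 'Deallocate' or 'Delete'")
    if max_price != PAY_CURRENT_PRICE and max_price <= 0:
        raise ValueError("max_price must be -1 or a positive hourly price")
    return {
        "priority": "Spot",
        "evictionPolicy": eviction_policy,
        "billingProfile": {"maxPrice": max_price},
    }


def eviction_reason(profile: dict, market_price: float,
                    capacity_recalled: bool) -> Optional[str]:
    """Return why the VM would be evicted, or None if it keeps running."""
    if capacity_recalled:
        return "capacity"  # applies whatever the price setting is
    cap = profile["billingProfile"]["maxPrice"]
    if cap != PAY_CURRENT_PRICE and market_price > cap:
        return "price"  # only possible when a fixed cap is set
    return None
```

With the cap at -1 the price branch can never fire, which is why that setting leaves capacity recall as the only eviction cause.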

The fit test

Does the workload survive eviction?

The single question that determines spot suitability is whether the workload can lose a node at thirty seconds notice without data loss or a failed result. Three workload classes pass the test cleanly. Anything that does not pass should not be on spot regardless of how attractive the discount looks.

Fit 01

Batch and parallel jobs

Rendering, simulation, encoding, large scale data processing. Work that splits into independent units where a lost node simply reschedules its unit elsewhere. This is the canonical spot workload and frequently the single largest source of spot savings in an estate.

Fit 02

Stateless scale out tiers

Web and application tiers behind a load balancer where capacity is fungible. A mixed scale set with on demand base instances and spot surge instances delivers elasticity at a fraction of the cost, with the on demand base absorbing the floor if spot evaporates.

Fit 03

Non production environments

Development, test, and continuous integration agents. The cost of an occasional interruption is a rerun, not an incident. Spot backed build agents and ephemeral test clusters cut non production compute dramatically with no production risk.
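
The fit test reduces to a simple predicate. The sketch below is one way to write it down for an inventory review. The Workload fields and the passes_fit_test name are hypothetical, and the thirty second constant is the notice window quoted above.

```python
from dataclasses import dataclass

NOTICE_SECONDS = 30  # eviction notice window quoted in the text


@dataclass(frozen=True)
class Workload:
    name: str
    drain_seconds: float             # time needed to drain or checkpoint
    loses_data_on_node_loss: bool    # would an evicted node take data with it?
    result_fails_on_node_loss: bool  # would the overall result fail?


def passes_fit_test(w: Workload) -> bool:
    """True only if losing a node inside the notice window is harmless."""
    return (
        w.drain_seconds <= NOTICE_SECONDS
        and not w.loses_data_on_node_loss
        and not w.result_fails_on_node_loss
    )


def spot_candidates(workloads: list) -> list:
    """Names of the workloads that pass, in input order."""
    return [w.name for w in workloads if passes_fit_test(w)]
```

A render farm that reschedules lost frames passes. A single node database that holds its only copy of the data does not, whatever the discount.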

The architecture

Make interruption a non event.

Spot is safe when the architecture expects eviction rather than hoping to avoid it. Four patterns convert spot from a gamble into a dependable cost lever, and all four should be in place before any meaningful workload moves onto spot capacity.

Pattern 01

Honor the eviction signal

Azure publishes a Scheduled Events notification through the instance metadata service before eviction. The workload should listen for it, drain in flight work, checkpoint state, and deregister cleanly inside the notice window. A workload that ignores the signal loses whatever was in memory. A workload that honors it loses nothing.
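
A minimal sketch of the listener, assuming the Scheduled Events endpoint of the instance metadata service and an event type of Preempt for spot eviction. The API version in the URL is an assumption to verify against current Azure documentation. The drain, checkpoint, and deregister callbacks are hypothetical hooks the workload would supply.

```python
import json
import urllib.request

# Assumed endpoint and API version; only reachable from inside an Azure VM.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(timeout: float = 2.0) -> dict:
    """Read the current Scheduled Events document from the metadata service."""
    req = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def pending_preemptions(document: dict, vm_name: str) -> list:
    """Return the Preempt events that name this VM."""
    return [
        ev for ev in document.get("Events", [])
        if ev.get("EventType") == "Preempt" and vm_name in ev.get("Resources", [])
    ]


def handle_eviction(document: dict, vm_name: str,
                    drain, checkpoint, deregister) -> bool:
    """Run the shutdown hooks in order if this VM is about to be evicted."""
    if not pending_preemptions(document, vm_name):
        return False
    drain()        # stop taking new work, finish what is in flight
    checkpoint()   # persist state that would otherwise die with the node
    deregister()   # leave the load balancer or scheduler cleanly
    return True
```

In practice a small agent would call fetch_scheduled_events every second or so and pass the result to handle_eviction. The three hooks together must finish well inside the notice window.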

Pattern 02

Spread across families and zones

Eviction correlates within a single VM family and zone. A spot pool that diversifies across families and availability zones is far less likely to lose all capacity at once. Orchestrators that select from multiple eligible families keep the pool full as individual SKUs tighten.
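
The effect of diversification can be shown with a toy placement model. The sketch assumes each family and zone pair is an independent pool that can be reclaimed as a unit, which is a simplification. The family and zone labels are placeholders, not real SKU names.

```python
from itertools import cycle, product


def spread(count: int, families: list, zones: list) -> dict:
    """Place `count` nodes round-robin across every (family, zone) pool."""
    if not families or not zones:
        raise ValueError("need at least one family and one zone")
    placement = {}
    pools = cycle(product(families, zones))
    for _ in range(count):
        pool = next(pools)
        placement[pool] = placement.get(pool, 0) + 1
    return placement


def worst_case_loss(placement: dict) -> float:
    """Largest share of nodes lost if a single pool is reclaimed at once."""
    total = sum(placement.values())
    return max(placement.values()) / total


concentrated = spread(12, ["family-a"], ["zone-1"])
diversified = spread(12, ["family-a", "family-b", "family-c"], ["zone-1", "zone-2"])
```

Here a single reclaim takes all twelve nodes of the concentrated pool, but only two of twelve in the diversified one.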

Pattern 03

On demand base layer

For anything that must hold a floor, run a mixed scale set with an on demand base and a spot surge. The base guarantees minimum capacity that cannot be evicted, while the spot tier supplies the cheap elastic headroom above it. The floor never disappears.
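
The split between base and surge is simple arithmetic. The sketch below assumes a fixed on demand base plus an optional on demand percentage above it. The function and parameter names are hypothetical and do not correspond to any particular scale set setting.

```python
import math


def split_capacity(desired: int, on_demand_base: int,
                   on_demand_pct_above_base: int = 0) -> tuple:
    """Return (on_demand, spot) instance counts for a desired total.

    The base is filled first with on demand instances. Capacity above the
    base is spot, except for the stated on demand percentage, rounded up.
    """
    if desired < 0 or on_demand_base < 0:
        raise ValueError("counts must be non-negative")
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("percentage must be between 0 and 100")
    base = min(desired, on_demand_base)
    above = desired - base
    extra_on_demand = math.ceil(above * on_demand_pct_above_base / 100)
    on_demand = base + extra_on_demand
    return on_demand, desired - on_demand


def floor_after_full_eviction(desired: int, on_demand_base: int,
                              on_demand_pct_above_base: int = 0) -> int:
    """Capacity that remains if every spot instance is reclaimed."""
    on_demand, _spot = split_capacity(desired, on_demand_base,
                                      on_demand_pct_above_base)
    return on_demand
```

For a target of ten instances with a base of four, the split is four on demand and six spot, and the floor after a total spot eviction is four.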

Pattern 04

Checkpoint long jobs

Long running batch work should checkpoint progress to durable storage at intervals. On eviction the job resumes from the last checkpoint on a fresh node rather than restarting from zero. Checkpointing is what makes multi hour jobs viable on capacity that can vanish.
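
A minimal sketch of interval checkpointing, assuming the job is a sequence of numbered units and that the checkpoint path sits on storage that outlives the node. The file format and the save_checkpoint, load_checkpoint, and run_job names are illustrative choices, not a prescribed layout.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write the checkpoint atomically so an eviction cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename over the previous checkpoint


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if none exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_unit": 0}


def run_job(path: str, total_units: int, process, every: int = 10) -> int:
    """Process units from the last checkpoint, saving every `every` units."""
    state = load_checkpoint(path)
    for unit in range(state["next_unit"], total_units):
        process(unit)
        state["next_unit"] = unit + 1
        if state["next_unit"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # record completion
    return state["next_unit"]
```

Units processed after the last checkpoint are repeated on resume, so each unit should be safe to run twice. A shorter interval wastes less work per eviction at the cost of more writes.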

The portfolio fit

Spot sits outside the commitment.

Spot is the elasticity tranche of the wider Azure commitment portfolio. It does not displace reserved instances or savings plans. It complements them by carrying the interruptible work that should never have been committed in the first place.

Layer 01

Reserved core

The stable production baseline. Deepest discount on workloads that run continuously and cannot tolerate interruption. Spot has no place here.

Layer 02

Savings plan shell

The moving middle that holds a spend floor but shifts shape. Covered by the flexible commitment, not by spot.

Layer 03

Spot tranche

The interruptible elastic band. Batch, surge, and non production. The deepest discount in the catalog for the work that can afford to lose a node.

The spot capacity playbook.

The eviction and pricing mechanics, the three class fit test, the four architecture patterns that make interruption safe, and the portfolio placement model that keeps spot outside the commitment tranches. Sent on request.

$420M+ recovered · 340+ engagements
Engage the practice

Place the workload before you chase the discount.

Spot is the cheapest compute Azure sells and the easiest to misuse. The savings only hold when the architecture expects eviction and only the right workloads sit on the capacity. We map the eligible work, validate the eviction handling, and build the mixed model that captures the discount without putting a single production commitment at risk.

79% audit exposure cut · 20+ years practice depth