Azure spot virtual machines sell surplus capacity at up to ninety percent off pay-as-you-go pricing, with one condition: Azure can reclaim the capacity on thirty seconds' notice when it needs the hardware back, or when the spot price rises above the maximum you set. For the right workloads this is the cheapest compute in the catalog. For the wrong ones it is a reliability incident waiting to happen. The strategy is not whether to use spot. It is building the architecture that makes interruption a non-event, then placing only the workloads that survive eviction onto it.
Spot pricing varies with the surplus capacity available in each region and VM family. Discounts run from roughly sixty to ninety percent against the pay-as-you-go rate and move with regional supply. Two settings govern the deal: the eviction policy and the maximum price you are willing to pay.
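Azure evicts a spot VM for one of two reasons: it needs the capacity back for pay-as-you-go demand, or the spot price rises above your configured maximum. The eviction policy decides what happens next. Deallocate stops the machine and keeps its disks so it can be restarted later, though the disks continue to accrue storage charges. Delete removes the VM and its disks outright.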
You also set the maximum hourly price you will pay. Set it to -1 and you pay the current spot price, never more than the pay-as-you-go rate, and are evicted only on capacity recall. Set a fixed cap and you are also evicted whenever the spot price exceeds it, trading more interruption for a predictable ceiling.
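Both settings map onto properties of the Azure VM resource as it appears in an ARM template: priority, evictionPolicy, and billingProfile.maxPrice. The Python sketch below builds that fragment as a plain dict and encodes the eviction rule just described. The helper names spot_properties and eviction_reason are hypothetical, not part of any Azure SDK, and the prices in the usage note are made up for illustration.

```python
from typing import Optional


def spot_properties(eviction_policy: str = "Deallocate", max_price: float = -1) -> dict:
    """Build the spot-related properties of a VM resource (hypothetical helper).

    eviction_policy: "Deallocate" keeps the disks for a later restart,
    "Delete" removes the VM and its disks.
    max_price: -1 means pay the current spot price and never be evicted on
    price; any positive value is an hourly cap.
    """
    if eviction_policy not in ("Deallocate", "Delete"):
        raise ValueError("eviction_policy must be 'Deallocate' or 'Delete'")
    if max_price != -1 and max_price <= 0:
        raise ValueError("max_price must be -1 or a positive hourly cap")
    return {
        "priority": "Spot",
        "evictionPolicy": eviction_policy,
        "billingProfile": {"maxPrice": max_price},
    }


def eviction_reason(max_price: float, spot_price: float, capacity_recall: bool) -> Optional[str]:
    """Return why a spot VM would be evicted now, or None if it keeps running."""
    if capacity_recall:
        return "capacity"  # can happen under either max-price setting
    if max_price != -1 and spot_price > max_price:
        return "price"  # only possible with a fixed cap
    return None
```

With a cap of 0.25 per hour, a spot price of 0.30 yields "price"; the same spot price with max_price set to -1 yields None, and only a capacity recall can evict the machine.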
The single question that determines spot suitability is whether the workload can lose a node on thirty seconds' notice without data loss or a failed result. Three workload classes pass the test cleanly. Anything that does not pass should not be on spot, however attractive the discount looks.
Rendering, simulation, encoding, large-scale data processing. Work that splits into independent units, so that a unit lost along with its node is simply rescheduled elsewhere. This is the canonical spot workload and frequently the single largest source of spot savings in an estate.
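What makes batch work spot-friendly is that no unit belongs to a particular node. A minimal sketch, assuming a single in-process queue stands in for the real scheduler: when a node is evicted, its in-flight unit goes back on the queue and a surviving node picks it up. The function run_batch and the evict_on set, a deterministic stand-in for real interruptions, are hypothetical.

```python
from collections import deque


def run_batch(units, nodes, evict_on):
    """Process every unit, requeueing any unit whose node is evicted mid-flight.

    evict_on: set of (node, unit) pairs that simulate an eviction while that
    node is working on that unit.
    Returns (units completed, in completion order; number of requeues).
    """
    queue = deque(units)
    live = list(nodes)
    completed, requeues = [], 0
    turn = 0
    while queue:
        if not live:
            raise RuntimeError("every node evicted; the pool needs replenishing")
        node = live[turn % len(live)]
        unit = queue.popleft()
        if (node, unit) in evict_on:
            live.remove(node)   # the node is gone
            queue.append(unit)  # its unit is rescheduled, not lost
            requeues += 1
            continue
        completed.append(unit)
        turn += 1
    return completed, requeues
```

Every unit still completes; an eviction only changes where and when. The sketch raises once the whole pool is gone, which is the failure mode the diversification pattern guards against.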
Web and application tiers behind a load balancer, where capacity is fungible. A mixed scale set with on-demand base instances and spot surge instances delivers elasticity at a fraction of the cost, with the on-demand base holding the floor if spot capacity evaporates.
Development, test, and continuous integration agents. The cost of an occasional interruption is a rerun, not an incident. Spot-backed build agents and ephemeral test clusters cut non-production compute dramatically with no production risk.
Spot is safe when the architecture expects eviction rather than hoping to avoid it. Four patterns convert spot from a gamble into a dependable cost lever, and all four should be in place before any meaningful workload moves onto spot capacity.
Before evicting a spot VM, Azure publishes a Preempt event through Scheduled Events, part of the Instance Metadata Service. The workload should listen for it, drain in-flight work, checkpoint state, and deregister cleanly inside the notice window. A workload that ignores the signal loses whatever was in memory. A workload that honors it loses nothing.
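A minimal polling sketch using only the Python standard library. It assumes the Scheduled Events endpoint on the Instance Metadata Service at 169.254.169.254 with the Metadata: true header, and an api-version of 2020-07-01; confirm the version against current Azure documentation. The drain argument is a hypothetical callback standing in for whatever stops intake, checkpoints, and deregisters the node, and vm_name is assumed to match the name Azure lists in the event's Resources array.

```python
import json
import time
import urllib.request

# Scheduled Events endpoint; the api-version is an assumption to verify.
EVENTS_URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"


def find_preempt(document: dict, vm_name: str):
    """Return the EventId of a Preempt event targeting vm_name, or None."""
    for event in document.get("Events", []):
        if event.get("EventType") == "Preempt" and vm_name in event.get("Resources", []):
            return event.get("EventId")
    return None


def fetch_events() -> dict:
    req = urllib.request.Request(EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def acknowledge(event_id: str) -> None:
    """Optionally signal that the VM is ready, so the event need not wait out the window."""
    body = json.dumps({"StartRequests": [{"EventId": event_id}]}).encode()
    req = urllib.request.Request(
        EVENTS_URL,
        data=body,
        method="POST",
        headers={"Metadata": "true", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5):
        pass


def watch(vm_name: str, drain, interval: float = 1.0) -> None:
    """Poll until a Preempt event targets this VM, run drain(), then acknowledge."""
    while True:
        event_id = find_preempt(fetch_events(), vm_name)
        if event_id:
            drain()  # hypothetical: stop intake, checkpoint, deregister
            acknowledge(event_id)
            return
        time.sleep(interval)
```

Polling once a second leaves nearly the whole notice window for draining. Keeping the parsing in find_preempt separate from the network calls makes the decision logic testable off-Azure.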
Evictions are correlated within a single VM family and zone. A spot pool that diversifies across families and availability zones is far less likely to lose all its capacity at once. Orchestrators that can choose from multiple eligible families keep the pool full as individual SKUs tighten.
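One way to see the effect is to spread instances round-robin across every eligible (VM size, zone) pool and look at the largest share any one pool holds. That share is roughly the most a single correlated eviction can take. The functions spread and worst_case_loss are hypothetical, the size names are placeholders, and round-robin is a simplification of a real orchestrator's allocation strategy.

```python
from collections import Counter
from itertools import product


def spread(count, sizes, zones):
    """Assign count instances round-robin over every (size, zone) pool."""
    pools = list(product(sizes, zones))
    return Counter(pools[i % len(pools)] for i in range(count))


def worst_case_loss(allocation):
    """Fraction of the fleet lost if the single largest pool is evicted at once."""
    total = sum(allocation.values())
    return max(allocation.values()) / total if total else 0.0


single = spread(12, ["Standard_D4s_v5"], ["1"])
diverse = spread(12, ["Standard_D4s_v5", "Standard_D4as_v5", "Standard_E4s_v5"], ["1", "2"])
```

Twelve instances in one pool can all vanish together. Spread over three sizes and two zones, six pools of two each, one correlated eviction takes about a sixth of the fleet.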
For anything that must hold a floor, run a mixed scale set with an on-demand base and a spot surge. The base guarantees minimum capacity that cannot be evicted, while the spot tier supplies the cheap elastic headroom above it. The floor never disappears.
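Azure scale sets in Flexible orchestration mode express this split as a priority mix policy with two settings, baseRegularPriorityCount and regularPriorityPercentageAboveBase. The sketch below approximates the split and the resulting blended cost. Rounding the on-demand share up is an assumption of this sketch, not documented platform behavior, and priority_mix, blended_hourly_cost, and the rate in the usage note are hypothetical.

```python
import math


def priority_mix(total, base_regular, regular_pct_above_base):
    """Split scale set capacity into (on_demand, spot) instance counts."""
    base = min(base_regular, total)  # the floor that cannot be evicted
    above = total - base
    # On-demand share of the headroom above the base, rounded up (assumption).
    regular_above = math.ceil(above * regular_pct_above_base / 100)
    on_demand = base + regular_above
    return on_demand, total - on_demand


def blended_hourly_cost(total, base_regular, pct, payg_rate, spot_discount):
    """Hourly cost of the mix, with spot priced at payg_rate * (1 - spot_discount)."""
    on_demand, spot = priority_mix(total, base_regular, pct)
    return on_demand * payg_rate + spot * payg_rate * (1 - spot_discount)
```

At a hypothetical pay-as-you-go rate of 1.00 per hour and an eighty percent spot discount, ten instances with a base of four and no on-demand share above it cost about 5.20 per hour, against 10.00 with every instance on demand.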
Long-running batch work should checkpoint progress to durable storage at regular intervals. On eviction the job resumes from the last checkpoint on a fresh node rather than restarting from zero, so at most the work done since that checkpoint is repeated. Checkpointing is what makes multi-hour jobs viable on capacity that can vanish.
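A minimal checkpoint-and-resume sketch. A local JSON file stands in for durable storage here; in practice the checkpoint must live somewhere that outlives the VM, such as Azure Blob Storage or a file share. The functions load_checkpoint, save_checkpoint, and run_job are hypothetical, and stop_after simulates an eviction.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next unprocessed item, or 0 if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write the checkpoint atomically, so an eviction mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on the same filesystem


def run_job(items, process, path, every=10, stop_after=None):
    """Process items from the last checkpoint; return False on a simulated eviction."""
    start = load_checkpoint(path)
    done = 0
    for i in range(start, len(items)):
        if stop_after is not None and done == stop_after:
            return False  # eviction: progress since the last checkpoint is lost
        process(items[i])
        done += 1
        if (i + 1) % every == 0:
            save_checkpoint(path, i + 1)
    save_checkpoint(path, len(items))
    return True
```

If 25 items are checkpointed every 10 and the node is evicted after 15, the resumed run starts at item 10 and repeats items 10 to 14. Work units therefore need to be idempotent, and the checkpoint interval sets how much work an eviction can cost.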
Spot is the elasticity tranche of the wider Azure commitment portfolio. It does not displace reserved instances or savings plans. It complements them by carrying the interruptible work that should never have been committed in the first place.
Reserved instances. The stable production baseline. The deepest committed discount, for workloads that run continuously and cannot tolerate interruption. Spot has no place here.
Savings plans. The moving middle that holds a spend floor but shifts shape. Covered by the flexible commitment, not by spot.
Spot. The interruptible elastic band: batch, surge, and non-production. The deepest discount in the catalog, for work that can afford to lose a node.
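The placement model can be read as a short decision sequence: apply the eviction fit test first, then split committed work by whether its shape is stable. The sketch below is a deliberately simplified, hypothetical reading of the three tranches. The Workload fields and the place function are illustrative, and a real split would also carve an on-demand or committed floor out of mixed workloads such as the web tier.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    tolerates_eviction: bool  # can lose a node on ~30 s notice without data loss or a failed result
    runs_continuously: bool   # holds a steady baseline around the clock
    stable_shape: bool        # same size and region for the whole commitment term


def place(w: Workload) -> str:
    """Assign a workload to one tranche (simplified, hypothetical rule)."""
    if w.tolerates_eviction:
        return "spot"  # the interruptible elastic band
    if w.runs_continuously and w.stable_shape:
        return "reserved instance"  # stable production baseline
    if w.runs_continuously:
        return "savings plan"  # spend floor that shifts shape
    return "pay-as-you-go"  # neither interruptible nor committable
```

The order matters: the fit test runs first, so nothing that fails it can land on spot, and nothing that passes it consumes a commitment.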
The eviction and pricing mechanics, the three-class fit test, the four architecture patterns that make interruption safe, and the portfolio placement model that keeps spot outside the commitment tranches. Sent on request.
Spot is the cheapest compute Azure sells and the easiest to misuse. The savings only hold when the architecture expects eviction and only the right workloads sit on the capacity. We map the eligible work, validate the eviction handling, and build the mixed model that captures the discount without putting a single production commitment at risk.