AWS spot instances are used by many cloud users as a key component of their AWS cost optimization strategy. They are an effective tool for optimizing cloud costs because they may be offered at a 50-90% discount off the On-Demand price for the same instance type. However, they come with the downside of termination risk. A proper understanding of how spot instances work lets you use them effectively for suitable classes of workloads and reduce cloud costs.
In this blog post we will look at:
- What are AWS spot instances and why use them
- Use cases for spot instances
- How does AWS handle spot requests
- SpotQuakes and downsides of using spot
- Tools to effectively manage spot instance loss in Kubernetes
- Alternatives to spot instances
What Are AWS Spot Instances
- Spot Instances are a type of Amazon Elastic Compute Cloud (EC2) instance that utilizes spare capacity within the AWS cloud.
- These instances are available at significantly reduced rates compared to On-Demand prices.
- AWS offers this excess capacity to users, allowing them to leverage it for their workloads.
Why Use AWS Spot Instances
The primary reason teams opt for AWS Spot Instances is cost savings, although the lower price comes with trade-offs, such as interruption risk, that are covered later in this post:
- By using spot instances, you can achieve substantial reductions in your EC2 costs, up to 90% compared to On-Demand pricing.
- Since spot instances are priced based on the current supply and demand dynamics, they are an attractive option for cost-conscious organizations.
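To make the savings concrete, here is a minimal sketch of the arithmetic. The hourly prices, instance count, and the `spot_savings` helper are hypothetical and chosen only for illustration; real Spot prices vary by instance type, region, and Availability Zone.

```python
def spot_savings(on_demand_hourly: float, spot_hourly: float,
                 hours: float, instances: int = 1) -> dict:
    """Compare On-Demand and Spot cost for the same instance type."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    on_demand_cost = on_demand_hourly * hours * instances
    spot_cost = spot_hourly * hours * instances
    saved = on_demand_cost - spot_cost
    return {
        "on_demand_cost": round(on_demand_cost, 2),
        "spot_cost": round(spot_cost, 2),
        "saved": round(saved, 2),
        "discount_pct": round(100 * saved / on_demand_cost, 1),
    }


# Hypothetical prices: $0.10/h On-Demand vs $0.03/h Spot,
# for 10 instances running 720 hours (roughly one month).
result = spot_savings(0.10, 0.03, hours=720, instances=10)
print(result)
```

With these made-up numbers the discount works out to 70%, which sits inside the 50-90% range mentioned earlier. The calculation ignores the cost of interruptions (retries, lost work), which you would need to estimate separately.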
Key Differences between Spot vs On-Demand Instances
Good Spot Instance Use Cases
Spot instances are well-suited for workloads that are flexible in terms of timing and can tolerate interruptions. Here are some common use cases:
- Data Analysis: Spot instances are ideal for running large-scale data analysis jobs where you can take advantage of available capacity during off-peak hours.
- Batch Processing: Background tasks, batch jobs, and other non-time-sensitive workloads can benefit from Spot Instances.
- Optional Tasks: If you have tasks that can be interrupted without causing critical issues, spot instances are a good fit.
How Does AWS Handle Spot Instance Requests
Spot instances work on a simple principle, and AWS documents it clearly. To use spot instances, you create a spot instance request that includes the desired number of instances, the instance type, and the Availability Zone. If capacity is available, Amazon EC2 fulfills your request immediately. Otherwise, Amazon EC2 waits until your request can be fulfilled or until you cancel the request.
The following illustration shows how spot instance requests work.
Notice that the request type (one-time or persistent) determines whether the request is opened again when Amazon EC2 interrupts a spot instance or when you stop a spot instance. If the request is persistent, the request is opened again after your spot instance is interrupted. If you have a persistent request and stop your spot instance, the request will only reopen once you restart your spot instance. Essentially, you submit a request, and if capacity is available (and the Spot price does not exceed any maximum price you set), you receive the resource. Otherwise, the request stays open until it can be fulfilled or you cancel it.
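The request described above can be sketched in Python. This is an illustration under assumptions, not a reference implementation: `build_spot_request`, `reopens_after`, and `submit` are hypothetical helper names, the AMI ID is a placeholder, and `submit` assumes the third-party `boto3` package is installed with AWS credentials configured. The parameter names follow the EC2 `RequestSpotInstances` API, where the maximum price (`SpotPrice`) is optional.

```python
from typing import Optional


def build_spot_request(instance_count: int, instance_type: str,
                       availability_zone: str, image_id: str,
                       persistent: bool = False,
                       max_price: Optional[float] = None) -> dict:
    """Build keyword arguments for EC2's RequestSpotInstances call."""
    params = {
        "InstanceCount": instance_count,
        "Type": "persistent" if persistent else "one-time",
        "LaunchSpecification": {
            "ImageId": image_id,
            "InstanceType": instance_type,
            "Placement": {"AvailabilityZone": availability_zone},
        },
    }
    if max_price is not None:
        # The API expects the maximum hourly price as a string.
        params["SpotPrice"] = f"{max_price:.4f}"
    return params


def reopens_after(event: str, persistent: bool) -> bool:
    """Model the request lifecycle described in the text.

    event is "interrupted" (EC2 reclaimed the instance) or
    "stopped" (you stopped it yourself).
    """
    if not persistent:
        return False  # one-time requests are not opened again
    # A persistent request reopens after an interruption; after a stop
    # it only reopens once you restart the instance.
    return event == "interrupted"


def submit(params: dict, region: str) -> dict:
    """Send the request. Assumes boto3 is installed and credentials exist."""
    import boto3  # third-party dependency, imported lazily on purpose

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.request_spot_instances(**params)


# "ami-0123456789abcdef0" is a placeholder, not a real image.
request = build_spot_request(2, "m5.large", "us-east-1a",
                             "ami-0123456789abcdef0", persistent=True)
print(request["Type"], reopens_after("interrupted", persistent=True))
```

Building the parameters separately from sending them keeps the request easy to inspect and test without touching an AWS account.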
SpotQuakes and the Downside of using AWS Spot Instances
A few years back we touched on SpotQuakes, our affectionate name for the event in which you lose all your spot instances in one go and things get a little bumpy. When a spot instance is about to be taken away, you receive an interruption notice. This is your 2-minute warning to shut down and evacuate. This inherent characteristic means Spot Instances are better suited to workloads that can handle interruptions gracefully.
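One way to catch the 2-minute warning from inside the instance is to poll the EC2 instance metadata service. The sketch below assumes IMDSv2 (token-based access) and the `spot/instance-action` metadata path, which returns HTTP 404 until an interruption is scheduled. The function names, the 5-second polling interval, and the callback are illustrative choices, not part of any AWS SDK.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: str) -> Tuple[str, datetime]:
    """Parse a body like {"action": "terminate", "time": "2017-09-18T08:22:00Z"}."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], when.replace(tzinfo=timezone.utc)


def seconds_remaining(when: datetime, now: Optional[datetime] = None) -> float:
    """Seconds left before the scheduled interruption."""
    now = now or datetime.now(timezone.utc)
    return (when - now).total_seconds()


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the raw notice, or None if no interruption is scheduled."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled yet
            return None
        raise


def watch(on_notice: Callable[[str, float], None],
          poll_seconds: float = 5.0) -> None:
    """Poll until a notice appears, then hand it to the callback once."""
    while True:
        body = fetch_instance_action()
        if body is not None:
            action, when = parse_instance_action(body)
            on_notice(action, seconds_remaining(when))
            return
        time.sleep(poll_seconds)
```

On a spot instance you might call `watch(lambda action, secs: print(action, secs))` from a sidecar process and use the callback to drain work and flush state. The metadata endpoint is only reachable from within EC2, so `watch` will fail anywhere else.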
In addition, keep the following aspects of Spot Instances in mind:
- Price Volatility
  - Spot Instance prices fluctuate based on supply and demand. While they are generally much lower than On-Demand prices, the price for a given instance type and Availability Zone can rise.
  - If the Spot price rises above the maximum price you have set, your Spot Instances are interrupted. Monitor prices and choose any maximum price deliberately.
- Maximum Price (Bid) Strategy
  - When launching Spot Instances, you can optionally specify a maximum price, historically called a bid. If you do not, it defaults to the On-Demand price. If the Spot price rises above your maximum, your instances are interrupted.
  - Leaving the maximum at the default On-Demand price, rather than setting a low cap, helps avoid price-driven interruptions. EC2 can still reclaim instances when it needs the capacity back, regardless of your maximum price.
- Workload Compatibility
  - Not all workloads are suitable for Spot Instances. Real-time applications, databases, and mission-critical services may not tolerate interruptions.
  - Analyze your workload’s characteristics and determine whether it aligns with Spot Instance behavior.
- Capacity Availability
  - While Spot Instances are usually available, there might be times when capacity is scarce due to high demand.
  - If your workload relies heavily on Spot Instances, consider diversifying across multiple instance types, Availability Zones, or regions.
- Stateful Workloads
  - Stateful applications (those that maintain internal state or data) may face challenges with Spot Instances.
  - If an instance gets terminated, any unsaved data could be lost. Ensure your application handles state appropriately.
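For stateful or long-running batch work, one common way to handle state appropriately is to checkpoint progress so a replacement instance can resume where the interrupted one stopped. The sketch below is a minimal illustration: the helper names are hypothetical, and the local checkpoint file stands in for durable storage (such as an object store or a network volume) that would outlive the instance in a real deployment.

```python
import json
import os
from typing import Callable, Sequence


def load_checkpoint(path: str) -> int:
    """Return the index of the next unprocessed item (0 if starting fresh)."""
    if not os.path.exists(path):
        return 0
    with open(path) as fh:
        return json.load(fh)["next_index"]


def save_checkpoint(path: str, next_index: int) -> None:
    """Write the checkpoint atomically so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"next_index": next_index}, fh)
    os.replace(tmp, path)


def process_items(items: Sequence, path: str,
                  handle: Callable[[object], None],
                  should_stop: Callable[[], bool] = lambda: False) -> int:
    """Process items from the last checkpoint onward.

    should_stop is checked before each item; wire it to your interruption
    signal. Returns the index of the next unprocessed item.
    """
    start = load_checkpoint(path)
    for index in range(start, len(items)):
        if should_stop():
            return index
        handle(items[index])
        save_checkpoint(path, index + 1)
    return len(items)
```

Checkpointing after every item keeps the example simple. In practice you would likely checkpoint less often and make `handle` idempotent, because an instance can still disappear between handling an item and saving the checkpoint.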
How to Manage Variability or Potential Loss of AWS Spot Instances
Initially AWS offered Spot Blocks, also known as Spot Instances with a defined duration. Unfortunately, these are no longer available. AWS closed the feature to new customers in July 2021 and set an end-of-support date for existing users, as its notice states:
Update July 2021 – Spot Instances with a defined duration (also known as Spot blocks) are no longer available to new customers as of July 1, 2021. For customers that have previously used the feature, we will continue to support Spot Instances with a defined duration until December 31, 2022. If your workload is interruption tolerant, we recommend that you use Spot Instances without setting a defined duration. If your workload is not interruption tolerant we recommend that you use On-Demand instances for the required duration of your workload. For the most up to date information please see our documentation here.
This means you need to build or adopt a third-party solution to handle interruptions and price fluctuations and to orchestrate your infrastructure. Below are some common options:
Karpenter
- Karpenter’s primary responsibility is to provision compute capacity for your Kubernetes clusters.
- You define a NodePool configuration in Karpenter, specifying instance types, Availability Zones, and capacity types.
- Karpenter can handle Spot Instance interruptions gracefully.
- Starting from version 0.19.3, the Karpenter project recommends using Karpenter’s native interruption handling rather than a standalone Node Termination Handler.
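As a rough illustration of the NodePool configuration mentioned above, the sketch below builds a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. It assumes the `karpenter.sh/v1` API and an existing `EC2NodeClass` named `default`. The pool name, instance types, and zones are placeholders, and field names may differ between Karpenter versions, so check the documentation for the version you run.

```python
import json
from typing import List


def spot_node_pool(name: str, instance_types: List[str],
                   zones: List[str], node_class: str = "default") -> dict:
    """Build a NodePool manifest that allows Spot with On-Demand fallback."""
    return {
        "apiVersion": "karpenter.sh/v1",  # assumption: Karpenter v1 API
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Assumes an EC2NodeClass with this name already exists.
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    "requirements": [
                        {"key": "karpenter.sh/capacity-type",
                         "operator": "In",
                         "values": ["spot", "on-demand"]},
                        {"key": "node.kubernetes.io/instance-type",
                         "operator": "In",
                         "values": instance_types},
                        {"key": "topology.kubernetes.io/zone",
                         "operator": "In",
                         "values": zones},
                    ],
                }
            }
        },
    }


# Placeholder values: several instance types and zones to diversify capacity.
manifest = spot_node_pool(
    "spot-general",
    instance_types=["m5.large", "m5a.large", "m6i.large"],
    zones=["us-east-1a", "us-east-1b"],
)
print(json.dumps(manifest, indent=2))
```

Listing several instance types and zones gives Karpenter more Spot pools to choose from, which ties back to the diversification advice earlier in this post.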
Spot Ocean
- Spot Ocean automatically handles instance interruptions, helping your workloads meet service level agreements (SLAs).
kOps Toolbox Instance Selector
- If you’re using kOps to manage your Kubernetes clusters, consider leveraging the kOps Toolbox Instance Selector.
- This tool simplifies the creation of Instance Group configurations adhering to spot instances best practices.
- It allows flexibility in choosing instance types and optimizes allocation strategies for efficient spot usage.
What’s the Best Alternative to Using AWS Spot Instances
If you’re wondering what else can be done, you are not alone. AWS, and specifically EKS, can create a massive amount of hard-to-remove waste. Sysdig and Datadog have the data to prove it; check out the 2023 report here.
EMP by Platform9 provides an effective alternative to using AWS spot instances for cloud cost optimization when running Kubernetes in public clouds. EMP optimizes wasted Kubernetes resources by reclaiming 70% or more of your Kubernetes CPU or memory waste. Rather than provisioning new capacity for new workloads, EMP reuses currently wasted Kubernetes capacity to run them. Using this model, you can reduce your Kubernetes costs by 70% or more without ever having to use spot instances or handle spot terminations. In other words, your workload SLA remains intact with EMP.
You can scan an existing Kubernetes cluster to immediately see how much of your wasted capacity you can reclaim with EMP, thereby reducing the need to use spot instances to save costs. Learn more here.