
Karpenter vs Cluster Autoscaler vs EMP

A popular tool among DevOps engineers for managing EKS cluster autoscaling is the open source Karpenter autoscaler. Created by AWS, Karpenter offers more than simple “0 to N” autoscaling of Kubernetes worker nodes. In fact, at Platform9 we’ve sometimes heard people especially enamored of it say that Karpenter solves cost efficiency and scaling problems in EKS altogether. However, while Karpenter is a valuable tool with some obvious advantages over the default cluster autoscaler that comes out of the box for EKS, it may not be the best-fit solution for every use case.

In this post, we will compare the Karpenter autoscaler with the default EKS cluster autoscaler and discuss the pros and cons of each. We will then compare Karpenter with EMP and look at how EMP handles Kubernetes cost optimization through real-time auto-sizing of Kubernetes resources.

Kubernetes cluster autoscaler and its shortcomings

The Kubernetes cluster autoscaler (CAS) is designed to scale clusters dynamically, in a DevOps-friendly way, as workloads are deployed and removed. It works on a simple concept: if there are no nodes with enough resources to run pods waiting to be scheduled, cluster autoscaler scales up one or more cloud-provider-managed node groups; if there is a large amount of wasted capacity, it scales node groups down. Cluster autoscaler also takes advanced scheduling controls – priorities, pod disruption budgets, node selectors, and so on – into account to find the best-fit node for a pod.
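
To make that concrete, a typical AWS deployment of cluster autoscaler is configured roughly like the sketch below; the flags are real, but the cluster name and values are placeholders, and your setup may differ.

    # Illustrative args for the cluster-autoscaler container on AWS; the cluster
    # name (my-cluster) is a placeholder. Node groups are discovered by ASG tags,
    # or can be listed explicitly with --nodes=<min>:<max>:<asg-name>.
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
      - --expander=least-waste          # choose the node group that would waste the least capacity
      - --balance-similar-node-groups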

Scaling limited to one node type

This model addresses the problem of too little (or too much) cluster for the size of the work to be done in a basic way – but cluster autoscaler will only scale up by creating more nodes of the single AWS instance type each node group is configured with. This means that if you have one small pending pod in a cluster built on an autoscaling node group configured with a relatively large AWS instance size, and no existing node can run it, cluster autoscaler will go ahead and create a new large node just to run that tiny pod.
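
For example, a pending pod as small as this one (image and request values are purely illustrative) will still cause a brand-new node of whatever instance type the node group uses:

    apiVersion: v1
    kind: Pod
    metadata:
      name: tiny-worker          # illustrative name
    spec:
      containers:
        - name: app
          image: nginx:1.27      # placeholder image
          resources:
            requests:
              cpu: 100m          # a tiny request...
              memory: 128Mi      # ...can still trigger e.g. a new m5.4xlarge if that is the group's type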

No node consolidation

Likewise, if you have large nodes and one is mostly empty while the others have enough spare capacity between them to run that node’s pods, cluster autoscaler will do nothing. This can leave you paying for resources on that node that you never use.

Node group creation overhead

In addition to the above, cluster autoscaler requires the scaling node groups to be created in the cloud provider, outside of CAS, and then set up in its configuration. This increases the administrative burden of handling diverse workloads: node groups for workloads that need extra-large nodes, nodes with GPUs, or nodes with different CPU architectures all have to be created and managed separately.
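
As a rough sketch of that overhead, an eksctl-style config ends up declaring one managed node group per node flavor, each of which also has to be known to cluster autoscaler via explicit configuration or discovery tags (names, types, and region below are placeholders):

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: my-cluster            # placeholder cluster name
      region: us-west-2
    managedNodeGroups:
      - name: general-m5-large    # one group per node flavor you want to scale
        instanceType: m5.large
        minSize: 1
        maxSize: 10
      - name: gpu-g4dn            # GPU workloads need their own group...
        instanceType: g4dn.xlarge
        minSize: 0
        maxSize: 4
      - name: arm-m6g             # ...and so do other CPU architectures
        instanceType: m6g.large
        minSize: 0
        maxSize: 10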

How Karpenter works and its advantages

Karpenter’s model is a simple but effective refinement over that used by Kubernetes cluster autoscaler.

Better bin-packing

Unlike cluster autoscaler, which only works with a single node size per node group, Karpenter will provision nodes of varying sizes as needed. When a new pending pod can’t be scheduled on any of the existing nodes due to lack of sufficient capacity, Karpenter creates a new node of a type that is just large enough (possibly even larger than the existing nodes, if needed) to overcome the deficiency, and the pod is scheduled onto it. This results in better overall bin-packing of pods across nodes, with less capacity wasted on nodes whose spare headroom goes unused.
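
As a sketch, assuming the karpenter.sh/v1 API (names and limits here are illustrative), a single NodePool can span many instance shapes and leave it to Karpenter to pick one that is just big enough for the pending pods:

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          requirements:
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64"]
            - key: karpenter.k8s.aws/instance-category
              operator: In
              values: ["c", "m", "r"]   # let Karpenter choose across many sizes in these families
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
      limits:
        cpu: "1000"                     # optional guardrail on total provisioned CPU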

Consolidation

This behavior, although beneficial for bin-packing, has a couple of potential side effects: you can end up with a lot more nodes as you let your cluster auto-scale, and they can be of different sizes. To compensate for that, and to handle cases where overall node utilization is low because pods are fragmented across nodes, Karpenter also tries to remove, consolidate, and replace nodes so that the same workloads run at a lower cost – either in fewer nodes of a given size or in nodes of smaller sizes. Naturally, this is liable to incur some pod disruption, so like cluster autoscaler, Karpenter tries to respect controls like pod disruption budgets and avoids removing or replacing nodes if those budgets would be violated.
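
Consolidation behavior is tuned on the NodePool itself; a hedged fragment of the relevant block, again assuming the karpenter.sh/v1 field names, might look like this:

    # Fragment of a NodePool spec: how aggressively Karpenter may consolidate.
    disruption:
      consolidationPolicy: WhenEmptyOrUnderutilized  # also consider replacing underutilized nodes
      consolidateAfter: 1m                           # wait before acting on a consolidation candidate
      budgets:
        - nodes: "10%"                               # cap how many nodes Karpenter disrupts at once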

Simulation to avoid extra node creation

Karpenter also tries to avoid adding nodes where that won’t actually allow any unscheduled pods to schedule, by running a scheduling simulation prior to provisioning. For example, if affinity rules on the unscheduled pods in the scheduling queue require them to run on specific existing nodes that don’t have enough resources to run them, Karpenter will not provision new nodes because the unscheduled pods still wouldn’t be able to run even if it did so.
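
For instance, a pending pod pinned to one specific existing node by a required affinity rule (the hostname and request sizes below are made up) gains nothing from a fresh node, so Karpenter’s simulation will decline to provision for it:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pinned-pod                  # illustrative
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values: ["ip-10-0-12-34.ec2.internal"]   # must land on this node and nowhere else
      containers:
        - name: app
          image: nginx:1.27
          resources:
            requests:
              cpu: "2"                  # more than the pinned node has free, so the pod stays Pending
              memory: 4Gi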

No need to pre-create node groups

Karpenter will provision nodes of varying size as needed, without needing node groups to be created in the cloud provider first (though you can still use provider-managed node groups alongside Karpenter if you want to). Compared to cluster autoscaler, this eliminates the overhead of creating and managing a separate node group for every node size you need.
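
Instead of node groups, Karpenter only needs an EC2NodeClass describing how to build nodes; here is a minimal sketch assuming the karpenter.k8s.aws/v1 API, with the IAM role and discovery tags as placeholders:

    apiVersion: karpenter.k8s.aws/v1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      role: KarpenterNodeRole-my-cluster        # placeholder IAM role for the nodes
      amiSelectorTerms:
        - alias: al2023@latest                  # track the latest Amazon Linux 2023 AMI
      subnetSelectorTerms:
        - tags:
            karpenter.sh/discovery: my-cluster  # discover subnets by tag rather than by node group
      securityGroupSelectorTerms:
        - tags:
            karpenter.sh/discovery: my-cluster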

Better handling of spot instances

Karpenter also handles spot instances in a way that makes it considerably more powerful than cluster autoscaler. Unlike CAS, it can directly monitor and react to AWS spot instance termination notices: when a spot instance is about to be reclaimed, Karpenter will add capacity in advance if needed so that the pods on that node can reschedule immediately, and it begins draining the node right away to allow the maximum margin for clean termination. Karpenter thus lets you take advantage of spot instance savings, for workloads that can tolerate spot disruptions, much more effectively than cluster autoscaler.
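
Opting workloads into spot can be as simple as widening the capacity-type requirement in a NodePool (fragment below); note that spot interruption handling also requires an SQS interruption queue to be configured as described in the Karpenter docs:

    # Fragment of a NodePool's template.spec.requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]   # use spot where available, fall back to on-demand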

Common Karpenter limitations

Right at the outset of this section it’s worth noting that Karpenter currently supports only AWS EKS, while cluster autoscaler supports dozens of cloud providers – so if you’re not using AWS, you can’t really benefit from Karpenter yet. Karpenter is a lot newer than cluster autoscaler, though, so this could change in the future.

One issue that trips up new Karpenter users pretty regularly is that (like CAS) Karpenter looks for unschedulable pods that could be scheduled, does the “scheduling math” to determine what to provision to make them schedulable, and then adds that capacity to the cluster – but it doesn’t do the actual workload scheduling. That job still belongs to the Kubernetes scheduler.

This can lead to situations where more new capacity is allocated than pods actually end up scheduled onto, so one or more new nodes get deprovisioned again almost immediately without ever running workloads. This correction usually happens quickly, so the actual extra expense incurred is relatively minimal, but it can be disconcerting, since the whole point of Karpenter is to reduce exactly that kind of overprovisioning waste.

Karpenter can also, through no fault of its own, conflict with other autoscaling components. For example, the Karpenter docs warn that although Karpenter can coexist with cluster autoscaler, the AWS Node Termination Handler often deployed alongside cluster autoscaler to gracefully handle spot instance terminations can cause problems and should be removed, “due to conflicts that could occur from the two components handling the same events”. If you replaced CAS with Karpenter, forgot (or didn’t know) to remove the Node Termination Handler, and ran into issues, you might think Karpenter was misbehaving – but neither component is doing anything wrong; it’s just a consequence of two things reacting to the same events in ways that interfere with each other.

Lastly, by design, Karpenter is a very capable cluster autoscaler – more so than the standard Kubernetes cluster autoscaler in several important respects. But it’s not more than that, and it doesn’t claim to be if you read its documentation. Most importantly, it’s not a pod scaler. It will not try to control the number of replicas of your deployments the way the Horizontal Pod Autoscaler does, nor will it manipulate the resource configuration of those workloads the way the Vertical Pod Autoscaler does. If pod resource requests are poorly configured and some pods end up unschedulable even though the cluster is significantly under-utilized, Karpenter will still add nodes with enough available resources, based on the size of those requests, to let them schedule and run. What Karpenter will not do is reconfigure the unschedulable pods themselves so that you can reduce wasted allocation and run some or all of those pods within your existing cluster footprint in the first place – nor will it optimize the resources allocated to existing pods via their requests through some other mechanism. The Kubernetes resource-management problem, in a nutshell, is not just about the size or number of nodes in the cluster; it’s also about the pods running on those nodes and whether the resources allocated to them via requests are well-matched to their needs.

How EMP bridges the workload efficiency gap

If Karpenter can’t make your workloads themselves more efficient, how do you reduce the biggest source of waste in your cluster – the unused resources allocated to those workloads? You don’t have to stop using Karpenter – but you do need something besides that one very cool tool. Workload optimizers like the Vertical Pod Autoscaler exist, but as with Karpenter’s node consolidation, fully benefiting from them incurs pod disruption (and will still do so, to some degree, even if the recently added alpha feature for in-place pod resizing becomes generally available in the future).
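
For reference, this is the kind of thing the Vertical Pod Autoscaler handles – a minimal sketch targeting a hypothetical Deployment named my-app; applying its recommendations in Auto mode evicts and recreates pods:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-app-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app            # hypothetical workload
      updatePolicy:
        updateMode: "Auto"      # VPA applies new requests by evicting and recreating pods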

For a more comprehensive solution to workload resource inefficiency, Platform9’s Elastic Machine Pool uses a fundamentally different model than single-purpose autoscalers to manage and optimize total cluster utilization: EMP creates a virtualization layer on AWS Metal instances and uses it to create and manage Elastic VM (EVM) nodes that are added to your EKS clusters. EMP can then provide seamless optimization through two foundational virtualization technologies that address fundamental resource-management gaps in Kubernetes itself:

  • VM resource overcommitment lets EMP pack EVMs whose allocated resources are largely unused onto the same AWS Metal nodes, without the unused portion of those resource requests taking actual resources away from workloads that need them.
  • VM live migration allows EMP to migrate pods between physical nodes as needed (by migrating the EVMs they run in), without restarting the pods as would normally be needed if this were handled within Kubernetes natively.  This means EVMs can be rebalanced to different AWS Metal instances as the overall amount of infrastructure is scaled up or down to suit the changing needs of the workloads in the cluster, without the pod disruption that would normally accompany that kind of rebalancing.

Together, these allow EMP to increase the overall utilization of the EKS cluster, to an extent that solutions based on managing Kubernetes workloads and nodes alone can’t.  EMP in effect becomes a cost-optimization layer below the EKS cluster, reducing EKS spend without EKS itself being involved in (or even aware of) EMP doing so.

Of course, if you’ve got some particular need to run specific workloads on EC2 instances, or just want to take your time migrating some or all of your workloads from using Karpenter to using EMP, you can: EMP will not interfere with Karpenter’s management of the EC2 nodes it controls, and Karpenter likewise won’t cause issues with EMP’s EVMs.  You can even easily enable EMP only for pods in chosen namespaces.

If you’re interested in getting started with EMP, let us know!  Signup is easy, and so is getting started with your first Elastic Machine Pool.

Additional reading and reference

  1. Introduction to EMP
  2. How EMP works behind the scenes
  3. FinOps Landscape

Previously in this series:

Kubernetes requests and limits – behavior and impact on workloads

Documentation

  1. EMP Overview
  2. Cluster Autoscaler
  3. Karpenter