EMP

Karpenter vs Cluster Autoscaler vs EMP

A popular tool among DevOps engineers for managing EKS cluster autoscaling is the open-source Karpenter autoscaler. Created by AWS, Karpenter has more capabilities than simple “0 to N” autoscaling of Kubernetes worker nodes. In fact, at Platform9 we’ve sometimes heard people especially enamored of it say Karpenter solves cost efficiency and scaling problems in EKS altogether. However, while Karpenter is a valuable tool, and one with some obvious advantages over the default cluster autoscaler, it may not be the best-fit solution for all use cases.

In this post, we will compare the Karpenter autoscaler with the default EKS cluster autoscaler and discuss the pros and cons of each. We will then compare Karpenter with EMP and explain how EMP handles Kubernetes cost optimization by auto-sizing Kubernetes resources in real time.

Kubernetes cluster autoscaler and its shortcomings

The Kubernetes cluster autoscaler scales clusters dynamically to deal with increased or decreased deployment of workloads in a DevOps-friendly way. If there are no nodes with enough resources to run pods waiting to be scheduled, cluster autoscaler scales up one or more cloud-provider-managed node groups. If there is a large amount of wasted capacity, CAS scales down node groups. Cluster autoscaler also incorporates advanced scheduling controls, such as priority, pod disruption budgets, and node selectors, to find the best-fit node for a pod.

Scaling limited to one node type

A fundamental limitation of the cluster autoscaler is that it will only scale up to create more nodes using a consistent AWS node shape and size. While you can use multiple instance types, they must all have the same CPU and memory size. This means if you have one small pending pod in a cluster built on an autoscaling node group that is configured to use a relatively large node size, cluster autoscaler will still create a new large node to run that tiny pod if it can’t find any existing nodes to run it on.
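For example, a managed node group for use with cluster autoscaler is typically defined with a single instance type, so every node it scales up has the same size. The sketch below uses an eksctl-style config; the cluster and group names are hypothetical:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster        # hypothetical cluster name
  region: us-east-1
managedNodeGroups:
  - name: large-workers
    instanceType: m5.4xlarge   # every node in this group is this one size
    minSize: 1
    maxSize: 10
```

With this configuration, even a single small pending pod can trigger the creation of a whole m5.4xlarge node.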

No node consolidation

Likewise, if you have large nodes and one is mostly empty but the others have enough spare capacity to run all of that node’s pods, cluster autoscaler will do nothing. This may leave you spending money on resources of that node that you will never use.

Node group creation overhead

In addition to the above, cluster autoscaler requires the scaling node groups to be created in the cloud provider, outside of CAS, and then set up in its configuration. This increases the administrative burden of handling diverse workloads: workloads that require extra-large nodes, nodes with GPUs, or nodes with different CPU architectures each need their own node groups to be created and managed.

How Karpenter works and its advantages

Karpenter’s model is a simple but effective refinement over that used by Kubernetes cluster autoscaler.

Better bin-packing

A key feature of Karpenter is that, unlike cluster autoscaler, which only works with nodes of the same shape and size, Karpenter nodepools can provision nodes of varying sizes as needed. When there is a new pending pod that Karpenter cannot schedule on an existing node due to insufficient capacity, it will create a new node that is just large enough to fit the pod. This results in better overall bin-packing of pods across nodes, with less capacity wasted on nodes that have unused headroom.
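To make this concrete, here is a sketch of a Karpenter NodePool that permits a basket of instance sizes rather than a single fixed type, letting Karpenter pick the cheapest node that fits the pending pods. This assumes the Karpenter v1 API; the NodePool and EC2NodeClass names are hypothetical:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose        # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # hypothetical EC2NodeClass
      requirements:
        # Allow a range of instance families and sizes instead of one type
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano", "micro", "small"]
  limits:
    cpu: "100"                 # cap total provisioned CPU for this pool
```

Karpenter chooses from any instance type matching these requirements when it provisions a node for pending pods.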

Consolidation

A side-effect of Karpenter’s just-in-time node provisioning is that you can end up with a lot more nodes as you let your cluster auto-scale, and they can be of different sizes. As your pods later scale down, you may be left in a fragmented state with a lot of nodes left unused. To compensate for that, Karpenter also tries to remove, consolidate, and replace nodes so that the same workloads run at a lower cost.  Naturally, this is liable to incur some pod disruption, so like cluster autoscaler, Karpenter tries to respect controls like pod disruption budgets and avoids removing or replacing nodes if disruption budgets would be violated. 
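Consolidation behavior is configured per NodePool. A sketch of the relevant fragment, assuming the Karpenter v1 API:

```yaml
# Fragment of a Karpenter v1 NodePool spec showing disruption settings
spec:
  disruption:
    # Consolidate both empty nodes and underutilized nodes whose pods
    # could fit elsewhere
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m   # how long a node must be consolidatable first
```

Pod disruption budgets on the affected workloads are still respected when Karpenter executes these consolidations.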

Simulation to avoid extra node creation

Karpenter also tries to avoid adding nodes where that won’t actually allow any unscheduled pods to schedule, by running a scheduling simulation prior to provisioning. For example, if affinity rules on the unscheduled pods in the scheduling queue require them to run on specific existing nodes that don’t have enough resources to run them, Karpenter will not provision new nodes because the unscheduled pods still wouldn’t be able to run even if it did so.

No need to pre-create node groups

Karpenter will provision nodes of varying size as needed, without needing to create node groups in the cloud provider first (though you can still use provider-managed node groups alongside Karpenter if you want to). This gets rid of the management overhead around creating and managing node groups of different node sizes, compared to cluster autoscaler.

Better handling of spot instances

Karpenter also has a feature that makes it a lot more powerful than cluster autoscaler: unlike CAS, it can directly monitor and react to AWS spot instance termination notices. When a spot instance is about to be reclaimed, Karpenter will proactively add capacity if needed so that pods on that node can reschedule immediately, and it begins draining the node right away to allow the maximum margin for a clean termination. Karpenter thus lets you take advantage of spot instance savings for workloads that can tolerate spot disruptions in a much more effective way than cluster autoscaler.
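Allowing a NodePool to use spot capacity is a one-line requirement, sketched below assuming the Karpenter v1 API (spot interruption handling also requires the interruption queue configured at Karpenter install time):

```yaml
# Fragment of a Karpenter v1 NodePool spec enabling spot capacity
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # allow spot, fall back to on-demand
```

When both capacity types are allowed, Karpenter generally favors spot where it is available and cheaper.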

Common Karpenter limitations

While Karpenter currently supports only AWS EKS, cluster autoscaler supports dozens of cloud providers, so if you’re not using AWS you can’t really benefit from Karpenter yet. However, Karpenter is a lot newer than the cluster autoscaler, so this could change in the future.

One issue that trips up new Karpenter users is that (like CAS) Karpenter looks for unschedulable pods that could be scheduled, does the “scheduling math” to determine the new nodes needed to make those pods schedulable, and then adds that capacity to the cluster. But it does not do the actual workload scheduling.

This can lead to situations where more new capacity is allocated than pods actually end up being scheduled on. In this case, one or more new nodes get disrupted right away without ever running workloads. Even though this correction usually happens quickly, so the actual extra expense incurred is relatively low, it can be disconcerting, since the point of Karpenter is to reduce exactly that kind of overprovisioning waste.

Karpenter can also conflict with other autoscaling components. For example, although Karpenter can coexist with the cluster autoscaler, the AWS Node Termination Handler, often deployed alongside cluster autoscaler to gracefully handle spot instance terminations, can cause problems and should be removed, because the two components would be handling the same events. If you replaced CAS with Karpenter, forgot to remove the Node Termination Handler, and ran into issues, you might think Karpenter is behaving badly, but really neither component is doing anything wrong: it’s just a consequence of two things reacting to the same events in ways that interfere with each other.

Karpenter vs Cluster Autoscaler

Here’s a quick comparison of Karpenter vs Cluster Autoscaler.

| Aspect | Karpenter | Cluster Autoscaler |
|---|---|---|
| Resource efficiency | Automatically provisions nodes based on application needs, reducing overprovisioning and wasted resources. | If not finely tuned, may lead to overprovisioning and wasted resources. |
| Scaling flexibility | Uses just-in-time provisioning to create nodes only when they are needed. | Scaling is based on predefined node groups and policies, which offer less flexibility. |
| Cost optimization | Optimizes resource utilization to reduce the number of nodes required, leading to lower cloud bills. | Cost savings depend on the configuration and can be less optimized than Karpenter. |
| Node size specification | Allows specifying a basket of different node sizes, enabling selection of the most appropriate size from a range. | Requires specifying one size per node group, which limits flexibility and may not always match workload needs. |
| Node group management | Does not require predefined node groups; dynamically provisions and manages nodes as needed using nodepools. | Manages static node groups, each configured with a specific instance type and size. |
| Automated patch management | Provides built-in mechanisms for regular node patching and security updates through configuration of node drift and expiration. | Node patching and security updates are not automated and must be triggered manually. |
| Automated Kubernetes version upgrades | Karpenter’s drift feature also enables more automated Kubernetes version upgrades of your data-plane nodes. | Kubernetes version upgrades must be handled manually. |
| User-friendliness | Easy to deploy using Helm and integrates seamlessly with Kubernetes. | Setup can be more challenging and requires more manual configuration. |

One major drawback of Karpenter and Cluster Autoscaler

Both Karpenter and the default cluster autoscaler are powerful autoscaling tools for satisfying your EKS cluster’s scaling needs. But neither Karpenter nor cluster autoscaler will try to optimize the resources allocated to your existing pods via requests. So if your pod resource requests are poorly configured and some pods end up unschedulable even though the cluster is significantly under-utilized, Karpenter will still add more nodes to the cluster to satisfy those requests. It will not reconfigure the unschedulable pods so that you can reduce wasted allocation and run some or all of them in your existing cluster. The Kubernetes resource-management problem, in a nutshell, is not just about the size or number of nodes in the cluster; it’s also about the pods running on those nodes and whether the amount of resources allocated to them via requests is well-matched to their needs.
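As an illustration (the names and numbers here are hypothetical), a container spec like the following reserves far more than the workload typically uses, and neither autoscaler will shrink these requests for you:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app               # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels: { app: my-app }
  template:
    metadata:
      labels: { app: my-app }
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              cpu: "2"       # actual usage is typically closer to 200m
              memory: 4Gi    # actual usage is typically closer to 500Mi
```

From the autoscalers’ point of view, these three replicas occupy 6 CPUs and 12Gi of memory, regardless of what the pods actually consume.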

How EMP bridges this gap

If your autoscaler can’t make your workloads themselves more efficient, how do you reduce the biggest source of waste in your cluster – the unused resources allocated to workloads?  You don’t have to stop using Karpenter or cluster autoscaler – but you do need something besides just these tools.  Workload optimizers like the Vertical Pod Autoscaler exist, but as with Karpenter’s node consolidation, fully benefiting from them incurs pod disruption. This will still be true to some degree, even if the recently-added alpha-release feature for in-place pod resizing becomes generally available in the future.
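For reference, a minimal VerticalPodAutoscaler manifest looks like the sketch below. It assumes the VPA components are installed in the cluster, and the target Deployment name is hypothetical; in `Auto` mode, applying updated recommendations involves evicting and recreating pods, which is the disruption discussed above:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa           # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # hypothetical target workload
  updatePolicy:
    updateMode: "Auto"       # apply recommendations by evicting pods
```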

To address this workload-inefficiency gap, Platform9’s Elastic Machine Pool uses a fundamentally different model, one that complements your cluster autoscalers and seamlessly optimizes total cluster utilization under the hood. EMP does this by creating an alternate layer of virtual machines for your EKS workers, called “Elastic VMs” (EVMs), running on AWS metal instances. The Elastic VMs are in turn added as nodes to your EKS clusters. This enables EMP to provide seamless optimization through two foundational virtualization technologies:

  • VM resource overcommitment lets EMP place EVMs with allocated, but unused, resources together on AWS Metal nodes, without the unused amount of resource requests taking actual resources away from workloads that need them.
  • VM live migration allows EMP to migrate pods between physical nodes as needed (by migrating the EVMs they run in), without restarting the pods.  This means EVMs can be rebalanced to different AWS Metal instances as the overall amount of infrastructure is scaled up or down to suit the changing needs of the workloads, without ever causing pod disruption.

Together, these allow EMP to increase the overall utilization of the EKS cluster, to an extent that solutions based on managing Kubernetes workloads and nodes alone can’t.  EMP in effect becomes a cost-optimization layer below the EKS cluster, reducing EKS spend without EKS itself being involved in or even aware of EMP doing so.

Of course, if you’ve got some particular need to run specific workloads on EC2 instances, or just want to take your time migrating some or all of your workloads from using Karpenter to using EMP, you can: EMP will not interfere with Karpenter’s management of the EC2 nodes it controls, and Karpenter likewise won’t cause issues with EMP’s EVMs.  You can even easily enable EMP only for pods in chosen namespaces.

If you’re interested in getting started with EMP, let us know!  Signup is easy, and so is getting started with your first Elastic Machine Pool.

Additional reading and reference

  1. Introduction to EMP
  2. How EMP works behind the scenes
  3. FinOps Landscape

Previously in this series:

Kubernetes requests and limits – behavior and impact on workloads

Documentation

  1. EMP Overview
  2. Cluster Autoscaler
  3. Karpenter