Pod Disruption Budget and How Best To Use It

Content

Importance of HA in Kubernetes
Kubernetes Pod Disruption Budgets
How to Use Pod Disruption Budgets
When should you use the Pod Disruption Budget – Use Cases
How the PDB Impacts Scheduling Decisions of Kubernetes
Best Practices for Implementing Pod Disruption Budgets
Common Pitfalls When Implementing Pod Disruption Budgets
Monitoring Pod Disruption Budgets (PDBs) in Kubernetes
Conclusion

The resilience of an application deployed on Kubernetes depends on two key mechanisms. The first is running multiple pod replicas for high availability (HA), and the second is configuring a Pod Disruption Budget (PDB). Replicas keep applications and services running and reachable during updates, failures, and maintenance. A PDB, in turn, controls how many replicas of a highly available application can be taken down at the same time by a voluntary disruption, such as a node drain during maintenance or a cluster upgrade. In other words, a PDB limits how many pods can be evicted at once and so keeps a minimum number of pods available to serve traffic.

Together, they help make your applications resilient and highly available. In this post, we’ll examine Pod Disruption Budgets closely.

Importance of HA in Kubernetes

As mentioned above, High Availability (HA) is a design strategy that keeps applications and services available. HA is achieved by replicating the control plane, distributing workloads across nodes in different availability zones, and providing load balancing and failover. It maintains continuous service even during infrastructure issues and version upgrades.

Continuous Availability: During failures of individual components such as nodes or other infrastructure, HA ensures service availability by distributing workloads across multiple nodes and zones.
Fault tolerance: With HA, Kubernetes recovers from failures by automatically replacing and rescheduling pods. Critical control plane components such as etcd, the API server, and the controllers are replicated across multiple nodes, making the cluster itself fault-tolerant.
Load Balancing: Services distribute internal traffic across healthy pods on different nodes, preventing any single node from being overloaded. Load balancers and ingress controllers route external traffic to the appropriate pods.
Scaling and Flexibility: Dynamic scaling lets the system handle changes in demand by automatically adding or removing pods and, with a cluster autoscaler, nodes. Distributing workloads across multiple availability zones provides additional resilience.

Kubernetes Pod Disruption Budgets

PDBs are a key Kubernetes feature for keeping applications stable and highly available during voluntary disruptions. With a PDB, an administrator defines how many pods of an application can be disrupted simultaneously. This ensures that a minimum number of pods remain available to serve traffic and handle critical tasks.

Need for PDBs

Kubernetes clusters regularly undergo upgrades, maintenance, and scaling operations, and these actions can require evicting pods. PDBs keep such operations from taking down all of an application's pods at once by specifying how many pods can be safely disrupted at a time, so applications continue to run smoothly throughout. The key benefits of PDBs are as follows.

Ensure High Availability: PDBs keep a minimum number of pods available even during planned disruptions such as cluster upgrades or node maintenance. They do this by refusing evictions that would take the application below its threshold.
Prevent excessive downtime: Voluntary disruptions like upgrades or maintenance could otherwise evict many pods at once and affect service availability. PDBs prevent this by keeping the minimum number of pods available.
Smooth updates: During node and cluster upgrades, PDBs make evictions proceed incrementally instead of all at once. Rolling updates of a Deployment are paced by the Deployment's own strategy settings rather than by the PDB.
Predictable capacity: By limiting how many pods can be evicted at once, PDBs keep enough serving capacity in place during node disruptions for the cluster to handle the load.

How do PDBs work?

Pod disruption budgets control pod availability during voluntary disruptions by limiting how many pods can be evicted at once. The key fields of a PDB are as follows.

minAvailable: The minimum number of pods that must remain available during voluntary disruptions.
maxUnavailable: The maximum number of pods that can be unavailable at any given time during voluntary disruptions such as node drains, cluster upgrades, or cluster scale-down.
selector: A label selector that specifies which pods the PDB applies to, ensuring the right set of pods is protected.
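Both fields also accept a percentage of the matching pods. Kubernetes recalculates a percentage as the replica count changes, for example when the Horizontal Pod Autoscaler scales the workload. The following is a minimal sketch, assuming pods labeled `app: my-app`; the PDB name is hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb-percent   # hypothetical name
spec:
  minAvailable: "50%"        # evaluated against the current replica count, rounded up
  selector:
    matchLabels:
      app: my-app            # assumed pod label
```

Kubernetes rounds percentages up. With 7 replicas, 4 pods must stay available, and at 10 replicas, 5 must. Percentages work only when the pods are owned by a built-in workload controller such as a Deployment or StatefulSet, or by a controller that exposes the scale subresource.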

How PDBs Manage Pod Availability During Maintenance

Draining Nodes: Draining a node is one of the activities that affects service availability. During a drain, the pods on the node are evicted, and their controllers recreate them on other nodes. The PDB ensures that at least the number of pods specified by minAvailable remains available throughout this process. If evicting another pod would drop the count below that threshold, the eviction is refused. The drain then waits and retries until enough replacement pods are running elsewhere.
Updating Deployments: There are different strategies for updating Deployments. The most widely used is the rolling update, in which Kubernetes replaces pods in batches to avoid downtime. The pace of a rolling update is controlled by the Deployment's own maxUnavailable and maxSurge settings, not by the PDB, because the Deployment controller deletes old pods directly instead of evicting them. However, pods that are down during a rollout still count against the PDB. A node drain that overlaps with a rollout may therefore be held back until the new pods are ready.
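The rollout pace lives in the Deployment's own `strategy` block. The following is a minimal sketch for the `my-app` Deployment used in the walkthrough later in this article. The explicit values `maxUnavailable: 1` and `maxSurge: 1` are assumptions chosen for illustration; when omitted, both default to 25%.

```yaml
# Fragment of the my-app Deployment spec (other fields unchanged)
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most 1 of the 5 desired pods may be unavailable during a rollout
      maxSurge: 1         # at most 1 extra pod above the desired 5 may be created during a rollout
```

With 5 replicas and `maxUnavailable: 1`, the rollout on its own never drops below 4 available pods. That keeps the application above a PDB threshold of `minAvailable: 3`.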

YAML Definition of a Simple PDB

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 3
  # maxUnavailable: 2   # alternative to minAvailable; a PDB sets only one of the two
  selector:
    matchLabels:
      app: my-app
```

In the above example, minAvailable: 3 ensures that at least 3 pods with the label app=my-app remain available during any voluntary disruption. The commented-out maxUnavailable: 2 line shows the alternative form (a PDB uses one or the other), which would allow no more than 2 matching pods to be disrupted at any given time. The selector specifies which pods the PDB targets. Save this file as webapp-deployment.yaml.
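The commented-out line corresponds to a separate manifest, because a single PDB sets either `minAvailable` or `maxUnavailable`. The sketch below shows that alternative. The name `example-pdb-max` is hypothetical. Use this manifest in place of `example-pdb`, not alongside it, because eviction fails for pods matched by more than one PDB.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb-max   # hypothetical name; replaces example-pdb
spec:
  maxUnavailable: 2       # at most 2 matching pods disrupted at once, whatever the replica count
  selector:
    matchLabels:
      app: my-app
```

The two forms behave the same at 5 replicas, where each allows 2 disruptions. They diverge when the Deployment scales. At 10 replicas, `minAvailable: 3` would allow up to 7 pods to be disrupted, while `maxUnavailable: 2` still allows only 2.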

How to Use Pod Disruption Budgets

A step-by-step guide to creating PDBs in Kubernetes.

Identify the Target Application:

PDBs are typically used for applications that must always stay available, such as databases or web servers. Determine which applications need protection against disruption.

Define Disruption Tolerance:

Define the acceptable number of pods that can be disrupted safely without affecting the performance of the application/service. Use the above YAML definition file as a reference and apply the policy with –

kubectl apply -f webapp-deployment.yaml

For a successful PDB creation, you should see the following output:

poddisruptionbudget.policy/example-pdb created

Simulate and Test PDB settings
Deploy the application

Deploy a simple application with 5 replicas using the following Deployment manifest, saved as my-app.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app       # must match the PDB's selector
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: nginx
          image: nginx:alpine3.19
```

Apply this deployment:

kubectl apply -f my-app.yaml

The following output should appear on the screen:

deployment.apps/my-app created

Simulate the node drain

Cordon the Node.

Cordoning the node prevents new pods from being scheduled on it:

kubectl cordon <node-name>

The command prints the following output:

node/<node-name> cordoned

Drain the Node

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

Draining the node evicts its pods and prints output similar to the following. The DaemonSet-managed pods that are skipped depend on your cluster.

Warning: ignoring DaemonSet-managed Pods: cattle-system/cattle-node-agent-4xqj5, ingress-nginx/nginx-ingress-controller-gtxnt, kube-system/canal-k9j69
node/<node-name> drained

Because the PDB is in place, each eviction during the drain is checked against minAvailable: 3. With 5 healthy replicas, up to 2 my-app pods can be evicted right away. If evicting another pod would leave fewer than 3 available, the eviction is refused. kubectl drain reports that the eviction would violate the pod's disruption budget and keeps retrying until the Deployment has created replacement pods on other nodes and they are ready.

Test rolling updates:

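Rolling updates are a useful second test, because they show what a PDB does not control. A Deployment rollout is paced by the Deployment's strategy (maxUnavailable and maxSurge), not by the PDB. Any pod that is down during the rollout still counts against the PDB's budget.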

Update the image version

kubectl edit deployment my-app

Update the image version from nginx:alpine3.19 to nginx:latest

After you save the change, kubectl prints deployment.apps/my-app edited.

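Monitor rolling updates:
During the update, the Deployment replaces pods according to its own rolling update strategy, not the PDB. With the default settings (maxUnavailable and maxSurge of 25%) and 5 replicas, at most 1 pod is unavailable at a time. At least 4 pods therefore stay available, which also keeps the application above the PDB's minAvailable: 3. You can monitor the progress with:

kubectl rollout status deployment/my-app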

When should you use the Pod Disruption Budget – Use Cases

Below are the key use cases where an application’s pod availability and functionality are critical during node drains and rolling updates.

Databases

Continuous data availability is crucial for database clusters such as PostgreSQL or MySQL. PDBs keep a minimum number of database pods running during node disruptions. For example, in a MySQL cluster with 3 replicas, minAvailable: 2 ensures that at least 2 database pods keep functioning during maintenance. This prevents downtime for read/write operations and maintains service continuity. Without a PDB, draining nodes during maintenance could evict multiple database pods at once, potentially causing downtime and data inconsistency.
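The following is a minimal sketch of the MySQL example. It assumes the StatefulSet's pod template carries the label `app: mysql`; this label is hypothetical, so match whatever your StatefulSet actually uses.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mysql-pdb        # hypothetical name
spec:
  minAvailable: 2        # with 3 replicas, only 1 MySQL pod can be evicted at a time
  selector:
    matchLabels:
      app: mysql         # assumed label on the StatefulSet's pod template
```

With 3 replicas, the first eviction leaves exactly 2 healthy pods. The next eviction is therefore refused until the evicted pod has been recreated and is ready again.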

Messaging systems:

Messaging systems like RabbitMQ or Kafka use multiple brokers to distribute messages. PDBs ensure that enough brokers stay operational to handle message traffic during maintenance, which reduces the risk of message loss and keeps processing smooth. Without a PDB, all or most brokers could be evicted at once, leading to processing delays or data loss.

Stateful Applications

Stateful applications such as Elasticsearch need a stable number of pods for data replication and query handling. A PDB keeps a minimum number of these pods running during node drains or maintenance, so search queries can still be served and data replication is not delayed. Without a PDB, evicting too many pods at once could lead to slow recovery and failed search queries, hurting performance.

CI/CD Pipelines

Pipelines such as Jenkins and GitLab CI need build agents to be continuously available to avoid delays in running jobs. PDBs keep enough build agents available during maintenance. For example, in a Jenkins deployment with 50 agents, setting maxUnavailable: 10 allows at most 10 agents to be disrupted at any given time, so most agents remain available to handle builds. Without a PDB, disrupting many build agents at once could cause build failures or a long queue of waiting jobs, delaying software delivery.

Monitoring Systems:

Monitoring tools like Prometheus must stay up to collect cluster health metrics and trigger alerts. A PDB keeps monitoring active during maintenance or any other voluntary disruption. For example, with three or more Prometheus replicas, setting minAvailable: 2 guarantees that at least 2 Prometheus pods remain available during voluntary disruptions, so metrics are still collected and alerts are still sent. With exactly 2 replicas, minAvailable: 2 would block every eviction. Without a PDB, monitoring pods could be disrupted simultaneously, causing gaps in metrics, missed alerts, and system problems that go undetected.

How the PDB Impacts Scheduling Decisions of Kubernetes

Pod disruption budgets ensure application availability, which affects the scheduling and eviction processes in Kubernetes. Details of its impacts are discussed below.

Pod Eviction Control

PDBs limit how many pods Kubernetes can evict at once through the Eviction API, so that a minimum number of pods is always available.

Eviction Blocking: PDBs block an eviction if it would cause the number of available pods to drop below the `minAvailable` threshold (or exceed `maxUnavailable`). For example, if the PDB sets `minAvailable: 2` and only 2 healthy pods are running, further eviction requests are denied, and the API server responds with HTTP 429 Too Many Requests.
Graceful Eviction: Evictions that stay within the budget proceed, and the pod shuts down gracefully within its termination grace period. Denied evictions are not queued. Instead, the client, such as kubectl drain, retries until additional pods are running and the eviction fits within the budget.
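For reference, kubectl drain does not delete pods directly. For each pod, it creates an Eviction object through the pod's `eviction` subresource, and the PDB check applies to that request. Below is a sketch of the request body, shown as YAML. It assumes a pod named `<pod-name>` in the `default` namespace, and the body is POSTed to `/api/v1/namespaces/default/pods/<pod-name>/eviction`.

```yaml
# Eviction request body for the pod's eviction subresource
apiVersion: policy/v1
kind: Eviction
metadata:
  name: <pod-name>       # pod to evict (placeholder)
  namespace: default     # assumed namespace
```

By contrast, deleting a pod directly, for example with `kubectl delete pod`, bypasses the PDB entirely.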

Scheduling Constraints During Node Maintenance

PDBs influence draining and scheduling during node maintenance. If evicting a pod would violate its PDB, the node drain is delayed until enough healthy pods are available on other nodes.

Node Drain Delays: The drain pauses whenever the next eviction would drop pod availability below the PDB's `minAvailable`, and it resumes once the condition is met.
Preemption Decisions: The scheduler does not use PDBs to decide where pods are placed. However, when it preempts lower-priority pods to make room for a higher-priority pod, it tries to choose victims whose removal does not violate a PDB. This is best effort: if no other option exists, preemption can still violate a PDB.

Rolling Updates:

The Deployment controller does not consult PDBs during rolling updates, but the two still interact.

Update Sequence: Pods are replaced incrementally according to the Deployment's own maxUnavailable and maxSurge settings. Keep these compatible with the PDB, so that a rollout on its own never drops the number of ready pods below the PDB's threshold.
Interaction with evictions: The Deployment controller deletes old pods directly instead of evicting them, so a PDB cannot slow down a rollout. However, pods that are down during the rollout still count against the budget. Evictions such as node drains that overlap with the rollout may therefore be refused until the new pods are ready.

Cost Optimization: In addition to ensuring controlled evictions and maintenance, PDB also ensures the efficient use of resources and cost optimization.

Controlled scaling: Scaling a workload down by changing its replica count, including through the HPA, deletes pods directly and is not limited by a PDB. Node scale-down is different. The Cluster Autoscaler evicts pods before removing a node and respects PDBs, so consolidation does not take an application below its budget.
Efficient Resource Utilization: Disruptions are rate-limited rather than absorbed by extra replicas, so applications can stay available without reserving as much backup capacity. This helps optimize cost.

Best Practices for Implementing Pod Disruption Budgets

General Best Practices

Align PDBs with Application SLA: If the service provider and client agree to 99.9% uptime, configure PDBs so that enough replicas stay available during cluster/node maintenance and upgrades to meet that target.
Use PDBs with Higher-Level Controllers: PDBs work best with pods managed by controllers such as Deployments, ReplicaSets, and StatefulSets, which recreate evicted pods elsewhere. Keeping the PDB's selector in sync with these controllers ensures that pod disruptions are handled consistently and the application runs smoothly during maintenance and upgrades.
Use Labels and Selectors Consistently: Using consistent labeling across pods and services ensures that PDBs target the correct pods and avoid unintended disruptions.
Start with Conservative Values: It is important to be careful while setting the values for minAvailable and maxUnavailable in PDB. Once you understand how the application behaves during disruptions, you can fine-tune the values.

Advanced PDB Strategies

Dynamic Adjustment of PDBs Based on Traffic Patterns or Load: For some applications or services, traffic is not consistent. During heavy traffic, a higher minAvailable keeps more pods available, while during low traffic a lower value allows faster updates and maintenance. The Horizontal Pod Autoscaler (HPA) does not modify PDBs. However, if minAvailable or maxUnavailable is expressed as a percentage, the budget is recalculated against the replica count that the HPA sets based on real-time load.
Integrate PDBs with Autoscaling Policies: Configure the PDBs to avoid conflicts between scaling operations and availability requirements if the cluster uses horizontal or vertical autoscaling. This ensures smooth maintenance and scaling operations.
Combine PDBs with Custom Pod Priorities: Pods essential to the application's operation can be given a higher priority through a PriorityClass, making them less likely to be preempted or evicted under resource pressure. Combining PDBs with these high-priority pods protects essential services, while less critical pods can be preempted when necessary.
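The following is a sketch of such a PriorityClass. The name `business-critical` and the value `100000` are hypothetical. Pods opt in by setting `priorityClassName: business-critical` in their pod spec.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical          # hypothetical name
value: 100000                      # higher value = higher priority
globalDefault: false               # only pods that reference this class receive it
description: "Pods essential to the application's operation."
```

Priority decides which pods the scheduler preempts first, and it also feeds into the kubelet's ordering for node-pressure eviction. It does not exempt pods from node drains, which remain governed by the PDB.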

Following these best practices helps PDBs deliver the availability, performance, and resource efficiency that each workload needs.

Common Pitfalls When Implementing Pod Disruption Budgets

Though PDBs are powerful tools for maintaining availability, improper configuration can lead to many common issues. Following are some of the pitfalls.

Node drain delays due to strict PDB settings

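Setting `minAvailable` too high can prevent Kubernetes from draining nodes during upgrades, scaling, or maintenance, especially when replacement pods cannot be scheduled on other nodes. In the extreme case, setting `minAvailable` equal to the replica count, or `maxUnavailable: 0`, blocks every voluntary eviction. This delays critical updates or node repairs until enough pods are running to satisfy `minAvailable`.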
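One frequent cause of stuck drains is pods that are running but never become ready, for example because they crash-loop. The sketch below adds `unhealthyPodEvictionPolicy: AlwaysAllow` to `example-pdb`, which lets such unhealthy pods be evicted even when the budget is exhausted. The field is available in recent Kubernetes versions (beta since v1.27), so check that your cluster supports it.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 3
  # Default is IfHealthyBudget; AlwaysAllow lets running-but-not-ready pods be evicted regardless of the budget
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: my-app
```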

PDB Violations During Autoscaling

Strict PDBs can slow down autoscaling. The HPA changes replica counts directly and is not blocked by PDBs. The Cluster Autoscaler, however, evicts pods before removing a node, and it will not remove a node if doing so would violate a PDB. Underused nodes therefore stay in the cluster until the PDB's minAvailable or maxUnavailable requirements can be met, after which scale-down continues.

Conflicts with Pod Priority and Preemption

When pod priority and preemption are used together with strict PDBs on lower-priority pods, conflicts can arise during updates or maintenance. The scheduler respects PDBs only on a best-effort basis when preempting, so a strict PDB on low-priority pods is either violated by preemption or narrows the scheduler's choice of victims. Align pod priority classes with PDB settings so that critical pods are prioritized correctly, and avoid strict PDBs for low-priority pods.

PDBs Blocking Critical Operations

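A node failure is an involuntary disruption, which a PDB cannot prevent. However, the pods lost with the failed node still count against the budget. If the budget is already used up, strict PDBs can block important voluntary operations until replacement pods are ready, such as draining other nodes for emergency maintenance or upgrades. This may lead to delays or potential downtime while addressing critical issues.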

Knowing these pitfalls helps avoid disruptions and optimize the performance of clusters while using PDB policies.

Monitoring Pod Disruption Budgets (PDBs) in Kubernetes

It is important to monitor if PDBs are working as intended and if pod/workload availability is maintained. Some tools can be used to monitor the PDB effectively.

Kubernetes dashboard:
The Kubernetes Dashboard is a web-based UI for monitoring and managing clusters. It provides visibility into different kinds of workloads and PDBs. Use it to:
1. View the current status of PDBs and how PDB policies affect pod availability.
2. Monitor disruptions and know if the PDB constraints are being violated.
3. Get the logs related to pod eviction and disruption events and troubleshoot.

kubectl:

`kubectl` can be used to manage the workloads, services and PDBs directly from the terminal. With kubectl you can get the details of PDB in the following way,

Run `kubectl get pdb -n <namespace>` to see how many disruptions are currently allowed for each PDB (the ALLOWED DISRUPTIONS column).
To get detailed information about a PDB, including current status and constraints, run the following –

kubectl describe pdb -n <namespace> <pdb-name>

Ongoing events and disruptions can be monitored with `kubectl get events`
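The same numbers appear in the PDB's status. The following is a sketch of the relevant fields from `kubectl get pdb example-pdb -o yaml`, with other status fields omitted. The values are illustrative and assume all 5 `my-app` pods are ready and the PDB uses `minAvailable: 3`.

```yaml
status:
  currentHealthy: 5        # matching pods that are currently ready
  desiredHealthy: 3        # minimum required by minAvailable: 3
  disruptionsAllowed: 2    # currentHealthy - desiredHealthy
  expectedPods: 5          # replica count of the owning Deployment
```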

kube-state-metrics:

The `kube-state-metrics` service generates metrics about Kubernetes objects, including PDBs, such as `kube_poddisruptionbudget_status_pod_disruptions_allowed` and `kube_poddisruptionbudget_status_current_healthy`. These provide valuable information about the health of your workloads. You can deploy kube-state-metrics in your clusters to collect PDB metrics.

Prometheus & Grafana:

Using Prometheus and Grafana integration, interactive dashboards can be created. With Prometheus, you can scrape metrics from `kube-state-metrics` and monitor PDB-related metrics. Further integration with Grafana can be useful for visualizing these metrics. Alerts can also be configured for violations and disruptions below the threshold.
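The following is a minimal sketch of Prometheus alerting rules built on the kube-state-metrics PDB metrics, in the standard Prometheus rule-file format. The alert names, severities, and `for` durations are assumptions to adapt. With the Prometheus Operator, the same `groups` list goes under the `spec` of a PrometheusRule resource.

```yaml
groups:
  - name: pdb-alerts
    rules:
      # A PDB that allows no disruptions for an hour will block node drains
      - alert: PDBNoDisruptionsAllowed            # hypothetical alert name
        expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} allows no disruptions"
      # Fewer healthy pods than the budget requires
      - alert: PDBBelowDesiredHealthy             # hypothetical alert name
        expr: kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} is below its desired healthy count"
```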

Conclusion

Pod Disruption Budgets (PDBs) play an important role in the stability and high availability of a Kubernetes cluster by controlling how many pods can be disrupted during voluntary disruptions such as maintenance, upgrades, and node drains. Their ability to maintain availability while using resources efficiently makes them an essential tool for both stateless and stateful applications. Choosing appropriate minAvailable or maxUnavailable values for each PDB is a best practice for building resilient and scalable infrastructure.
