
The behavior related to disruption budget is not clearly explained in the documentation #2218


Open
obervinov opened this issue May 12, 2025 · 4 comments
Labels
kind/support Categorizes issue or PR as a support question. priority/backlog Higher priority than priority/awaiting-more-evidence. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@obervinov

obervinov commented May 12, 2025

Description

Hi!
I think this is more of a question than a bug, but I can't fully understand how the budget settings in NodePool work.

Observed Behavior:
Some more context: some time ago we started using EC2NodeClasses with alias: al2023@latest as the amiSelectorTerm. When a new AMI is released, we get a configuration Drift and, as a result, all nodes in the corresponding NodePool are replaced immediately (which makes some of the services running on those nodes unavailable, since almost all nodes are replaced at once).
I tried different ways to change this behavior through the disruption.budgets spec and spread the replacement of all nodes over several hours:

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 60s
  budgets:
  - nodes: "50%"
    reasons:
      - Drifted
  - nodes: "0"
    schedule: "@daily"
    duration: 1h

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 60s
  budgets:
  - nodes: "1"
    schedule: "@daily"
    duration: 8h

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 60s
  budgets:
  - nodes: "50%"
    reasons:
      - Drifted

In the end, I still get a situation where, when a Drift occurs, all nodes in the NodePool are replaced within 2-3 minutes.

Expected Behavior:
I would like to slow down the replacement of nodes in the NodePool to, for example, no more than one node per hour, or, ideally, to allow replacement only on a certain day of the week.
I would be very grateful for any recommendations.
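
As a possible workaround on my side, I am also considering pinning the alias to a specific AMI version instead of @latest, so that Drift only happens when we bump the version deliberately. A minimal sketch (the exact version value is just an example):

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: nodeclass-1
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    # pinned version instead of al2023@latest; the version here is only an example
    - alias: al2023@v20250505
  role: eks-karpenter-node
  securityGroupSelectorTerms:
    - tags:
        Name: eks-node
  subnetSelectorTerms:
    - tags:
        Name: private-us-east-1a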

Reproduction Steps (Please include YAML):

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: nodepool-1
spec:
  disruption:
    budgets:
      - nodes: 50%
        reasons:
          - Drifted
      - duration: 1h
        nodes: '0'
        schedule: '@daily'
    consolidateAfter: 60s
    consolidationPolicy: WhenEmpty
  template:
    metadata:
      labels:
        node-group: nodepool-1
    spec:
      expireAfter: Never
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: nodeclass-1
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - t4g
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values:
            - medium
      startupTaints:
        - effect: NoExecute
          key: node.cilium.io/agent-not-ready
          value: 'true'
      taints:
        - effect: NoSchedule
          key: node-group
          value: nodepool-1

---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: nodeclass-1
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  role: eks-karpenter-node
  securityGroupSelectorTerms:
    - tags:
        Name: eks-node
  subnetSelectorTerms:
    - tags:
        Name: private-us-east-1a

Versions:

  • Chart Version: 1.3.3
  • Kubernetes Version (kubectl version): v1.31.7-eks-bcf3d70
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@obervinov obervinov added the kind/bug Categorizes issue or PR as related to a bug. label May 12, 2025
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels May 12, 2025
@engedaam
Contributor

I would like to slow down the replacement of nodes in the NodePool to, for example, no more than one node per hour, or, ideally, to allow replacement only on a certain day of the week.
I would be very grateful for any recommendations.

Based on your requirements, here is an example that allows Karpenter to drift one node at a time. Keep in mind this will not block other forms of disruption, such as consolidation, expiration, or forceful disruption.

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 60s
  budgets:
  - nodes: "1"
    reasons:
      - Drifted
    schedule: "@hourly"
    duration: 1h

Here is the Karpenter documentation on configuring disruption budgets: https://karpenter.sh/docs/concepts/disruption/#nodepool-disruption-budgets. Another example allows all nodes to be disrupted on Sunday while blocking disruption on all other days.

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 60s
  budgets:
  - nodes: "100%"
    schedule: "0 0 * * 0"
    duration: 24h
  - nodes: "0"
    schedule: "0 0 * * 1"
    duration: 144h
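
The two can also be combined. Here is a sketch (untested) that would allow at most one Drifted node at a time, and only on Sundays; like the examples above, it only scopes the Drifted reason:

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 60s
  budgets:
  # at most one Drifted node at a time during the Sunday window
  - nodes: "1"
    reasons:
      - Drifted
    schedule: "0 0 * * 0"
    duration: 24h
  # block Drift for the rest of the week (Monday 00:00 UTC + 144h)
  - nodes: "0"
    reasons:
      - Drifted
    schedule: "0 0 * * 1"
    duration: 144h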

Just to give a little more guidance, let me go through each of the disruption budgets you have defined and what they configure Karpenter to do. The budget below allows Karpenter to drift half the nodes in the cluster at all times except from 00:00 - 01:00 UTC, when all disruption is blocked. From 01:00 - 23:59, up to 10% of empty nodes can be disrupted.

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 60s
  budgets:
  - nodes: "50%"
    reasons:
      - Drifted
  - nodes: "0"
    schedule: "@daily"
    duration: 1h

The following disruption budget allows 1 node to be disrupted at a time from 00:00 - 08:00 UTC. From 08:00 - 23:59 UTC, up to 10% of nodes can be disrupted at any given time.

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 60s
  budgets:
  - nodes: "1"
    schedule: "@daily"
    duration: 8h

The budget below allows Karpenter to drift half the nodes in the cluster at any given time, while up to 10% of empty nodes can be disrupted at any given time.

disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 60s
  budgets:
  - nodes: "50%"
    reasons:
      - Drifted

/remove-kind bug
/kind support
/triage accepted
/remove-priority needs-priority

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 17, 2025
@k8s-ci-robot
Contributor

@engedaam: Those labels are not set on the issue: priority/needs-priority

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 17, 2025
@engedaam
Contributor

/priority backlog

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-priority labels May 17, 2025
@obervinov
Author

@engedaam hi! Thanks for the detailed reply!
I ran the following experiment:

  1. EC2NodeClass
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: ops4366-drift-one-per-hour
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@v20250505
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 50Gi
        volumeType: gp3
  kubelet:
    maxPods: 1000
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  role: eks-karpenter-node
  securityGroupSelectorTerms:
    - tags:
        Name: eks-node
  subnetSelectorTerms:
    - tags:
        Name: private-us-east-1a
  2. NodePool (just in case, I removed the explicitly specified reason)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: ops4366-drift-one-per-hour
spec:
  disruption:
    budgets:
      - duration: 1h
        nodes: '1'
        schedule: '@hourly'
    consolidateAfter: 60s
    consolidationPolicy: WhenEmpty
  template:
    metadata:
      labels:
        node-group: ops4366-drift-one-per-hour
    spec:
      expireAfter: Never
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: ops4366-drift-one-per-hour
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - t4g
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values:
            - small
      startupTaints:
        - effect: NoExecute
          key: node.cilium.io/agent-not-ready
          value: 'true'
      taints:
        - effect: NoSchedule
          key: node-group
          value: ops4366-drift-one-per-hour
  3. I also added a deployment with 4 replicas, which should explicitly request 4 nodes from the NodePool (one per replica)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: drift-one-per-hour
spec:
  replicas: 4
  selector:
    matchLabels:
      app: drift-one-per-hour
  template:
    metadata:
      labels:
        app: drift-one-per-hour
    spec:
      tolerations:
        - key: "node-group"
          operator: "Equal"
          value: "ops4366-drift-one-per-hour"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-group
                    operator: In
                    values:
                      - ops4366-drift-one-per-hour
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: "kubernetes.io/hostname"
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: drift-one-per-hour
      containers:
        - name: tools
          image: ghcr.io/obervinov/images/backup-tools:latest
          command:
            - sleep
            - '604800'
          resources:
            requests:
              memory: 25Mi
          imagePullPolicy: Always
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%

After that, I updated the alias version in amiSelectorTerms (v20250514 -> v20250505) to try to reproduce the conditions for a configuration Drift. All 4 created NodeClaims received the AMIDrift status, and all 4 were replaced within 5 minutes.
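
One more note on the test setup: the Deployment above has no PodDisruptionBudget. A minimal sketch, reusing the app: drift-one-per-hour label from the Deployment; since Karpenter drains nodes through the eviction API, a PDB like this caps how many replicas can be evicted at once, although it does not change the NodePool budget behavior itself:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: drift-one-per-hour
spec:
  # allow at most one replica to be voluntarily evicted at a time
  maxUnavailable: 1
  selector:
    matchLabels:
      app: drift-one-per-hour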

Log: karpenter-logs-amis-drift.json
