The behavior related to disruption budget is not clearly explained in the documentation #2218
Based on your requirements, here is an example of allowing Karpenter to drift one node at a time (see the sketch below). Keep in mind this will not block other forms of disruption, such as consolidation, expiration, or forceful disruption.
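A minimal sketch of such a budget, using the reasons field to scope it to drift (this is only the spec.disruption portion of a NodePool, not a complete manifest):

```yaml
# Minimal sketch: a budget that applies only to drift and allows at most one
# node to be disrupted for that reason at a time. Other disruption reasons
# (consolidation, expiration, forceful disruption) are not limited by it.
disruption:
  budgets:
    - nodes: "1"
      reasons:
        - Drifted
```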
Here is the Karpenter documentation on configuring disruption budgets: https://karpenter.sh/docs/concepts/disruption/#nodepool-disruption-budgets. Here is another example, which allows all nodes to be disrupted on Sunday but blocks disruption on all other days.
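A sketch of one way to express that, assuming standard cron day-of-week numbering (1 = Monday) and that disruption is not capped while no budget is active:

```yaml
# Sketch: a zero-node budget active from Monday 00:00 UTC for 144 hours
# (Monday through Saturday). No budget is active on Sunday, so all nodes
# may be disrupted on that day.
disruption:
  budgets:
    - nodes: "0"
      schedule: "0 0 * * 1"
      duration: 144h
```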
To give a little more guidance, let me walk through each of the disruption budgets you have defined and what they configure Karpenter to do. The budget below allows Karpenter to drift half the nodes in the cluster at all times except from 00:00 - 01:00 UTC, when all disruption is blocked. From 01:00 - 23:59, up to 10% of empty nodes can be disrupted.
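A budget with that behavior could look roughly like the following sketch (the field values are reconstructed from the description above, not copied from your manifest):

```yaml
# Sketch reconstructed from the described behavior:
disruption:
  budgets:
    - nodes: "50%"            # drift up to half the nodes at any time...
      reasons:
        - Drifted
    - nodes: "0"              # ...except 00:00-01:00 UTC, when all disruption is blocked
      schedule: "0 0 * * *"
      duration: 1h
    - nodes: "10%"            # 01:00-23:59 UTC: up to 10% of empty nodes
      reasons:
        - Empty
      schedule: "0 1 * * *"
      duration: 23h
```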
The following disruption budget allows 1 node to be disrupted at a time from 00:00 - 08:00 UTC. From 08:00 - 23:59 UTC, up to 10% of nodes can be disrupted at any given time.
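Roughly, as a sketch reconstructed from that description:

```yaml
# Sketch reconstructed from the described behavior:
disruption:
  budgets:
    - nodes: "1"              # 00:00-08:00 UTC: at most one node at a time
      schedule: "0 0 * * *"
      duration: 8h
    - nodes: "10%"            # 08:00-23:59 UTC: at most 10% of nodes
      schedule: "0 8 * * *"
      duration: 16h
```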
The budget below allows Karpenter to drift half the nodes in the cluster at any given time, while up to 10% of empty nodes can be disrupted at any given time.
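And a sketch of that one:

```yaml
# Sketch reconstructed from the described behavior:
disruption:
  budgets:
    - nodes: "50%"            # up to half the nodes may be drifted at any time
      reasons:
        - Drifted
    - nodes: "10%"            # up to 10% of empty nodes may be disrupted at any time
      reasons:
        - Empty
```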
/remove-kind bug
@engedaam: Those labels are not set on the issue. In response to this: /remove-kind bug
/priority backlog
@engedaam hi! Thanks for the detailed reply!
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: ops4366-drift-one-per-hour
spec:
amiFamily: AL2023
amiSelectorTerms:
- alias: al2023@v20250505
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
deleteOnTermination: true
encrypted: true
volumeSize: 50Gi
volumeType: gp3
kubelet:
maxPods: 1000
metadataOptions:
httpEndpoint: enabled
httpProtocolIPv6: disabled
httpPutResponseHopLimit: 1
httpTokens: required
role: eks-karpenter-node
securityGroupSelectorTerms:
- tags:
Name: eks-node
subnetSelectorTerms:
- tags:
Name: private-us-east-1a
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: ops4366-drift-one-per-hour
spec:
disruption:
budgets:
- duration: 1h
nodes: '1'
schedule: '@hourly'
consolidateAfter: 60s
consolidationPolicy: WhenEmpty
template:
metadata:
labels:
node-group: ops4366-drift-one-per-hour
spec:
expireAfter: Never
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: ops4366-drift-one-per-hour
requirements:
- key: karpenter.sh/capacity-type
operator: In
values:
- on-demand
- key: kubernetes.io/arch
operator: In
values:
- arm64
- key: karpenter.k8s.aws/instance-family
operator: In
values:
- t4g
- key: karpenter.k8s.aws/instance-size
operator: In
values:
- small
startupTaints:
- effect: NoExecute
key: node.cilium.io/agent-not-ready
value: 'true'
taints:
- effect: NoSchedule
key: node-group
value: ops4366-drift-one-per-hour
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: drift-one-per-hour
spec:
replicas: 4
selector:
matchLabels:
app: drift-one-per-hour
template:
metadata:
labels:
app: drift-one-per-hour
spec:
tolerations:
- key: "node-group"
operator: "Equal"
value: "ops4366-drift-one-per-hour"
effect: "NoSchedule"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-group
operator: In
values:
- ops4366-drift-one-per-hour
topologySpreadConstraints:
- maxSkew: 1
topologyKey: "kubernetes.io/hostname"
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: drift-one-per-hour
containers:
- name: tools
image: ghcr.io/obervinov/images/backup-tools:latest
command:
- sleep
- '604800'
resources:
requests:
memory: 25Mi
imagePullPolicy: Always
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 25%
      maxSurge: 25%

After that, I updated the alias version in amiSelectorTerms.
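For instance, this is the kind of change that triggers the drift (the newer version string below is purely hypothetical, for illustration only):

```yaml
# Hypothetical example of the EC2NodeClass change that triggers drift:
amiSelectorTerms:
  - alias: al2023@v20250512   # bumped from al2023@v20250505
```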
Description
Hi!
I think this is more of a question than a bug, but I can't fully understand how the budget settings in NodePool work.
Observed Behavior:
Some more context: some time ago we started using EC2NodeClasses with alias: al2023@latest as the amiSelectorTerm. When a new AMI is released, we receive a configuration Drift and, as a result, an immediate replacement of all nodes in the corresponding NodePool (and the unavailability of some services running on those nodes, since almost all nodes are replaced at once). I tried different ways to change this behavior through the disruption.budgets spec and to spread the replacement of all nodes over several hours, but in the end I still get a situation where, when a Drift occurs, all nodes from the NodePool are replaced within 2-3 minutes.
Expected Behavior:
I would like to spread out the replacement of all nodes in the NodePool, for example to no more than one node per hour, or ideally to allow replacement only on a certain day of the week.
I would be very grateful for any recommendations.
Reproduction Steps (Please include YAML):
Versions:
Kubernetes Version (kubectl version): v1.31.7-eks-bcf3d70