cohort resources are getting over borrowed when multiple cluster queues are borrowing resources #5289
cc @gabesaba ptal
Hi @alaypatel07, the quota defined at the Cohort level is additive: there is a total of 8 CPU / 8Gi available in the entire Cohort.
I will update the docs to make this clearer - this is not the first time a user expected this semantic. @alaypatel07, were there any docs in particular which gave you the impression that the resources defined at the Cohort level worked in this way?
@gabesaba I was reading from this doc https://kueue.sigs.k8s.io/docs/concepts/cohort/#configuring-quotas. I don't see it mentioned anywhere that quotas on Cohorts are additive. Can you please be more specific about what "additive" means? If there are 4 ClusterQueues belonging to a Cohort and the Cohort defines a nominalQuota of 2 CPU, will there be a total quota of 10 CPUs - 2 for each ClusterQueue plus 2 for the Cohort?
In that case, there will just be 2 CPU of quota, assuming that the ClusterQueues do not define any quota of their own. I meant that the resources defined at the Cohort level are independent of the quotas defined at the ClusterQueue level; these numbers are added up to determine the total capacity. E.g.:

[example structure showing the total resources available in the Cohort]
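To make the additive semantics concrete, here is a hypothetical sketch based on the Kueue API (the Cohort kind and its API version vary by Kueue release; the names, flavor, and numbers here are illustrative, not taken from the issue):

```yaml
# Hypothetical example: quotas are additive across a Cohort and its ClusterQueues.
apiVersion: kueue.x-k8s.io/v1alpha1
kind: Cohort
metadata:
  name: team-cohort
spec:
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 4   # contributed by the Cohort itself
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a
spec:
  cohort: team-cohort
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 2   # contributed by team-a
# A second ClusterQueue "team-b" with nominalQuota: 2 would bring the
# Cohort total to 4 + 2 + 2 = 8 CPU -- not a shared pool of 4 CPU
# that the ClusterQueues slice up.
```

The point of the sketch is that the Cohort's quota is one more contribution to the pool, on equal footing with each ClusterQueue's quota.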
Oh, I see - I had a different mental model of the system. I assumed that quota needs to be defined once at the Cohort level, and that ClusterQueues then take smaller slices of the Cohort's quota. That is clearly not the case. Can you please help put this in the documentation? I will be happy to review the doc PR.
What happened:
What you expected to happen:
I expected the job in team-b to be pending because team-b doesn't have enough quota. Instead, it was running.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
  Client Version: v1.32.3
  Kustomize Version: v5.5.0
  Server Version: v1.32.0
- Kueue version (use `git describe --tags --dirty --always`): v0.11.4
- OS (e.g. from `cat /etc/os-release`):
- Kernel (use `uname -a`):