Replies: 2 comments
We definitely plan to work on this and have an early design in #77, but it is currently an "aspirational long-term" goal on the roadmap. Thank you! Voices like this help us prioritize.
AFAIK resource allocation is static only ATM. cc @andrewsykim
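To make "static" concrete, here is a toy Python model of one reading of it: quota is reserved once, at admission, from the job's declared request and is not resized while the job runs. This is only a sketch under that assumption, not Kueue code; `ToyQueue` and its methods are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical queue with a fixed GPU quota (not a Kueue type)."""
    gpu_quota: int
    # job name -> GPUs reserved at admission time
    admitted: dict = field(default_factory=dict)

    def used(self) -> int:
        """Total GPUs currently reserved by admitted jobs."""
        return sum(self.admitted.values())

    def admit(self, name: str, requested_gpus: int) -> bool:
        """Reserve the declared request once; it is never resized afterwards."""
        if self.used() + requested_gpus > self.gpu_quota:
            return False
        self.admitted[name] = requested_gpus
        return True

    def finish(self, name: str) -> None:
        """Release the reservation when the job completes."""
        self.admitted.pop(name, None)


queue = ToyQueue(gpu_quota=8)
training_ok = queue.admit("training", 4)  # fixed-size training job
ray_ok = queue.admit("ray-data", 4)       # elastic job, sized for its peak

# Assumption: even if the elastic job is actually using fewer GPUs right now,
# its reservation stays at 4, so a third job cannot be admitted.
extra_ok = queue.admit("extra", 2)
```

In this model the elastic job holds its peak reservation for its whole lifetime, which is the gap the question below is asking about.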
Is there support in Kueue for managing elastic jobs?
Imagine I have ML training GPU workloads that, as is typical, are not elastic. But I also have GPU data workloads that are elastic, for example Ray jobs that can scale up and down.
Is there any support in Kueue, now or in the future, to manage a mix of such workloads under queue quotas and fairness?
Looking at the RayCluster documentation, it mentions:
How does Kueue manage the resource allocation? Does it scale clusters up and down? If so, does that take into account fairness?