Improve resource manager observability #3197

Open
5 tasks
sukunrt opened this issue Feb 17, 2025 · 0 comments
sukunrt (Member) commented Feb 17, 2025

We should provide more inspection tools to make it easier for users to inspect resource manager usage. I have the following specific items in mind.

  • Provide a way to print resource usage (see "Add a way to print resource limits and current consumption", #3193).
  • Add Prometheus metrics for the configured limits so they can be added to dashboards, making it easy to check the upper bounds.
  • Log the configured rcmgr limits on startup
  • Improve the documentation for Limits
    • What's a PartialLimitConfig?
    • What's a ScalingLimitConfig?
    • What's a default LimitVal, whose meaning depends on context?
    • Move much of the README content into doc comments on the actual objects so godoc can pick it up
    • Add many more examples for common use cases
  • Simplify the API for adjusting the default resources allocated. By default, 1/8 of the available resources is allocated. It should be easy to adjust this multiplier.
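
To make the last item and the "log the limits on startup" item concrete, here is a rough sketch of the shape I have in mind. It is stdlib-only and is not the rcmgr API: the `limits` type, `scaleLimits`, and the 8 GiB / 1024 file descriptor totals are all hypothetical names and numbers chosen for illustration. The only idea it shows is that the fraction becomes an explicit parameter instead of a fixed 1/8, and that the resulting limits are printable at startup.

```go
package main

import (
	"fmt"
	"log"
)

// limits is a hypothetical, simplified stand-in for system-scope limits.
// It is NOT the go-libp2p rcmgr type.
type limits struct {
	MemoryBytes int64
	FDs         int
}

// String renders the limits in a form suitable for a startup log line.
func (l limits) String() string {
	return fmt.Sprintf("memory=%d bytes, fds=%d", l.MemoryBytes, l.FDs)
}

// scaleLimits (hypothetical helper) derives limits from the machine totals
// using a caller-chosen fraction rather than a hard-coded 1/8.
func scaleLimits(totalMemory int64, totalFDs int, fraction float64) (limits, error) {
	if fraction <= 0 || fraction > 1 {
		return limits{}, fmt.Errorf("fraction must be in (0, 1], got %v", fraction)
	}
	return limits{
		MemoryBytes: int64(float64(totalMemory) * fraction),
		FDs:         int(float64(totalFDs) * fraction),
	}, nil
}

func main() {
	// Assumed machine totals, for illustration only.
	const totalMemory = 8 << 30 // 8 GiB
	const totalFDs = 1024

	// Compare the 1/8 default with a more generous 1/4.
	for _, fraction := range []float64{0.125, 0.25} {
		l, err := scaleLimits(totalMemory, totalFDs, fraction)
		if err != nil {
			log.Fatal(err)
		}
		// Logging the configured limits once at startup.
		log.Printf("configured limits (fraction %.3f): %s", fraction, l)
	}
}
```

In the real API the multiplier would presumably be an option on the scaling config rather than a free function, but the call site should end up about this simple.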

cc @2color
