Skip to content

KATA-3270: pod overhead for peer pods reduce actual worker capacity #487

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 11, 2024

Conversation

balintTobik
Copy link
Contributor

@balintTobik balintTobik commented Dec 5, 2024

- Description of the problem which is fixed/What is the use case
Peer pods' overhead reduce actual worker capacity, however most of the RAM/CPU allocated in cloud instance.

Fixes: KATA-3270

- What I did

Operator create 'kata-remote' runtime class with a specific and lower overhead, like in upstream.

- How to verify it

  1. create KataConfig for a peer pods
  2. check 'kata-remote' runtime class created a lower overhead than 'kata' runtime class

- Description for the changelog
Use separate pod's overhead related constants for kata and peer pods.

[EDIT from Greg: adapted description to final version]

@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 5, 2024

@balintTobik: This pull request references KATA-3270 which is a valid jira issue.

In response to this:

- Description of the problem which is fixed/What is the use case
Peer pods' overhead reduce actual worker capacity, however most of the RAM/CPU allocated in cloud instance.

Fixes: KATA-3270

- What I did
Operator create 'kata-remote' runtime class without overhead.

- How to verify it

  1. create KataConfig for a peer pods
  2. check 'kata-remote' runtime class created without overhead,
    'kata' runtime class still created with required overhead

- Description for the changelog
Remove peer pods' overhead related constants and create peerpod's runtime without overhead

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 5, 2024
@openshift-ci openshift-ci bot requested review from bpradipt and snir911 December 5, 2024 12:30
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 5, 2024
Copy link

openshift-ci bot commented Dec 5, 2024

Hi @balintTobik. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jensfr jensfr requested a review from c3d December 6, 2024 14:28
Copy link
Member

@gkurz gkurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @balintTobik !

Thanks for working on that. I've commented and hopefully clarified what we want to achieved for now, i.e. use different overhead values for kata and kata-remote.

Also all commits should have a message and a valid Signed-off-by: tag.

Please address the comments in your local repo and force push to this PR.

@balintTobik
Copy link
Contributor Author

Hi @gkurz!
Thank you for the review. I've updated the PR according to your comments.

@balintTobik balintTobik requested a review from gkurz December 9, 2024 11:08
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 9, 2024

@balintTobik: This pull request references KATA-3270 which is a valid jira issue.

In response to this:

- Description of the problem which is fixed/What is the use case
Peer pods' overhead reduce actual worker capacity, however most of the RAM/CPU allocated in cloud instance.

Fixes: KATA-3270

- What I did

Operator create 'kata-remote' runtime class with a specific and lower overhead, like in upstream.

- How to verify it

  1. create KataConfig for a peer pods
  2. check 'kata-remote' runtime class created a lower overhead than 'kata' runtime class

- Description for the changelog
Use separate pod's overhead related constants for kata and peer pods.

[EDIT from Greg: adapted description to final version]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@gkurz gkurz added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Dec 9, 2024
@openshift-ci openshift-ci bot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 9, 2024
Copy link
Member

@gkurz gkurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @balintTobik ! Thanks for this work.

I've pushed another commit on top of your PR that does some trivial polishing and suggests a description for your commit. I will squash at merge time.

Now we'll ask for some feedback from @vvoronko.

@gkurz gkurz requested a review from vvoronko December 9, 2024 16:26
@gkurz gkurz force-pushed the remove_peeroverhead branch 2 times, most recently from 3b321b5 to 448ee53 Compare December 10, 2024 14:59
Rever memory overhead of `kata` to 350 megs.

Commit text example :

The pod overhead of peer pods is expected to be lower than
regular kata pods. Use a different set of values for the
`kata` and `kata-remote` runtime classe, like upstream kata
does.

Current numbers for `kata` are kept. New numbers for `kata-remote`
are taken from upstream.The numbers can probably be refined some
more through experiments.

Signed-off-by: Greg Kurz <[email protected]>
@gkurz gkurz force-pushed the remove_peeroverhead branch from 448ee53 to ce52f59 Compare December 10, 2024 15:11
Copy link

openshift-ci bot commented Dec 10, 2024

@balintTobik: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@littlejawa littlejawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
Thanks!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 11, 2024
@gkurz gkurz merged commit 22a5690 into openshift:devel Dec 11, 2024
4 checks passed
@balintTobik balintTobik deleted the remove_peeroverhead branch December 12, 2024 08:37
@gkurz
Copy link
Member

gkurz commented Dec 12, 2024

I should have squashed the patches at merge time but I forgot... never mind.

gkurz added a commit to gkurz/sandboxed-containers-operator that referenced this pull request Dec 12, 2024
…overhead"

This reverts commit 22a5690, reversing
changes made to 4325a09.

Balint's patch will be re-applied with a commit log that I forgot
to squash when I merged the PR.

Signed-off-by: Greg Kurz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants