
deps: bump opencontainers/cgroups to v0.0.2, fix tests #4751


Open · wants to merge 2 commits into main from cgroups-002

Conversation

@kolyshkin (Contributor) commented Apr 28, 2025

For opencontainers/cgroups changes, see
https://github.com/opencontainers/cgroups/releases/tag/v0.0.2

Fix the integration tests to account for the changes in opencontainers/cgroups#4
(the CPU quota value being set is now rounded the same way systemd rounds it).

Related to: #4639
Fixes: #4622
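
For illustration, here is a minimal Go sketch of the rounding in question, based on systemd's 10ms (1% of a CPU second) granularity for its CPUQuotaPerSecUSec property; the function name and structure are illustrative, not the actual addCPUQuota code:

```go
package main

import "fmt"

// roundQuota sketches how a quota/period pair is translated into
// systemd's per-second quota property and back. systemd only accepts
// quotas in 10ms-per-second (1%) increments, so the value is rounded
// up; the effective cgroup quota is then recomputed from the rounded
// property, which is what keeps the systemd and fs drivers consistent.
func roundQuota(quota int64, period uint64) (perSecUSec uint64, effectiveQuota int64) {
	if period == 0 {
		period = 100000 // default CFS period: 100ms, in microseconds
	}
	perSecUSec = uint64(quota*1000000) / period
	if perSecUSec%10000 != 0 {
		// Round up to the nearest 10ms per CPU second.
		perSecUSec = (perSecUSec/10000 + 1) * 10000
	}
	// Recompute the quota the kernel will actually see.
	effectiveQuota = int64(perSecUSec) * int64(period) / 1000000
	return perSecUSec, effectiveQuota
}

func main() {
	// 4200us per 100ms period = 42ms/s, rounded up to 50ms/s,
	// so the quota written to cpu.max becomes 5000us.
	fmt.Println(roundQuota(4200, 100000)) // 50000 5000
}
```

This is why the integration tests in this PR now compute the expected quota value instead of hard-coding it.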

@kolyshkin changed the title from "deps: bump opencontainers/cgroups to v0.0.2" to "deps: bump opencontainers/cgroups to v0.0.2, fix tests" on Apr 29, 2025
@kolyshkin force-pushed the cgroups-002 branch 2 times, most recently from 58b00e9 to d756289 on April 29, 2025
@kolyshkin marked this pull request as ready for review on April 29, 2025
@rata (Member) left a comment

This looks reasonable to me, but based on past experience, I think it would be great to run this in moby, containerd and kubernetes CI to make sure no one expects the old thing in tests.

I guess the Kubernetes CI can be the trickiest to run. I hope what is run on an open PR is enough to cover any regressions. But I can reach out to k8s people to ask, if you run into issues :)

@kolyshkin (Contributor, Author)

> I think it would be great to run this in moby, containerd and kubernetes CI to make sure no one expects the old thing in tests.

Isn't that what we make pre-releases for?

Apparently k8s did something about it: kubernetes/kubernetes#131059.

I would also add cri-o and podman to the list of software whose tests are potentially broken by this update.

In any case, what's done here is the right thing to do, because:

  • practically, the rounding doesn't matter;
  • over time, systemd may overwrite the value we wrote using the fs driver;
  • there is no way to work around systemd overwriting it.

So,

  • if some tests break, they need to be fixed in one way or another;
  • I doubt any real applications will fail; it's mostly about the tests.

@rata (Member) commented May 5, 2025

In the same way we run tests on every commit and not just before releases (something that was not so uncommon decades ago), I think it makes sense to do this now. It might not need anything, or it might need fixes and it might be easy to do. But it might also point out a case we haven't really considered and we might want this to behave differently in that very specific case.

Also, if changes are needed, we might be able to fix the issues now (here or in known downstream repos), so when runc is released no adjustments are needed and users can just upgrade without delay.

If something is hard to fix, fixing it now is just simpler; and if there isn't anything to fix, it's not much extra work.

@kolyshkin (Contributor, Author)

> In the same way we run tests on every commit and not just before releases (something that was not so uncommon decades ago), I think it makes sense to do this now.

I'm sorry, I'm afraid I don't understand.

First, we don't run the tests on every commit (unfortunately it's somewhat hard to implement on GitHub/GHA; I tried implementing it a few years ago and eventually gave up, though I know such setups exist, not necessarily in GHA); we only run tests on every PR.

Second, this is a PR, and therefore we do run tests on it like on every other runc PR.

Or are you saying we need to test every runc PR against kubernetes, containerd, moby, podman, cri-o, etc.? If that's the case, I totally agree it's a good idea in general. Yet, this is out of scope for this particular PR.

> It might not need anything, or it might need fixes and it might be easy to do. But it might also point out a case we haven't really considered and we might want this to behave differently in that very specific case.

> Also, if changes are needed, we might be able to fix the issues now (here or in known downstream repos), so when runc is released no adjustments are needed and users can just upgrade without delay.

> If something is hard to fix, fixing it now is just simpler; and if there isn't anything to fix, it's not much extra work.

Again, if that is about testing every runc PR (and/or runc@HEAD) against its main consumers (kubernetes etc.), I do think it's a good idea, but we should discuss it separately, and it should not prevent this PR from moving forward. I suggest you open an issue for that.

@rata (Member) commented May 7, 2025

I agree running every PR against those consumers is out of scope for this PR, 100% agree.

I just mean: finding bugs earlier is better, as it is easier to debug and fix. As we don't run CI against all those consumers, I think it makes sense to do it manually when we see a "higher risk" of a PR affecting those consumers.

I think for this PR it can be particularly useful to test it against those consumers. I'd prefer if we can know ASAP if this breaks any of them (and if we want to change the feature slightly or just change their tests).

If you don't want to do it until we have an rc release, I don't think that's a good idea, but it's okay too. I won't push more on this :)

@kolyshkin (Contributor, Author)

> I agree running every PR against those consumers is out of scope for this PR, 100% agree.

> I just mean: finding bugs earlier is better, as it is easier to debug and fix. As we don't run CI against all those consumers, I think it makes sense to do it manually when we see a "higher risk" of a PR affecting those consumers.

> I think for this PR it can be particularly useful to test it against those consumers. I'd prefer if we can know ASAP if this breaks any of them (and if we want to change the feature slightly or just change their tests).

> If you don't want to do it until we have an rc release, I don't think that's a good idea, but it's okay too. I won't push more on this :)

Thanks for the clarification.

The only problem here is that my capacity is limited, so I could really use some help. From kubernetes it might be @dims, from cri-o it might be @haircommander, and I hope @thaJeztah could help on the moby and/or containerd side. I will try my luck with podman; perhaps @hshiina may help here, too.

@dims (Contributor) commented May 8, 2025

@kolyshkin a lot of jobs in k8s CI inherit the version of runc that is packaged with containerd, so we should probably start there (run some subset of containerd jobs periodically with runc master or whatever branch we wish). Yes, we should visit some of the k8s CI jobs as well once we have that.

@kolyshkin added 2 commits on May 13, 2025:
Instead of providing systemd CPU quota value (CPUQuotaPerSec),
calculate it based on how opencontainers/cgroups/systemd handles
it (see addCPUQuota).

Signed-off-by: Kir Kolyshkin <[email protected]>

For changes, see https://github.com/opencontainers/cgroups/releases/tag/v0.0.2

Fix integration tests according to changes in [1] (now the CPU quota value set
is rounded the same way systemd does it).

[1]: opencontainers/cgroups#4
Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin (Contributor, Author)

To reiterate: the gist of this change is that, when the systemd cgroup manager is used, the CPU quota value is now rounded up to the nearest 10ms.

This rounding is a limitation of systemd, which we can't (and don't want to) change; the rounding itself is perfectly fine. The change is also unavoidable: since we set the systemd value as well, systemd can and will write back the (rounded) value.

It should not break any real workloads, as the quota remains about the same, but it may break some CI that expects and checks for exact quota values (such tests can be fixed similarly to the way they are fixed in this PR).
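
As a concrete example (assuming the default 100ms CFS period): a requested quota of 21500us amounts to 215ms of CPU time per second, which systemd rounds up to 220ms per second, so the value actually written to cpu.max becomes 22000us. A quota that is already a whole multiple of 10ms per second, such as 50000us (50%), is left unchanged.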

@opencontainers/runc-maintainers WDYT? PTAL

@lifubang (Member) commented May 15, 2025

I think for this specific change, it's OK to merge from the runc side.

My opinion is that if runc maintainers consider it worth merging, we can merge it in runc. If there are breaking changes in the cgroups repo for other related projects, for example containerd, docker, or Kubernetes, they will file issues against the cgroups repo, and we can bump and merge again on the runc side. I think this is not a problem in runc but on the cgroups side. It was also a problem in runc before we split cgroups into a separate repo, and one of the purposes of the separation was to speed up code development. So let's each do our own job; after all, we can't have both the fish and the bear's paw at the same time (we can't have it both ways).

Maybe in the cgroups repo we should find a way to generate an evaluation report for each new cgroups release against popular dependent projects.

@kolyshkin (Contributor, Author)

cri-o seems to be working fine, thanks to @chuanchangjia, see cri-o/cri-o#9221

Successfully merging this pull request may close these issues: systemd driver updates CPU quota inconsistently (#4622).