Possible criu v4.1 regression in Fedora #2650
Comments
It appears to be a consistent failure and is also easy to reproduce locally. Here are the repro steps on a Fedora 41 system:
Note it does not fail this way on Ubuntu 24.04. |
It could be a pagemap_scan issue. Could you compile CRIU without pagemap_scan and try it out? You can change the code here: Another way is to add |
git-bisect points to commit 867c773. Indeed, if I compile criu v4.1 without adding |
@adrianreber PTAL |
Same in openSUSE Tumbleweed. |
This is indeed new functionality. I am not sure whether we should have switched from iptables-based locking to nftables-based locking in the middle of Fedora 41 (@rst0git), but not switching would only have delayed this report until Fedora 42 is in use. I can reproduce it locally and I think I understand what is going on. CRIU is locking the network with the same ID it was locked with during checkpointing. I can see that the network lock is still active by inserting the following line before the restore: The idea of the change was that we create a UUID and, during checkpointing, lock the network with an nft table that uses that UUID. During restore we check whether the checkpoint image has a UUID and use it again for unlocking. It seems I missed the possibility that the network is also locked during restore; I thought it was only unlocked during restore. Without looking at the code, it is not yet clear to me why CRIU locks the network during restore. So either we need to use another name for the network locking during restore, or re-use the nft table instead of creating a new one. @avagin, any recommendations from your side? Do you know whether we can re-use the existing table to lock during restore, or should we create a new table for network locking during restore? I will look at the code in a couple of days, but it should be fixable. A quick workaround could be a configuration file with |
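To make the failure mode described in this comment concrete, here is a minimal toy model in Python. It is only a sketch, not CRIU code: `NetNamespace`, `lock_table_name`, `checkpoint` and `restore` are hypothetical names, the table naming scheme is invented, and the model assumes that creating a lock table that already exists is an error.

```python
import uuid


class NetNamespace:
    """Toy stand-in for a network namespace holding named lock tables."""

    def __init__(self):
        self.tables = set()

    def create_table(self, name):
        # Assumption of this model: creating an existing table is an error.
        if name in self.tables:
            raise FileExistsError(f"table {name} already exists")
        self.tables.add(name)

    def delete_table(self, name):
        self.tables.remove(name)


def lock_table_name(lock_id):
    # Hypothetical naming scheme; the real table name is not given in the thread.
    return f"lock-{lock_id}"


def checkpoint(netns):
    """Lock the network under a fresh id and return the id saved in the image."""
    lock_id = str(uuid.uuid4())
    netns.create_table(lock_table_name(lock_id))
    return lock_id


def restore(netns, lock_id):
    """Lock again with the saved id, then unlock, as the comment describes."""
    netns.create_table(lock_table_name(lock_id))  # fails if still locked
    netns.delete_table(lock_table_name(lock_id))


# Checkpoint and restore in the same namespace: the lock is still in place.
same_ns = NetNamespace()
saved_id = checkpoint(same_ns)
try:
    restore(same_ns, saved_id)
    outcome = "restored"
except FileExistsError as err:
    outcome = f"restore failed: {err}"
print(outcome)
```

In this model, restoring into a fresh namespace succeeds, while restoring into the namespace that was locked at checkpoint time fails because the same id is reused for a second lock.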
CRIU locks the network during restore in an "empty" network namespace. However, "empty" in this context means CRIU isn't restoring the namespace. This network namespace can be the same namespace where processes have been dumped and so the network is already locked in it. Fixes checkpoint-restore#2650 Signed-off-by: Andrei Vagin <[email protected]>
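The situation in this commit message, combined with the "re-use the existing table" option raised earlier in the thread, can be sketched as follows. This is an illustration of one possible approach under stated assumptions, not the patch from the PR: `NetNamespace`, `lock_network`, `unlock_network`, `restore` and the `reuse_existing` flag are all hypothetical names.

```python
class NetNamespace:
    """Toy network namespace that may already contain a lock table."""

    def __init__(self, tables=()):
        self.tables = set(tables)


def lock_network(netns, table, reuse_existing):
    """Return True if this call created the table, False if it re-used one."""
    if table in netns.tables:
        if not reuse_existing:
            # Behaviour assumed for the failing case in this model.
            raise FileExistsError(table)
        return False
    netns.tables.add(table)
    return True


def unlock_network(netns, table):
    # discard() tolerates a table that is already gone.
    netns.tables.discard(table)


def restore(netns, table):
    """Restore that tolerates a namespace which is already locked."""
    lock_network(netns, table, reuse_existing=True)
    # ... processes would be restored here ...
    unlock_network(netns, table)
    return table not in netns.tables


# "Empty" namespace that is really the one used at dump time: already locked.
already_locked = NetNamespace(tables={"lock-1234"})
print(restore(already_locked, "lock-1234"))
```

The design choice shown here is to treat an existing lock as acceptable during restore rather than as an error; whether the actual fix works this way is not stated in the thread.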
@avagin opened a PR with a possible fix. Once it is merged we can update the Fedora packages with it. @ricardobranco777 can you bring the patch to the openSUSE packages once it is merged? |
Sure. Thanks! |
Tumbleweed is a rolling release and tries not to ship downstream patches if possible, but can pick up new versions. Will you make a new release? |
I wouldn't expect a new release for this small change. For Fedora I have no problem just applying a patch. In the past CRIU didn't release a new version for minor changes like this. It would not really be a downstream-only patch, as it is already in the upstream repository. |
@kolyshkin Updated Fedora packages are heading towards the testing repository https://bodhi.fedoraproject.org/updates/FEDORA-2025-d374d8ce17 |
Thank you! In the meantime criu-4.1 got promoted to updates and so runc CI is busted again 😢 |
This version has a known bug [1] which is going to be fixed in the upcoming criu release [2]. So, let's skip criu testing on Fedora until a newer criu rpm is available. [1]: checkpoint-restore/criu#2650 [2]: https://bodhi.fedoraproject.org/updates/FEDORA-2025-d374d8ce17 Signed-off-by: Kir Kolyshkin <[email protected]>
Package criu-4.1-1 has a known bug [1] which is fixed in criu-4.1-2 [2], which is currently only available in updates-testing. Add a kludge to install newer criu if necessary to fix CI. This will not be needed in ~2 weeks once the new package is promoted to updates. [1]: checkpoint-restore/criu#2650 [2]: https://bodhi.fedoraproject.org/updates/FEDORA-2025-d374d8ce17 Signed-off-by: Kir Kolyshkin <[email protected]>
Package criu-4.1-1 has a known bug [1] which is fixed in criu-4.1-2 [2], which is currently only available in updates-testing. Add a kludge to install newer criu if necessary to fix CI. This will not be needed in ~2 weeks once the new package is promoted to updates. [1]: checkpoint-restore/criu#2650 [2]: https://bodhi.fedoraproject.org/updates/FEDORA-2025-d374d8ce17 Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 281e7dc) Signed-off-by: Kir Kolyshkin <[email protected]>
Package criu-4.1-1 has a known bug [1] which is fixed in criu-4.1-2 [2], which is currently only available in updates-testing. Add a kludge to install newer criu if necessary to fix CI. This will not be needed in ~2 weeks once the new package is promoted to updates. [1]: checkpoint-restore/criu#2650 [2]: https://bodhi.fedoraproject.org/updates/FEDORA-2025-d374d8ce17 Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 3e3e048) Signed-off-by: Kir Kolyshkin <[email protected]>
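The kind of kludge these commit messages describe can be sketched as a small decision function. This is not the code from the runc commits: `criu_upgrade_command` and `KNOWN_BAD` are hypothetical names, and the input is assumed to be a package string in the form printed by `rpm -q criu`.

```python
KNOWN_BAD = "criu-4.1-1"


def criu_upgrade_command(installed_nvr):
    """Return a dnf command to run, or None if the installed criu is fine.

    installed_nvr is assumed to look like 'criu-4.1-1.fc41.x86_64'.
    """
    # The trailing dot stops 'criu-4.1-10...' from matching 'criu-4.1-1'.
    if installed_nvr.startswith(KNOWN_BAD + "."):
        return ["dnf", "-y", "upgrade", "--enablerepo=updates-testing", "criu"]
    return None


print(criu_upgrade_command("criu-4.1-1.fc41.x86_64"))
print(criu_upgrade_command("criu-4.1-2.fc41.x86_64"))
```

Keying the check on the exact known-bad package means the workaround becomes a no-op once the fixed package is promoted to updates.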
Tracking upstream https://bugzilla.suse.com/show_bug.cgi?id=1241515 |
Yes, this is fixed and you might want to add a patch from #2653 into your build. Similar fix in Fedora: https://src.fedoraproject.org/rpms/criu/c/323d01daa05d3d402d05114c21904b645ad755ba |
When criu v4.1 is used in runc CI tests in Fedora 41, it fails like this:
The test case source is here: https://github.com/opencontainers/runc/blob/e55fe63aed22520d565d1a3490e1655e839068eb/tests/integration/checkpoint.bats#L271
This is the first time I am seeing this, so I presume it's a regression in criu v4.1. Might be a hash collision but I very much doubt it.