criu page-xfer failure on AlmaLinux 8 #4729

Open
kolyshkin opened this issue Apr 16, 2025 · 5 comments

@kolyshkin
Contributor

From https://cirrus-ci.com/task/5689701597708288?logs=unit_tests#L136

=== RUN   TestUsernsCheckpoint
time="2025-04-16T21:13:52Z" level=warning msg="--- Quoting \"/tmp/TestUsernsCheckpoint3117440493/003/criu/dump.log\""
time="2025-04-16T21:13:52Z" level=warning msg="849:(00.152996) page-xfer: Transferring pages:"
time="2025-04-16T21:13:52Z" level=warning msg="850:(00.152998) page-xfer: \tbuf 1/1"
time="2025-04-16T21:13:52Z" level=warning msg="851:(00.152999) page-xfer: \tp 0x7ffd4bea0000 [1]"
time="2025-04-16T21:13:52Z" level=warning msg="852:(00.153005) page-xfer: \th 0x7ffd4bea1000 [1]"
time="2025-04-16T21:13:52Z" level=warning msg="853:(00.153007) page-xfer: Checking 0x7ffd4bea1000/4096 hole"
time="2025-04-16T21:13:52Z" level=warning msg="854:(00.153010) Error (criu/page-xfer.c:299): page-xfer: Missing 7ffd4bea1000 in parent pagemap"
time="2025-04-16T21:13:52Z" level=warning msg="855:(00.153014) Error (criu/page-xfer.c:342): page-xfer: Hole 0x7ffd4bea1000/4096 not found in parent"
time="2025-04-16T21:13:52Z" level=warning msg="856:(00.153037) page-pipe: Killing page pipe"
time="2025-04-16T21:13:52Z" level=warning msg="857:(00.153065) ----------------------------------------"
time="2025-04-16T21:13:52Z" level=warning msg="858:(00.153067) Error (criu/mem.c:672): Can't dump page with parasite"
time="2025-04-16T21:13:52Z" level=warning msg=...
time="2025-04-16T21:13:52Z" level=warning msg="868:(00.153308) net: Unlock network"
time="2025-04-16T21:13:52Z" level=warning msg="869:(00.153312) Running network-unlock scripts"
time="2025-04-16T21:13:52Z" level=warning msg="870:(00.153314) \tRPC"
time="2025-04-16T21:13:52Z" level=warning msg="871:(00.177045) Unfreezing tasks into 1"
time="2025-04-16T21:13:52Z" level=warning msg="872:(00.177074) \tUnseizing 46190 into 1"
time="2025-04-16T21:13:52Z" level=warning msg="873:(00.177098) Error (criu/cr-dump.c:2111): Dumping FAILED."
time="2025-04-16T21:13:52Z" level=warning msg=---
    checkpoint_test.go:113: criu failed: type DUMP errno 0
--- FAIL: TestUsernsCheckpoint (0.61s)
@kolyshkin
Contributor Author

Again (see https://cirrus-ci.com/task/5080791870341120?logs=unit_tests#L357):

=== RUN   TestUsernsCheckpoint
time="2025-04-17T19:01:26Z" level=warning msg="--- Quoting \"/tmp/TestUsernsCheckpoint3660773276/003/criu/dump.log\""
time="2025-04-17T19:01:26Z" level=warning msg="839:(00.166824) page-xfer: Transferring pages:"
time="2025-04-17T19:01:26Z" level=warning msg="840:(00.166826) page-xfer: \tbuf 1/1"
time="2025-04-17T19:01:26Z" level=warning msg="841:(00.166828) page-xfer: \tp 0x7ffd1d19b000 [1]"
time="2025-04-17T19:01:26Z" level=warning msg="842:(00.166847) page-xfer: \th 0x7ffd1d19c000 [1]"
time="2025-04-17T19:01:26Z" level=warning msg="843:(00.166849) page-xfer: Checking 0x7ffd1d19c000/4096 hole"
time="2025-04-17T19:01:26Z" level=warning msg="844:(00.166852) Error (criu/page-xfer.c:299): page-xfer: Missing 7ffd1d19c000 in parent pagemap"
time="2025-04-17T19:01:26Z" level=warning msg="845:(00.166855) Error (criu/page-xfer.c:342): page-xfer: Hole 0x7ffd1d19c000/4096 not found in parent"
time="2025-04-17T19:01:26Z" level=warning msg="846:(00.166876) page-pipe: Killing page pipe"
time="2025-04-17T19:01:26Z" level=warning msg="847:(00.166906) ----------------------------------------"
time="2025-04-17T19:01:26Z" level=warning msg="848:(00.166908) Error (criu/mem.c:672): Can't dump page with parasite"
time="2025-04-17T19:01:26Z" level=warning msg=...
time="2025-04-17T19:01:26Z" level=warning msg="858:(00.167217) net: Unlock network"
time="2025-04-17T19:01:26Z" level=warning msg="859:(00.167221) Running network-unlock scripts"
time="2025-04-17T19:01:26Z" level=warning msg="860:(00.167223) \tRPC"
time="2025-04-17T19:01:26Z" level=warning msg="861:(00.182697) Unfreezing tasks into 1"
time="2025-04-17T19:01:26Z" level=warning msg="862:(00.182710) \tUnseizing 47334 into 1"
time="2025-04-17T19:01:26Z" level=warning msg="863:(00.182735) Error (criu/cr-dump.c:2111): Dumping FAILED."
time="2025-04-17T19:01:26Z" level=warning msg=---
    checkpoint_test.go:113: criu failed: type DUMP errno 0
--- FAIL: TestUsernsCheckpoint (0.67s)

@kolyshkin
Contributor Author

I was wrong earlier. Since checkpoint-restore/criu#2642 is closed (2 days ago), we use the same criu version (4.1) on Ubuntu 24.04 (amd64) and Ubuntu 22.04 (arm64), and it has not failed there (yet?).

We also run CI on Fedora 41 (which has criu 4.0) and AlmaLinux 9 (criu 3.19); those are not failing. If repositories are provided, I can try running criu 4.1 on these two (AL9 and F41).

@kolyshkin
Contributor Author

criu 4.1 on F41 is not failing like this; instead, it is failing in another way (checkpoint-restore/criu#2650). This means the issue is probably specific to older kernels.

@kolyshkin
Contributor Author

From https://cirrus-ci.com/task/6677068005507072

=== RUN   TestUsernsCheckpoint
time="2025-04-29T20:00:42Z" level=warning msg="--- Quoting \"/tmp/TestUsernsCheckpoint2444957036/003/criu/dump.log\""
time="2025-04-29T20:00:42Z" level=warning msg="841:(00.175309) page-xfer: Transferring pages:"
time="2025-04-29T20:00:42Z" level=warning msg="842:(00.175311) page-xfer: \tbuf 1/1"
time="2025-04-29T20:00:42Z" level=warning msg="843:(00.175312) page-xfer: \tp 0x7ffe0ec9f000 [1]"
time="2025-04-29T20:00:42Z" level=warning msg="844:(00.175319) page-xfer: \th 0x7ffe0eca0000 [1]"
time="2025-04-29T20:00:42Z" level=warning msg="845:(00.175321) page-xfer: Checking 0x7ffe0eca0000/4096 hole"
time="2025-04-29T20:00:42Z" level=warning msg="846:(00.175323) Error (criu/page-xfer.c:299): page-xfer: Missing 7ffe0eca0000 in parent pagemap"
time="2025-04-29T20:00:42Z" level=warning msg="847:(00.175327) Error (criu/page-xfer.c:342): page-xfer: Hole 0x7ffe0eca0000/4096 not found in parent"
time="2025-04-29T20:00:42Z" level=warning msg="848:(00.175349) page-pipe: Killing page pipe"
time="2025-04-29T20:00:42Z" level=warning msg="849:(00.175384) ----------------------------------------"
time="2025-04-29T20:00:42Z" level=warning msg="850:(00.175387) Error (criu/mem.c:672): Can't dump page with parasite"
time="2025-04-29T20:00:42Z" level=warning msg=...
time="2025-04-29T20:00:42Z" level=warning msg="860:(00.175661) net: Unlock network"
time="2025-04-29T20:00:42Z" level=warning msg="861:(00.175665) Running network-unlock scripts"
time="2025-04-29T20:00:42Z" level=warning msg="862:(00.175667) \tRPC"
time="2025-04-29T20:00:42Z" level=warning msg="863:(00.202400) Unfreezing tasks into 1"
time="2025-04-29T20:00:42Z" level=warning msg="864:(00.202417) \tUnseizing 43608 into 1"
time="2025-04-29T20:00:42Z" level=warning msg="865:(00.202437) Error (criu/cr-dump.c:2111): Dumping FAILED."
time="2025-04-29T20:00:42Z" level=warning msg=---
    checkpoint_test.go:113: criu failed: type DUMP errno 0
--- FAIL: TestUsernsCheckpoint (0.67s)
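
To compare the quoted dump.log excerpts across runs, a small helper can pull out the last transferred page and the hole that criu could not find in the parent pagemap. This is only a sketch: the regular expressions are written against the lines quoted in this issue, the function name `summarize` is hypothetical, and treating `[1]` as a page count with 4096-byte pages is an assumption based on the `/4096` in the hole message.

```python
import re

PAGE_SIZE = 4096  # assumption: matches the "/4096" size in the hole message

# Match both the escaped "\t" seen in the quoted test output and a real tab,
# as would appear in a raw dump.log.
PAGE_RE = re.compile(r"page-xfer: (?:\\t|\t)p (0x[0-9a-f]+) \[(\d+)\]")
HOLE_RE = re.compile(r"page-xfer: Hole (0x[0-9a-f]+)/(\d+) not found in parent")


def summarize(log_text):
    """Return details of the first 'Hole ... not found in parent' error, or None.

    'adjacent' is True when the missing hole starts right after the most
    recently logged 'p' (page) entry.
    """
    last_page = None  # (start address, number of pages), assumed meaning of "[N]"
    for line in log_text.splitlines():
        m = PAGE_RE.search(line)
        if m:
            last_page = (int(m.group(1), 16), int(m.group(2)))
            continue
        m = HOLE_RE.search(line)
        if m:
            hole = int(m.group(1), 16)
            adjacent = (
                last_page is not None
                and hole == last_page[0] + last_page[1] * PAGE_SIZE
            )
            return {
                "page": last_page[0] if last_page else None,
                "hole": hole,
                "size": int(m.group(2)),
                "adjacent": adjacent,
            }
    return None


# Two lines copied from the first log in this issue.
SAMPLE = r"""
time="2025-04-16T21:13:52Z" level=warning msg="851:(00.152999) page-xfer: \tp 0x7ffd4bea0000 [1]"
time="2025-04-16T21:13:52Z" level=warning msg="855:(00.153014) Error (criu/page-xfer.c:342): page-xfer: Hole 0x7ffd4bea1000/4096 not found in parent"
"""

result = summarize(SAMPLE)
print(hex(result["hole"]), result["adjacent"])  # prints: 0x7ffd4bea1000 True
```

In all three logs above, the missing hole starts exactly one page after the preceding `p` entry (0x7ffd4bea0000 and 0x7ffd4bea1000, 0x7ffd1d19b000 and 0x7ffd1d19c000, 0x7ffe0ec9f000 and 0x7ffe0eca0000).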

@kolyshkin
Contributor Author

Upstream bug report (which is not about criu 4.1, as I filed it in Dec 2024): checkpoint-restore/criu#2551

@kolyshkin changed the title from "criu 4.1 fail on AlmaLinux 8" to "criu page-xfer failure on AlmaLinux 8" on Apr 29, 2025