Skip to content

Investigate podman checkpoint test failures on centos-stream-10-x86_64 #1759

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lsm5 opened this issue May 22, 2025 · 7 comments
Open

Investigate podman checkpoint test failures on centos-stream-10-x86_64 #1759

lsm5 opened this issue May 22, 2025 · 7 comments

Comments

@lsm5
Copy link
Member

lsm5 commented May 22, 2025

podman checkpoint tests fail on centos-stream-10-x86_64 env like so:

not ok 1 [520] podman checkpoint - basic test
# tags: ci:parallel
# (from function `bail-now' in file /usr/share/podman/test/system/helpers.bash, line 187,
#  from function `die' in file /usr/share/podman/test/system/helpers.bash, line 970,
#  from function `run_podman' in file /usr/share/podman/test/system/helpers.bash, line 572,
#  in test file /usr/share/podman/test/system/520-checkpoint.bats, line 45)
#   `run_podman container checkpoint $cid' failed
#
# [12:32:37.289378046] # podman run -d quay.io/libpod/testimage:20241011 sh -c while :;do cat /proc/uptime; sleep 0.1;done
# [12:32:37.460386134] aa754504d33332cccc1798ff687d20164cce1219f9379886a75d4a7855dad895
#
# [12:32:37.476266893] # podman logs aa754504d33332cccc1798ff687d20164cce1219f9379886a75d4a7855dad895
# [12:32:37.510535848] 397.05 455.99
#
# [12:32:37.518142895] # podman container checkpoint aa754504d33332cccc1798ff687d20164cce1219f9379886a75d4a7855dad895
# [12:32:37.602616185] CRIU checkpointing failed -52.  Please check CRIU logfile /var/lib/containers/storage/overlay-containers/aa754504d33332cccc1798ff687d20164cce1219f9379886a75d4a7855dad895/userdata/dump.log: Invalid exchange
# Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/aa754504d33332cccc1798ff687d20164cce1219f9379886a75d4a7855dad895/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/aa754504d33332cccc1798ff687d20164cce1219f9379886a75d4a7855dad895/userdata aa754504d33332cccc1798ff687d20164cce1219f9379886a75d4a7855dad895` failed: exit status 1
# [12:32:37.605653965] [ rc=125 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 125; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
not ok 2 [520] podman checkpoint/restore print IDs or raw input
# (from function `bail-now' in file /usr/share/podman/test/system/helpers.bash, line 187,
#  from function `die' in file /usr/share/podman/test/system/helpers.bash, line 970,
#  from function `run_podman' in file /usr/share/podman/test/system/helpers.bash, line 572,
#  in test file /usr/share/podman/test/system/520-checkpoint.bats, line 117)
#   `run_podman container checkpoint -a' failed
#
# [12:32:38.107536812] # podman run -d quay.io/libpod/testimage:20241011 top
# [12:32:38.283187758] f380fdc16b2c0ba8aa1f3953a3b293f30337b1bde3cdbad85417bdd821331c2a
#
# [12:32:38.288706676] # podman container checkpoint -a
# [12:32:38.353895738] CRIU checkpointing failed -52.  Please check CRIU logfile /var/lib/containers/storage/overlay-containers/f380fdc16b2c0ba8aa1f3953a3b293f30337b1bde3cdbad85417bdd821331c2a/userdata/dump.log: Invalid exchange
# Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/f380fdc16b2c0ba8aa1f3953a3b293f30337b1bde3cdbad85417bdd821331c2a/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/f380fdc16b2c0ba8aa1f3953a3b293f30337b1bde3cdbad85417bdd821331c2a/userdata f380fdc16b2c0ba8aa1f3953a3b293f30337b1bde3cdbad85417bdd821331c2a` failed: exit status 1
# [12:32:38.356880607] [ rc=125 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 125; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
not ok 3 [520] podman checkpoint --export, with volumes
# tags: ci:parallel
# (from function `bail-now' in file /usr/share/podman/test/system/helpers.bash, line 187,
#  from function `die' in file /usr/share/podman/test/system/helpers.bash, line 970,
#  from function `run_podman' in file /usr/share/podman/test/system/helpers.bash, line 572,
#  in test file /usr/share/podman/test/system/520-checkpoint.bats, line 163)
#   `run_podman $p_opts container checkpoint \' failed
#
# [12:32:38.859512031] # podman save -o /tmp/podman_bats.QusMCm/image.tar quay.io/libpod/testimage:20241011
# [12:32:38.981361509] Copying blob sha256:b66a884aaf08f1c410c61682a7072d68a0d837ba8dc16ff13b9509bdceb32fd2
# Copying blob sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
# Copying config sha256:b82e560ed57b77a897379e160371adcf1b000ca885e69c62cbec674777a83850
# Writing manifest to image destination
#
# [12:32:38.986872767] # podman --root /tmp/podman_bats.QusMCm/root --runroot /tmp/podman_bats.QusMCm/runroot --tmpdir /tmp/podman_bats.QusMCm/tmpdir --events-backend file load -i /tmp/podman_bats.QusMCm/image.tar
# [12:32:39.273103249] Getting image source signatures
# Copying blob sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
# Copying blob sha256:b66a884aaf08f1c410c61682a7072d68a0d837ba8dc16ff13b9509bdceb32fd2
# Copying config sha256:b82e560ed57b77a897379e160371adcf1b000ca885e69c62cbec674777a83850
# Writing manifest to image destination
# Loaded image: quay.io/libpod/testimage:20241011
#
# [12:32:39.287905702] # podman --root /tmp/podman_bats.QusMCm/root --runroot /tmp/podman_bats.QusMCm/runroot --tmpdir /tmp/podman_bats.QusMCm/tmpdir --events-backend file volume create v-t3-rvymbgki
# [12:32:39.318728587] v-t3-rvymbgki
#
# [12:32:39.356315820] # podman --root /tmp/podman_bats.QusMCm/root --runroot /tmp/podman_bats.QusMCm/runroot --tmpdir /tmp/podman_bats.QusMCm/tmpdir --events-backend file run -d --name c-t3-rvymbgki --volume v-t3-rvymbgki:/myvol -p 5521:80 -w /myvol quay.io/libpod/testimage:20241011 sh -c /bin/busybox-extras httpd -p 80;echo c-t3-rvymbgki >cname;echo READY;while :;do cat /proc/uptime >mydate.tmp;mv -f mydate.tmp mydate;sleep 0.1;done
# [12:32:39.515414272] b1348e79e724bf35495ecaaefdc6e18964edd1bb4de9700d14c24ac859d47a31
#
# [12:32:39.532909998] # podman logs b1348e79e724bf35495ecaaefdc6e18964edd1bb4de9700d14c24ac859d47a31
# [12:32:39.567719236] READY
#
# [12:32:39.593373471] # podman --root /tmp/podman_bats.QusMCm/root --runroot /tmp/podman_bats.QusMCm/runroot --tmpdir /tmp/podman_bats.QusMCm/tmpdir --events-backend file container checkpoint --ignore-rootfs --export=/tmp/podman_bats.QusMCm/c-t3-rvymbgki.tar.gz c-t3-rvymbgki
# [12:32:39.661089902] CRIU checkpointing failed -52.  Please check CRIU logfile /tmp/podman_bats.QusMCm/root/overlay-containers/b1348e79e724bf35495ecaaefdc6e18964edd1bb4de9700d14c24ac859d47a31/userdata/dump.log: Invalid exchange
# Error: `/usr/bin/crun checkpoint --image-path /tmp/podman_bats.QusMCm/root/overlay-containers/b1348e79e724bf35495ecaaefdc6e18964edd1bb4de9700d14c24ac859d47a31/userdata/checkpoint --work-path /tmp/podman_bats.QusMCm/root/overlay-containers/b1348e79e724bf35495ecaaefdc6e18964edd1bb4de9700d14c24ac859d47a31/userdata b1348e79e724bf35495ecaaefdc6e18964edd1bb4de9700d14c24ac859d47a31` failed: exit status 1
# [12:32:39.664092862] [ rc=125 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 125; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
# rm: cannot remove '/tmp/podman_bats.QusMCm/root/overlay/e293d086818e740c8691f406e6eaf3deb0ba7fb86f2413088f0f2543280fe778/merged': Device or resource busy
# rm: cannot remove '/tmp/podman_bats.QusMCm/root/overlay-containers/b1348e79e724bf35495ecaaefdc6e18964edd1bb4de9700d14c24ac859d47a31/userdata/shm': Device or resource busy
not ok 4 [520] podman checkpoint --file-locks
# tags: ci:parallel
# (from function `bail-now' in file /usr/share/podman/test/system/helpers.bash, line 187,
#  from function `die' in file /usr/share/podman/test/system/helpers.bash, line 970,
#  from function `run_podman' in file /usr/share/podman/test/system/helpers.bash, line 572,
#  in test file /usr/share/podman/test/system/520-checkpoint.bats, line 208)
#   `run_podman container checkpoint --file-locks $cid' failed
#
# [12:32:40.063568252] # podman run -d quay.io/libpod/testimage:20241011 sh -c touch /wait; touch test.lock; echo READY; flock test.lock sh -c "while [ -e /wait ];do sleep 0.5;done;for i in 1 2 3;do echo \$i;sleep 0.5;done" & flock test.lock sh -c "while [ -e /wait ];do sleep 0.5;done;for i in 1 2 3;do echo \$i;sleep 0.5;done" & wait
# [12:32:40.207385819] 67881fd268c64a4c616ac0f84acb109642640c4e6ecfd1f0e1525b97479921d7
#
# [12:32:40.224641300] # podman logs 67881fd268c64a4c616ac0f84acb109642640c4e6ecfd1f0e1525b97479921d7
# [12:32:40.260058042] READY
#
# [12:32:40.267968915] # podman container checkpoint --file-locks 67881fd268c64a4c616ac0f84acb109642640c4e6ecfd1f0e1525b97479921d7
# [12:32:40.338913292] CRIU checkpointing failed -52.  Please check CRIU logfile /var/lib/containers/storage/overlay-containers/67881fd268c64a4c616ac0f84acb109642640c4e6ecfd1f0e1525b97479921d7/userdata/dump.log: Invalid exchange
# Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/67881fd268c64a4c616ac0f84acb109642640c4e6ecfd1f0e1525b97479921d7/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/67881fd268c64a4c616ac0f84acb109642640c4e6ecfd1f0e1525b97479921d7/userdata --file-locks 67881fd268c64a4c616ac0f84acb109642640c4e6ecfd1f0e1525b97479921d7` failed: exit status 1
# [12:32:40.342021564] [ rc=125 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 125; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
not ok 5 [520] podman checkpoint/restore ip and mac handling
# tags: ci:parallel
# (from function `bail-now' in file /usr/share/podman/test/system/helpers.bash, line 187,
#  from function `die' in file /usr/share/podman/test/system/helpers.bash, line 970,
#  from function `run_podman' in file /usr/share/podman/test/system/helpers.bash, line 572,
#  in test file /usr/share/podman/test/system/520-checkpoint.bats, line 253)
#   `run_podman container checkpoint $cid' failed
#
# [12:32:40.817375344] # podman network create --subnet 172.11.73.0/24 net-t5-euyssq6n
# [12:32:40.846512560] net-t5-euyssq6n
#
# [12:32:40.851935136] # podman run -d --network net-t5-euyssq6n quay.io/libpod/testimage:20241011 top
# [12:32:41.053557635] ec90e9097fc6a98a560971fce4f97e75cc6e90afeeb9c4793c5a13ad304d9c42
#
# [12:32:41.059352370] # podman inspect ec90e9097fc6a98a560971fce4f97e75cc6e90afeeb9c4793c5a13ad304d9c42 --format {{(index .NetworkSettings.Networks "net-t5-euyssq6n").IPAddress}}
# [12:32:41.094562587] 172.11.73.2
#
# [12:32:41.100312280] # podman inspect ec90e9097fc6a98a560971fce4f97e75cc6e90afeeb9c4793c5a13ad304d9c42 --format {{(index .NetworkSettings.Networks "net-t5-euyssq6n").MacAddress}}
# [12:32:41.133997952] be:7a:a0:c5:08:60
#
# [12:32:41.139669793] # podman exec ec90e9097fc6a98a560971fce4f97e75cc6e90afeeb9c4793c5a13ad304d9c42 cat /etc/hosts /etc/resolv.conf
# [12:32:41.278332181] 127.0.0.1	localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1	localhost localhost.localdomain localhost6 localhost6.localdomain6
# 172.11.73.1	host.containers.internal host.docker.internal
# 172.11.73.2	ec90e9097fc6 stoic_cannon
# search us-east-2.compute.internal
# nameserver 172.11.73.1
#
# [12:32:41.284015513] # podman container checkpoint ec90e9097fc6a98a560971fce4f97e75cc6e90afeeb9c4793c5a13ad304d9c42
# [12:32:41.357133430] CRIU checkpointing failed -52.  Please check CRIU logfile /var/lib/containers/storage/overlay-containers/ec90e9097fc6a98a560971fce4f97e75cc6e90afeeb9c4793c5a13ad304d9c42/userdata/dump.log: Invalid exchange
# Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/ec90e9097fc6a98a560971fce4f97e75cc6e90afeeb9c4793c5a13ad304d9c42/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/ec90e9097fc6a98a560971fce4f97e75cc6e90afeeb9c4793c5a13ad304d9c42/userdata ec90e9097fc6a98a560971fce4f97e75cc6e90afeeb9c4793c5a13ad304d9c42` failed: exit status 1
# [12:32:41.360221952] [ rc=125 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 125; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
not ok 6 [520] podman checkpoint/restore the latest container
# (from function `bail-now' in file /usr/share/podman/test/system/helpers.bash, line 187,
#  from function `die' in file /usr/share/podman/test/system/helpers.bash, line 970,
#  from function `run_podman' in file /usr/share/podman/test/system/helpers.bash, line 572,
#  in test file /usr/share/podman/test/system/520-checkpoint.bats, line 432)
#   `run_podman container checkpoint --latest' failed
#
# [12:32:41.873544793] # podman run -d quay.io/libpod/testimage:20241011 top
# [12:32:42.042898743] 9b4501d035183dbd21eaaf20d12404d7d8124c851636441008f040fed1c4a170
#
# [12:32:42.048453522] # podman container checkpoint --latest
# [12:32:42.111486045] CRIU checkpointing failed -52.  Please check CRIU logfile /var/lib/containers/storage/overlay-containers/9b4501d035183dbd21eaaf20d12404d7d8124c851636441008f040fed1c4a170/userdata/dump.log: Invalid exchange
# Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/9b4501d035183dbd21eaaf20d12404d7d8124c851636441008f040fed1c4a170/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/9b4501d035183dbd21eaaf20d12404d7d8124c851636441008f040fed1c4a170/userdata 9b4501d035183dbd21eaaf20d12404d7d8124c851636441008f040fed1c4a170` failed: exit status 1
# [12:32:42.114521185] [ rc=125 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 125; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@lsm5 lsm5 changed the title Investigate podman system test failures on centos-stream-10-x86_64 Investigate podman checkpoint test failures on centos-stream-10-x86_64 May 22, 2025
lsm5 added a commit to lsm5/crun that referenced this issue May 22, 2025
lsm5 added a commit to lsm5/crun that referenced this issue May 22, 2025
@kolyshkin
Copy link
Collaborator

Here's the criu failure (from dump.log):


(00.000000) Unable to get $HOME directory, local configuration file will not be used.
(00.000019) Version: 3.19 (gitid 0)
(00.000024) Running on lima-cs10 Linux 6.12.0-84.el10.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 13 13:39:02 UTC 2025 x86_64
(00.000033) File /run/criu/criu.kdat does not exist
(00.000177) sockets: Probing sock diag modules
(00.000210) sockets: Done probing
(00.000981) Pagemap is fully functional
(00.001129) Found anon-shmem device at 1
(00.001165) Hugetlb size 2 Mb is supported but cannot get dev's number
(00.001172) Reset 44450's dirty tracking
(00.001234)  ... done
(00.001268) Dirty track supported on kernel
(00.001311) Found task size of 7ffffffff000
(00.011021) net: Restoring netdev veth idx 10
(00.011278) net: Dumping netns links
(00.011295) net:        LD: Got link 1, type 772
(00.011298) net:        LD: Got link 10, type 1
(00.025677) vdso: Parsing at 7f57f27b0000 7f57f27b2000
(00.025687) vdso: PT_LOAD p_vaddr: 0
(00.025689) vdso: DT_HASH: 120
(00.025691) vdso: DT_GNU_HASH: 170
(00.025692) vdso: DT_STRTAB: 340
(00.025693) vdso: DT_SYMTAB: 1d8
(00.025694) vdso: DT_STRSZ: 9c
(00.025695) vdso: DT_SYMENT: 18
(00.025696) vdso: nbucket 3 nchain f bucket 7f57f27b0128 chain 7f57f27b0134
(00.025697) vdso: symbol __vdso_clock_gettime at address 1000
(00.025699) vdso: symbol __vdso_getcpu at address 1080
(00.025700) vdso: symbol __vdso_gettimeofday at address fc0
(00.025701) vdso: symbol __vdso_time at address fd0
(00.025702) vdso: symbol __kernel_sigreturn at address 0
(00.025703) vdso: symbol __kernel_rt_sigreturn at address 0
(00.025704) Error (criu/vdso.c:381): vdso: Unexpected rt vDSO area bounds
(00.025706) Error (criu/vdso.c:613): vdso: Failed to fill self vdso symtable
(00.025707) Error (criu/kerndat.c:1835): kerndat_vdso_fill_symtable failed when initializing kerndat.
(00.025808) Found mmap_min_addr 0x10000
(00.025846) files stat: fs/nr_open 1073741816

This is the same as checkpoint-restore/criu#2532, is caused by a newer kernel, and was fixed by checkpoint-restore/criu#2539 (December 2024). The fix made its way into criu 4.1 (released in March 2025).

For some reason, CentOS Stream 10 still has criu-3.19.

So, the fix is to either bump criu to 4.1, or backport checkpoint-restore/criu#2539.

@kolyshkin
Copy link
Collaborator

It's weird that no one has discovered it earlier.

@kolyshkin
Copy link
Collaborator

Ah, I guess this happened because of a recent backport of torvalds/linux@e93d252 to el10 kernel.

@kolyshkin
Copy link
Collaborator

Yes, this is listed in the changelog for the kernel 6.12.0-76.el10:

  • x86/vdso: Split virtual clock pages into dedicated mapping (Luiz Capitulino) [RHEL-78135]

@kolyshkin
Copy link
Collaborator

Filed an upstream bug: https://issues.redhat.com/browse/RHEL-93307

@rst0git
Copy link
Contributor

rst0git commented May 23, 2025

For some reason, CentOS Stream 10 still has criu-3.19.

I believe @adrianreber updated the CRIU package to v4.1 recently in
https://gitlab.com/redhat/centos-stream/rpms/criu/-/merge_requests/25

@lsm5
Copy link
Member Author

lsm5 commented May 26, 2025

It's weird that no one has discovered it earlier.

I'll consider that a win for Packit and TMT tests :) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants