Cannot layer any package adding a user or group on Fedora 42 #5365

Open
kuba3351 opened this issue Apr 16, 2025 · 35 comments · May be fixed by #5403
Labels
client-layering (Issues related to `rpm-ostree install/override` client side), difficulty/hard (hard complexity/difficulty issue), priority/high, regression (This is a regression), triaged (This issue was triaged)

Comments

@kuba3351

Describe the bug

After an update to Fedora Atomic 42, I cannot execute any operation like update, install or uninstall package when I have layered packages.

Reproduction steps

  1. Update to Fedora 42 with layered packages
  2. Try to execute any operation like install/uninstall/update

Expected behavior

The operation should complete

Actual behavior

$ rpm-ostree update --uninstall nwg-shell
2 metadata, 0 content objects fetched; 788 B transferred in 4 seconds; 0 bytes content written
Checking out tree b3b1c7a... done
Enabled rpm-md repositories: fedora-cisco-openh264 updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2024-08-21T16:04:02Z solvables: 3
rpm-md repo 'updates' (cached); generated: 2025-04-16T03:02:59Z solvables: 4111
rpm-md repo 'fedora' (cached); generated: 2025-04-11T05:17:07Z solvables: 76879
rpm-md repo 'updates-archive' (cached); generated: 2025-04-16T03:49:34Z solvables: 3193
Resolving dependencies... done
Applying 2 overrides and 746 overlays
Processing packages... done
Running systemd-sysusers... done
Running pre scripts... done
Running post scripts... done
error: While applying overrides for pkg systemtap-runtime: Could not find group 'stapusr' in group file

The entries related to this package that I found in the logs shown by journalctl -u rpm-ostreed:

Apr 16 16:37:18 NBJSIERZEGA rpm-ostree(systemtap-runtime.prein)[46153]: Creating group 'stapusr' with GID 156.
Apr 16 16:37:18 NBJSIERZEGA rpm-ostree(systemtap-runtime.prein)[46153]: Creating group 'stapsys' with GID 157.
Apr 16 16:37:18 NBJSIERZEGA rpm-ostree(systemtap-runtime.prein)[46153]: Creating group 'stapdev' with GID 158.
Apr 16 16:37:18 NBJSIERZEGA rpm-ostree(systemtap-runtime.prein)[46153]: Creating group 'stapunpriv' with GID 159.
Apr 16 16:37:18 NBJSIERZEGA rpm-ostree(systemtap-runtime.prein)[46153]: Creating user 'stapunpriv' (systemtap unprivileged user) with UID 159 and GID 159.
(...)
Apr 16 14:03:02 NBJSIERZEGA rpm-ostree(initscripts.post)[28527]: Created symlink '/etc/systemd/system/sysinit.target.wants/import-state.service' → '/usr/lib/systemd/system/import-state.service'.
Apr 16 14:03:02 NBJSIERZEGA rpm-ostree[26144]: Executed %post for initscripts in 186 ms
Apr 16 14:03:02 NBJSIERZEGA rpm-ostree[26144]: Executed %post for zfs-fuse in 180 ms
Apr 16 14:03:02 NBJSIERZEGA rpm-ostree[26144]: Txn UpdateDeployment on /org/projectatomic/rpmostree1/fedora failed: While applying overrides for pkg systemtap-runtime: Could not find group 'stapusr' in group file

System details

# rpm-ostree --version
rpm-ostree:
 Version: '2025.7'
 Git: 35baf331666e4257c82bd33dbdcb24bfa00a0a90
 Features:
  - rust
  - compose
  - container
  - fedora-integration

The issue is also present with version 2025.6.

# rpm-ostree status -b
State: idle
BootedDeployment:
● fedora:fedora/42/x86_64/sericea
                  Version: 42.20250416.0 (2025-04-16T02:46:40Z)
               BaseCommit: b3b1c7aa72c12f3e069a97425d198dd4583d6354987de2d2ee758c0a06e6517f
             GPGSignature: Valid signature by B0F4950458F69E1150C6C5EDC8AC4916105EF944
      RemovedBasePackages: noopenh264 2.5.0-2.fc42 toolbox 0.1.1-3.fc42
          LayeredPackages: android-file-transfer android-tools ansible bat bats chromium clipman cloud-utils cmake cosmic-session
                           diffstat distrobox dnf doxygen edk2-ovmf fastfetch gettext git git-credential-libsecret glab gparted hdparm
                           htop hyprland kcat kubernetes langpacks-pl libffi-devel libnsl libvirt lxterminal maven meson moby-engine
                           mousepad mozilla-openh264 ncdu ngrep nmap nnn nodejs npm nwg-shell openssl p7zip pamixer patch patchutils pip
                           podman-compose postgresql python-sphinx python-sphinx_rtd_theme python3-bcc python3-devel python3-pip
                           python3-wxpython4 qemu ranger rdesktop remmina ruqola strace subversion swtpm swtpm-tools syslinux tcpdump
                           tcptrack thefuck thunderbird tmux traceroute virt-manager w3m xorriso yq yum-utils
                Initramfs: regenerate
                 Unlocked: development

Additional information

No response

@jeckersb
Collaborator

I had this same issue when trying to use F42/Cosmic, except for me it was the openvpn user. I didn't spend any time looking into it though.

@miabbott
Member

This was similarly reported on the Silverblue tracker: fedora-silverblue/issue-tracker#643

We believe it is an issue with how systemtap is configuring users/groups - https://bugzilla.redhat.com/show_bug.cgi?id=2359764

@jeckersb I'd suggest looking at the openvpn spec file to see whether it is using systemd-sysusers correctly.

@bitestring

bitestring commented Apr 17, 2025

This is a bad update. Fedora 42 Atomic variants can no longer override (which we must do due to the systemd-remount-fs.service failure) or update the system to a new version. I am facing the same issue as everyone else:

error: While applying overrides for pkg systemtap-runtime: Could not find group 'stapusr' in group file

I understand that this is an upstream issue. But shouldn't it be tested and coordinated with upstream before releasing a newer version of rpm-ostree that breaks critical functionality like system updates?

It would be great if a workaround is suggested until a fix is made by upstream.

@kuba3351
Author

For me, removing the layered qemu package fixes the issue, so the issue description is wrong: not all packages cause the issue. But I am surprised that this issue was not spotted in beta testing.

Good thing we can roll back the update thanks to the underlying technologies.

@miabbott
Member

miabbott commented Apr 17, 2025

> This is a bad update. Fedora 42 Atomic variants can no longer override (which we must do due to systemd-remount-fs.service failure) or update the system to new version. I am facing the same issue as everyone else

If the systemd-remount-fs issue is a problem, there is a Common Issue about it with a workaround:

https://discussion.fedoraproject.org/t/root-mount-options-are-ignored-in-fedora-atomic-desktops-42/148562

> I understand that this is an upstream issue. But shouldn't it be tested and coordinated with upstream before releasing newer version of rpm-ostree which breaks a critical functionality like system update?

I understand that this problem is frustrating; however, I think that testing that every package in Fedora can be successfully layered on top of Silverblue is a large (and possibly unreasonable) ask.

Please remember that there are humans on the other end of these issues that are trying their best in their free time to keep all these projects successful.

If you want to help out towards the success of these projects, you can provide valuable feedback by testing your use cases on Rawhide or even as part of the next Beta.

@bitestring

bitestring commented Apr 22, 2025

> If the systemd-remount-fs issue is a problem, there is a Common Issue about it with a workaround:
> https://discussion.fedoraproject.org/t/root-mount-options-are-ignored-in-fedora-atomic-desktops-42/148562

No, this won't work due to this sysusers issue.

> Please remember that there are humans on the other end of these issues that are trying their best in their free time to keep all these projects successful.

I didn't mean it in a negative way. Of course I understand all the thankless contributions and efforts behind the project. But this is not the first time rpm-ostree has broken system updates. Users can work around any application- or package-specific issue, but breaking system updates in a final release should at least have been mentioned in the release notes. There are many packages in Fedora that use sysusers (I do not have deep knowledge of that), so rpm-ostree cannot install those packages or allow updating/overriding the system with those packages layered. QEMU is a very commonly used package in the Fedora community, especially on the Atomic variants, so maybe it should have been handled before the release.

Anyway, I ended up doing rpm-ostree reset, creating a rootful distrobox container, and installing virt-manager there. Works like a charm.

distrobox create --root --image quay.io/fedora/fedora-toolbox:42  --name fedora-toolbx-virt-manager --init

> If you want to help out towards the success of these projects, you can provide valuable feedback by testing your use cases on Rawhide or even as part of the next Beta.

I have done so in the past and would be very happy to report in the future as well.

@StarkZarn

> This is a bad update. Fedora 42 Atomic variants can no longer override (which we must do due to systemd-remount-fs.service failure) or update the system to new version. I am facing the same issue as everyone else
>
> error: While applying overrides for pkg systemtap-runtime: Could not find group 'stapusr' in group file
>
> I understand that this is an upstream issue. But shouldn't it be tested and coordinated with upstream before releasing newer version of rpm-ostree which breaks a critical functionality like system update?
>
> It would be great if a workaround is suggested until a fix is made by upstream.

I'm facing this issue when trying to rebase from 41 to 42 Kinoite with layered packages. Is there a fundamental workflow change that's supposed to happen here? It isn't addressed in any documentation, as far as I could find, so I would still call this an issue.

Any advice?

@carpediem29

I wonder if creating the missing group manually (through newgrp) can solve the issue until https://bugzilla.redhat.com/show_bug.cgi?id=2359764 is solved. Did anybody test that?

@StarkZarn

> I wonder if creating the missing group manually (through newgrp) can solve the issue until https://bugzilla.redhat.com/show_bug.cgi?id=2359764 is solved. Did anybody test that?

I tried pulling the entry from /usr/lib/group and putting it into /etc/group to no avail.

I'm not in a place where it feels worth nuking my layered packages to upgrade at this point. If I have to start over, I'll end up on openSUSE.

@carpediem29

I wonder what is wrong with https://src.fedoraproject.org/rpms/systemtap/blob/rawhide/f/systemtap.spec
/lib/group does indeed include a stapusr group on Fedora 42, but no stapusr user exists:

# grep sta  /lib/group /lib/passwd 
/lib/group:stapusr:x:156:
/lib/group:stapsys:x:157:
/lib/group:stapdev:x:158:
/lib/group:stapunpriv:x:159:stapunpriv
/lib/passwd:stapunpriv:x:159:159:systemtap unprivileged user:/var/lib/stapunpriv:/sbin/nologin

Hope someone will fix it soon... Being unable to deploy security fixes for a bit more than two weeks because of a single package's issue is a concern...

@mihalyr

mihalyr commented May 12, 2025

This might not be only an F42 issue; I see the same problem on F41 Sericea. It seems to be rpm-ostree related, as I recently upgraded to 2025.7 as well. My issue is with the Wireshark package, for which I had to copy the wireshark group from /usr/lib/group to /etc/group; now it seems I can't install any other package.

cgwalters marked this as a duplicate of ostreedev/ostree#3417 May 12, 2025
@djfjeff

djfjeff commented May 12, 2025

The issue is still present for me as well trying to upgrade from Silverblue 41 to 42... The newest systemtap packages (5.3) did not solve the issue.

@bitestring

bitestring commented May 12, 2025

This is not an upstream issue and has been reassigned back to the rpm-ostree package.

Please check https://bugzilla.redhat.com/show_bug.cgi?id=2359764

It's not just systemtap, but also wireshark and openvpn:

Wireshark:
fedora-silverblue/issue-tracker#50

OpenVPN:
ostreedev/ostree#3417

While bugs are expected in experimental software like Silverblue (and the Atomic variants), these new platforms must not be claimed as production ready, as evidenced by multiple bugs throughout the releases. I have been using Silverblue maybe since its inception. But the problem is that it is advertised on fedoraproject.org as if it were stable, so users expect Silverblue to be as stable as Workstation, which is clearly not the case.

Hence I propose to mark Atomic variants as "Alpha" or "Experimental" so users clearly know what kind of workloads can be put on them.

Edit: Other ways to install virt-manager seem very complicated or don't work as expected. Hence I have moved back to good old Workstation, as my work depends on virt-manager. I hope the Atomic variants become as stable and polished as the mainstream Workstation. But it was a good ride for years. Thanks to all the contributors.

@millerthegorilla

millerthegorilla commented May 14, 2025

I am unable to install openvpn on CoreOS 42:

error: While applying overrides for pkg openvpn: Could not find group 'openvpn' in group file

Is there a workaround?

Is there likely to be a fix at some point soon?

@bitestring ... I have been using Silverblue since its inception, and it has been mostly rock solid. The issues it faces from time to time are not because Silverblue or other atomic hosts like CoreOS are in some sort of 'alpha' state. Those issues are specific to the atomic host variants and are simply issues that will be encountered from time to time.
The move to systemd-sysusers is non-trivial and was bound to cause some issues.
It would have been nice if this particular issue had been spotted while in Rawhide, but as stated above, not every package can be checked.

I use CoreOS in production and it serves me extremely well; this current issue, which leaves me unable to install openvpn, is only the second failure I have hit since I began using Silverblue and CoreOS when they were first released.

That this issue is interrupting production servers means that it should be made a priority. In the meantime I am going to have to try reverting to Fedora 41, which is not really acceptable, but there seems to be no other choice.

@mihalyr

mihalyr commented May 14, 2025 via email

@millerthegorilla

I just installed openvpn successfully on my Silverblue 42 machine. It required --allow-inactive to install it, though.

@millerthegorilla

@mihalyr I have also just installed openvpn successfully on the most recent CoreOS F41.

@millerthegorilla

@mihalyr but once installed, I am unable to install anything else or update. My silverblue 42 install is fine but I think openvpn is in the base image.

I am thinking I might learn how to use ostree compose or coreos-assembler to make a base coreos image with openvpn already layered.

@jlebon is there any update on this issue? Is there a github issue number or a bugzilla case?

@millerthegorilla

There doesn't seem to be much investigation of this issue going on. I can't find any real evidence of it affecting a large number of people nor can I raise any support from maintainers.

So perhaps it is an edge case, affecting only a very small number of people?

@djfjeff

djfjeff commented May 16, 2025

No, it is not; I have seen numerous reports of folks not being able to upgrade to Silverblue 42 because of systemtap. The latest release (5.3) of systemtap should have solved the issue, but it still remains on my end.

@millerthegorilla

millerthegorilla commented May 16, 2025 via email

travier changed the title from "Cannot layer any package on Fedora 42" to "Cannot layer any package adding a user or group on Fedora 42" May 16, 2025
@Procsiab

Hello there, I am chiming in to report that the same issue seems to affect the nut package (Network UPS Tools):

The behavior I can observe while updating a machine with the nut package installed is similar to what was described previously, with an rpm-ostree deploy of version 42.20250517.0 ending with:

error: While applying overrides for pkg nut-client: Could not find group 'nut' in group file

The machine experiencing this is running Fedora IoT 42.20250423.0, which was deployed successfully coming from 42.20250414.0 - in both upgrades, the nut package was already layered before.

@lostgradient

I think this affects many packages that have migrated to sysusers.
In my case, layering ddclient-4.0.0-3.fc42 produced a similar error:

While applying overrides for pkg ddclient: Could not find user 'ddclient' in passwd file

But ddclient-4.0.0-1.fc42 can be layered without any problem.

The difference between the two packages is the following two commits, which are basically snippets to migrate to sysusers:
https://src.fedoraproject.org/rpms/ddclient/c/15504eddf02f6f098704689ec78df7278ca1b114
https://src.fedoraproject.org/rpms/ddclient/c/fde29b6f78189de3c98750ee7b34b3b02b5157e8

@millerthegorilla

millerthegorilla commented May 18, 2025

I am guessing that, because this issue affects only some packages and not all, package maintainers have to adjust their pre and post installation scripts so that they work with immutable systems now that sysusers has been implemented.
It's obviously something more than just adding a systemd-sysusers configuration snippet.
Perhaps looking at the scripts of the affected packages for commonalities might help...

@mrnerdhair

mrnerdhair commented May 19, 2025

I'm experiencing this same issue with nut, nut-client, and nut-xml versions 2.8.2.1-5.1.git20240703pr2505 and later.

error: While applying overrides for pkg nut-client: Could not find group 'nut' in group file

I expect this is a particularly tricky issue to collect feedback on, as there's not much indication that it has to do with rpm-ostree itself; it took me quite a bit of time to find this issue, as I assumed it was simply a packaging problem.

@millerthegorilla

I second how difficult it is to find relevant bug reports/issues.
Given that it is a truly catastrophic bug, you would think it would be handled publicly, but there doesn't seem to be any activity at all.
The bug reports/issues I have found have all been closed as solved, despite the fact that the issue persists.

I heartily recommend that anyone who is experiencing this bug open an issue on Fedora Bugzilla.

@M1cha

M1cha commented May 21, 2025

I'm on 42.20250410.3.1 and can't update to 42.20250427.3.0 due to this issue. That's VERY unfortunate timing in light of the new Intel CPU bug, which needs a microcode update.

● fedora:fedora/x86_64/coreos/stable
                  Version: 42.20250410.3.1 (2025-04-28T23:24:51Z)
               BaseCommit: e057a84658cd114c1fa8a944d8fe3de93f5413c0fbe3f1fabc737cc04bbd749b
             GPGSignature: Valid signature by B0F4950458F69E1150C6C5EDC8AC4916105EF944
      RemovedBasePackages: nfs-utils-coreos 1:2.8.2-1.rc8.fc42
          LayeredPackages: dmidecode efivar hdparm htop inotify-tools libvirt lm_sensors nfs-utils powertop qemu s-tui stress tcpdump udisks2
                           usbutils virt-install
                Initramfs: --add-drivers vfio-pci
May 21 19:13:06 homeserver zincati[4615]: [INFO  zincati::update_agent::actor] target release '42.20250427.3.0' selected, proceeding to stage it
May 21 19:13:14 homeserver zincati[4615]: [ERROR zincati::update_agent::actor] failed to stage deployment: rpm-ostree deploy failed:
May 21 19:13:14 homeserver zincati[4615]:     error: While applying overrides for pkg systemtap-runtime: Could not find group 'stapusr' in group file

M1cha added a commit to M1cha/homeserver that referenced this issue May 24, 2025
rpm-ostree has issues with installing qemu now:
coreos/rpm-ostree#5365
@cgwalters
Member

cgwalters commented May 24, 2025

OK I'm looking at this now. The sysusers rework in Fedora 42 I think was a generally good idea, but it uncovered a lot of technical debt in this area in rpm-ostree.

As for why it works for some packages but not others, I think it basically only errors if the package has content owned by the dynamic user/group. For example with openvpn:

$ rpm -qplv ./openvpn-2.6.14-1.fc42.x86_64.rpm 
...
drwxr-xr-x    2 root     root                        0 Apr  1 20:00 /etc/openvpn
drwxr-x---    2 root     openvpn                     0 Apr  1 20:00 /etc/openvpn/client
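As a hedged illustration of that criterion, the sketch below scans `rpm -qplv` output for files whose owner or group is not root, which is the situation identified here as triggering the error. It assumes the `ls -l`-style column layout shown above (mode, link count, owner, group, size, date, path); the function names are hypothetical and none of this is part of rpm-ostree.

```python
import subprocess


def non_root_owners(qplv_output: str) -> set[tuple[str, str]]:
    """Collect ("user", name) / ("group", name) pairs for every non-root
    owner in `rpm -qplv`-style output (hypothetical helper)."""
    owners: set[tuple[str, str]] = set()
    for line in qplv_output.splitlines():
        fields = line.split()
        # Skip blank lines, "..." placeholders and anything that does not
        # start with a 10-character mode string such as "drwxr-x---".
        if len(fields) < 4 or len(fields[0]) != 10 or fields[0][0] not in "-dlcbps":
            continue
        user, group = fields[2], fields[3]
        if user != "root":
            owners.add(("user", user))
        if group != "root":
            owners.add(("group", group))
    return owners


def non_root_owners_of_rpm(path: str) -> set[tuple[str, str]]:
    # Requires the rpm CLI: -q query, -p package file, -l list files, -v verbose.
    result = subprocess.run(["rpm", "-qplv", path],
                            capture_output=True, text=True, check=True)
    return non_root_owners(result.stdout)


if __name__ == "__main__":
    sample = """\
...
drwxr-xr-x    2 root     root                        0 Apr  1 20:00 /etc/openvpn
drwxr-x---    2 root     openvpn                     0 Apr  1 20:00 /etc/openvpn/client
"""
    print(sorted(non_root_owners(sample)))  # [('group', 'openvpn')]
```

A package for which this returns an empty set would, per the comment above, not hit the error even if it ships a sysusers entry.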

Now in general of course this situation is exactly the one being debated heavily in various places (xref bootc-dev/bootc#1263 and the fedora-devel thread)


From what I can see initially what's happening here is that the new users are being added to /etc/passwd, but we end up subsequently heading down the layering path where we do what we've always done in this case and make /usr/lib/passwd temporarily be /etc/passwd, masking the newly added users. We probably need to change things so that we merge instead.
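A toy model of the masking described here, using group(5)-format strings in place of real files. It contrasts the old behaviour (the `/usr/lib` copy temporarily stands in for `/etc`, hiding entries just written there) with the merge suggested as a fix. The function names are hypothetical; this sketches the idea and is not rpm-ostree's actual implementation.

```python
def parse_entries(text: str) -> dict[str, str]:
    """Map entry name -> full line for passwd(5)/group(5)-format text,
    keeping the first occurrence and the file order."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            entries.setdefault(line.split(":", 1)[0], line)
    return entries


def masked_view(usr_lib_text: str, etc_text: str) -> dict[str, str]:
    # Old layering behaviour: the /usr/lib content stands in for /etc, so
    # anything present only in /etc (e.g. just added by sysusers) is hidden.
    return parse_entries(usr_lib_text)


def merged_view(usr_lib_text: str, etc_text: str) -> dict[str, str]:
    # Merge instead: start from /usr/lib and append entries found only in /etc.
    merged = parse_entries(usr_lib_text)
    for name, line in parse_entries(etc_text).items():
        merged.setdefault(name, line)
    return merged


if __name__ == "__main__":
    usr_lib = "root:x:0:\nwheel:x:10:\n"
    etc = "root:x:0:\nstapusr:x:156:\n"  # sysusers just added stapusr here
    print("stapusr" in masked_view(usr_lib, etc))  # False -> "Could not find group"
    print("stapusr" in merged_view(usr_lib, etc))  # True
```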

Further findings:

  • This all likely works for base image builds because we don't have any usr/lib/passwd files, so we keep etc/passwd in place.
  • Offhand...I think what may work here is to change run_sysusers to take self->passwd_dir so the sysusers invocation always mutates the merge deployment.

cgwalters added the priority/high, triaged, client-layering, regression and difficulty/hard labels May 24, 2025
@cgwalters
Member

Some preparatory work in #5398

I just want to reiterate a finding from earlier that took me a while to understand: Since #5334 the "pure systemd-sysusers" case works for base image builds solely because there's no altfiles setup at the time, and then later we make all users move to altfiles.

For client layering it's a super complex problem though. With classic useradd/groupadd, altfiles is really the only choice, because (unlike sysusers) there's no "reconcile on boot" case.

The key question here I think is: in the package layering flow, do we treat sysusers as another way to extend nss-altfiles, or do we try to keep the users in /etc?

Keeping them in /etc and hence shrinking the role of altfiles has a lot of appeal. But fundamentally we need to bear in mind the reason altfiles was introduced in the first place, which is handling the case where /etc/passwd is modified on the client. In that case, the passwd entries from system users will not be visible by default, so systemd-sysusers will run again on the reboot into the new root, and could pick different uids (in theory).
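A minimal sketch of why the lookup source matters, assuming an `nsswitch.conf` order of `files altfiles` so that `/etc/passwd` is consulted before `/usr/lib/passwd`. A tool that consults only `/etc/passwd` concludes the user is missing, which is the kind of situation in which, per the comment, a new and possibly different UID could be picked; an NSS-style lookup still finds the altfiles entry. The function is hypothetical and the data is illustrative.

```python
from typing import Optional


def lookup_uid(name: str, sources: list[str]) -> Optional[int]:
    """Return the UID of `name` from the first passwd(5)-format source
    that defines it, mimicking NSS trying each configured module in order."""
    for text in sources:
        for line in text.splitlines():
            fields = line.split(":")
            if len(fields) >= 3 and fields[0] == name:
                return int(fields[2])
    return None


if __name__ == "__main__":
    etc_passwd = "root:x:0:0:root:/root:/bin/bash\n"  # locally modified /etc
    usr_lib_passwd = (
        "root:x:0:0:root:/root:/bin/bash\n"
        "stapunpriv:x:159:159:systemtap unprivileged user:"
        "/var/lib/stapunpriv:/sbin/nologin\n"
    )
    # files, then altfiles: the user is still resolvable.
    print(lookup_uid("stapunpriv", [etc_passwd, usr_lib_passwd]))  # 159
    # Looking only at /etc/passwd, the user appears not to exist.
    print(lookup_uid("stapunpriv", [etc_passwd]))  # None
```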

The best fix I think would be to only use sysusers ➡ altfiles for users that own content in the target root. We'd then discard the other sysusers entries, ensuring that they get created on the client system instead as usual (because they don't need to be lifecycle bound to the filesystem tree).
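The proposed split can be sketched as a partition of sysusers.d(5) lines by whether the user or group they declare owns content in the target root. The set of owners is assumed to be computed elsewhere (for example from file ownership or RPM metadata); `partition_sysusers` is a hypothetical helper, and membership (`m`) and range (`r`) lines are simply left for the client in this sketch.

```python
import shlex


def partition_sysusers(conf: str, owners: set[str]) -> tuple[list[str], list[str]]:
    """Split sysusers.d lines into (altfiles, client) lists.

    `u`/`g` entries whose name owns content go to altfiles so their IDs are
    fixed in the tree; everything else is left to be created on the client.
    """
    to_altfiles: list[str] = []
    to_client: list[str] = []
    for raw in conf.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # handles the quoted GECOS field
        if len(fields) >= 2 and fields[0] in ("u", "g") and fields[1] in owners:
            to_altfiles.append(line)
        else:
            to_client.append(line)
    return to_altfiles, to_client


if __name__ == "__main__":
    conf = """\
g stapusr 156
g openvpn -
u stapunpriv 159 "systemtap unprivileged user" /var/lib/stapunpriv /sbin/nologin
"""
    altfiles, client = partition_sysusers(conf, owners={"openvpn"})
    print(altfiles)  # ['g openvpn -']
    print(client)
```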

And if we go down that route, it becomes most obvious to implement things the same way librpm does, by parsing the sysusers content out of the Provides or so? Or at least to start things that way; we'd still need to move the content to altfiles.

However there's yet another quite different alternative: We could allow systemd-sysusers to mutate the live (booted) /etc/passwd in the layering flow. The huge benefit of this is again that it heads towards shrinking the role of altfiles. But the downside is that for the first time, queuing something related to an OS upgrade would be mutating the running system, and further that mutation inherently leaks state - we would no longer be removing the users/groups added this way when layered packages are removed. OTOH of course, if content in /etc or /var sticks around after removal, it is probably best to keep the users allocated anyways.

@keszybz
Contributor

keszybz commented May 27, 2025

We can distinguish packages that have "owned files" by looking at rpm metadata. If Requires:user(...) is present, package has files owned by user. If Requires:group(...) is present, package has files owned by the group or has a user belonging to that group. So this should be relatively straightforward to implement.
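Following that suggestion, the owner set could be derived from a package's dependencies, e.g. the lines printed by `rpm -qpR package.rpm`. A minimal sketch, assuming the dependencies appear literally as `user(NAME)` / `group(NAME)` (possibly followed by a version qualifier); `owned_ids` is a hypothetical name.

```python
import re

# Matches "user(name)" or "group(name)" at the start of a dependency string.
_USER_GROUP_DEP = re.compile(r"^(user|group)\(([^)\s]+)\)")


def owned_ids(requires: list[str]) -> dict[str, set[str]]:
    """Return {"user": {...}, "group": {...}} from a list of Requires."""
    found: dict[str, set[str]] = {"user": set(), "group": set()}
    for dep in requires:
        match = _USER_GROUP_DEP.match(dep.strip())
        if match:
            found[match.group(1)].add(match.group(2))
    return found


if __name__ == "__main__":
    deps = ["/bin/sh", "group(openvpn)", "libc.so.6()(64bit)", "user(openvpn)"]
    print(owned_ids(deps))
```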

@jlebon
Member

jlebon commented May 27, 2025

> The best fix I think would be to only use sysusers ➡ altfiles for users that own content in the target root. We'd then discard the other sysusers entries, ensuring that they get created on the client system instead as usual (because they don't need to be lifecycle bound to the filesystem tree).

What if a package starts off not owning content and then owning it? We'd have to switch it to altfiles, but client-side /etc will already have picked a (potentially different) UID.

> The key question here I think is: in the package layering flow, do we treat sysusers as another way to extend nss-altfiles, or do we try to keep the users in /etc?

I think... as annoying as it is, my vote is to just keep extending nss-altfiles for consistency. It's how the server-side works and keeping that unified core symmetry is valuable.

@millerthegorilla

It might not be entirely relevant to the issue at hand, but a problem I had with nss-altfiles is that I would often add a user to a group using usermod -aG group user to satisfy the requirements of some program, and the program would then check /etc and complain that the group was not found there, much like the error currently being received. I think nss-altfiles was hooked into the usermod command, which found the group in altfiles and so did not add it to /etc. The user/group wouldn't be added to /etc even if it didn't exist in altfiles (/usr/lib).

I can't claim to understand the complexities of the current situation, but if nss-altfiles continues to be the solution, does that mean I am going to have to keep using 'vigr' to edit the /etc file manually to satisfy the requirements of programs that don't know about nss-altfiles?

As an end user I would much prefer to get the expected behavior from the usermod command. The current issue may be a different one altogether, concerning only installation configuration, but if there is an opportunity to restore a compliant usermod command then I would be super happy.
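The mismatch described here can be checked directly: a program that parses `/etc/group` itself only sees entries in that file, whereas lookups through the C library's NSS (what `getent group NAME` or Python's `grp.getgrnam` use) also see nss-altfiles entries in `/usr/lib/group`. A small diagnostic sketch; the helper names and the default `libvirt` group are illustrative assumptions.

```python
import grp
import sys


def listed_in(name: str, group_text: str) -> bool:
    """True if `name` has its own line in group(5)-format text."""
    return any(line.split(":", 1)[0] == name for line in group_text.splitlines())


def resolvable_via_nss(name: str) -> bool:
    """True if the system's NSS configuration (files, altfiles, ...) resolves it."""
    try:
        grp.getgrnam(name)
    except KeyError:
        return False
    return True


def diagnose(name: str, group_text: str) -> str:
    in_file = listed_in(name, group_text)
    via_nss = resolvable_via_nss(name)
    if in_file and via_nss:
        return "ok"
    if via_nss:
        # Typical nss-altfiles case: programs reading /etc/group miss it.
        return "resolvable via NSS but missing from the file"
    if in_file:
        return "listed in the file but not resolvable via NSS"
    return "not found"


if __name__ == "__main__":
    group_name = sys.argv[1] if len(sys.argv) > 1 else "libvirt"
    with open("/etc/group", encoding="utf-8") as f:
        print(group_name, "->", diagnose(group_name, f.read()))
```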

@cgwalters
Member

Hi @millerthegorilla yes it is a relevant point - if we continued to add to nss-altfiles we'd continue to have that issue for groups like libvirt and docker. On the other hand, we wouldn't have the risk of new bugs.

> What if a package starts off not owning content and then owning it? We'd have to switch it to altfiles, but client-side /etc will already have picked a (potentially different) UID.

Why would we have to switch it to altfiles?

> It's how the server-side works and keeping that unified core symmetry is valuable.

OK. I am really on the fence myself. The contrary argument is that mutating the live /etc is how librpm does it today, and this would be pulling the layering flow more towards that.

@jlebon
Member

jlebon commented May 27, 2025

> > What if a package starts off not owning content and then owning it? We'd have to switch it to altfiles, but client-side /etc will already have picked a (potentially different) UID.
>
> Why would we have to switch it to altfiles?

Because there would be content in the commit that references it.

@millerthegorilla

The prospect of leaking state presented by mutating the live tree is one issue, but as an end user interested in security, I would far prefer installed packages to keep their groups in an unwritable location, i.e. /usr/lib, rather than in the comparatively writable /etc.

This might not make much sense, since if root were pwned the system would be compromised beyond repair, but I like the potential security offered by an immutable file system; it's why I chose CoreOS, and placing groups in /etc would begin to diminish that.

As for programs that are unaware of nss-altfiles missing a group in /etc: could /etc/group be a symbolic link to /usr/lib (ridiculous perhaps), or could nss-altfiles or something similar be developed to add all groups in /usr/lib to /etc/group?

The only solution I had for the earlier nss-altfiles issue was to manually add the group/user to /etc and the net result was that the group/user was in both /usr/lib and /etc simultaneously. It had no problematic side effects.

I am probably exposing my lack of understanding of the issue though...

cgwalters added a commit that referenced this issue May 27, 2025
jlebon added a commit to jlebon/rpm-ostree that referenced this issue May 27, 2025
For now, we'll just treat sysusers entries from RPM packages like we do
scriptlets that `useradd`/`groupadd`; that is, we want them to happen
at compose time and go into altfiles in case those same sysusers own
content in the commit.

All we need to do to make that happen is to run `systemd-sysusers`
_after_ we do the `/etc/passwd` <--> `/usr/lib/passwd` switcheroo so
that the new entries go into what will become `/usr/lib/passwd`.

And all we need to do that is to just move down the sysusers execution
a bit.

Fixes: coreos#5365
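The reordering this commit describes can be illustrated with a toy model in which each file is a Python list of group names and the "switcheroo" is a pair of assignments. It follows the sequence described in the thread (and assumes, for the model, that file ownership is resolved while the switch is active); it is not rpm-ostree's implementation, and every function name is hypothetical.

```python
def prepare_layering(fs: dict[str, list[str]]) -> None:
    # Set the real /etc/group aside and let a copy of /usr/lib/group play its
    # role, so that scriptlets append to what will become /usr/lib/group.
    fs["saved_etc"] = fs["etc"]
    fs["etc"] = list(fs["usr_lib"])


def complete_layering(fs: dict[str, list[str]]) -> None:
    # Undo the switch: the working copy becomes /usr/lib/group again.
    fs["usr_lib"] = fs["etc"]
    fs["etc"] = fs.pop("saved_etc")


def run_sysusers(fs: dict[str, list[str]], groups: list[str]) -> None:
    # Stand-in for systemd-sysusers: add missing groups to whatever is /etc now.
    for group in groups:
        if group not in fs["etc"]:
            fs["etc"].append(group)


def layer(sysusers_after_switch: bool) -> tuple[bool, dict[str, list[str]]]:
    fs = {"etc": ["root"], "usr_lib": ["root", "wheel"]}
    if not sysusers_after_switch:
        run_sysusers(fs, ["stapusr"])
    prepare_layering(fs)
    if sysusers_after_switch:
        run_sysusers(fs, ["stapusr"])
    # Ownership lookup happens here, while the switch is in effect.
    found = "stapusr" in fs["etc"]
    complete_layering(fs)
    return found, fs


if __name__ == "__main__":
    print(layer(sysusers_after_switch=False))
    # (False, {'etc': ['root', 'stapusr'], 'usr_lib': ['root', 'wheel']})
    print(layer(sysusers_after_switch=True))
    # (True, {'etc': ['root'], 'usr_lib': ['root', 'wheel', 'stapusr']})
```

In the first run the group ends up only in the real `/etc/group` and the lookup fails, which mirrors the "Could not find group" error; in the second it lands in what becomes `/usr/lib/group`.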
jlebon added a commit to jlebon/rpm-ostree that referenced this issue May 27, 2025
I noticed while hacking on coreos#5365 that during the rpmdb writing, librpm
was actually re-executing systemd-sysusers *from the host context* which
is not at all what we want.

Apparently, `RPMTRANS_FLAG_JUSTDB` doesn't imply this and we need to
explicitly also pass `RPMTRANS_FLAG_NOSYSUSERS`. That flag doesn't exist
in el9, so add a compile-time conditional for it.

This fixes the issue for new systems, but people who have upgraded
to f42 and overlaid packages with sysusers entries will have new entries
in `/etc/passwd` and `/etc/group` files because of this. And this can
cause problems now if the UIDs chosen were different because the `/etc`
entries will take precedence over nss-altfiles even though owned content
will match nss-altfiles.

In practice, I think since coreos#5365 breaks exactly those use cases
where the sysusers entries own content, we don't have to worry about
that subcase. But for sysusers entries that _don't_ own content, the
transaction would go through and so there could still be UID conflicts
there.

I guess we'll need to figure out whether to somehow try to fix this or just
issue a PSA about it.
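For the leftover-entry subcase described in this commit message, an affected user could look for names defined in both `/etc/group` and `/usr/lib/group` (or the `passwd` pair) with different numeric IDs. A hedged sketch, not a tool shipped with rpm-ostree; the function names and sample IDs are hypothetical.

```python
def ids_by_name(text: str) -> dict[str, int]:
    """Map name -> numeric ID (third field) for passwd(5)/group(5) text."""
    ids: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            ids.setdefault(fields[0], int(fields[2]))
    return ids


def conflicts(etc_text: str, usr_lib_text: str) -> dict[str, tuple[int, int]]:
    """Names present in both files with different IDs, as (etc_id, usr_lib_id).

    The /etc value takes precedence at lookup time, while content owned in
    the commit matches the /usr/lib (nss-altfiles) value.
    """
    etc_ids, usr_lib_ids = ids_by_name(etc_text), ids_by_name(usr_lib_text)
    return {
        name: (etc_ids[name], usr_lib_ids[name])
        for name in sorted(etc_ids.keys() & usr_lib_ids.keys())
        if etc_ids[name] != usr_lib_ids[name]
    }


if __name__ == "__main__":
    etc_group = "root:x:0:\nnut:x:975:\nwheel:x:10:\n"  # illustrative values
    usr_lib_group = "root:x:0:\nnut:x:57:\n"
    for name, (etc_id, lib_id) in conflicts(etc_group, usr_lib_group).items():
        print(f"{name}: /etc has {etc_id}, /usr/lib has {lib_id}")
    # nut: /etc has 975, /usr/lib has 57
```

On an affected host, the two arguments would be the contents of `/etc/group` and `/usr/lib/group` (and likewise for `passwd`).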