-
Notifications
You must be signed in to change notification settings - Fork 162
manifests/fedora-coreos-base: apply dracut fix in postcript #3508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: testing-devel
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add review here in a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long wait for the review here. I wanted to put some of the groundwork in place for you to be able to implement my suggestion.
This patch is something we want to carry in FCOS and RHCOS and is also something that we foresee we will drop progressively over time as it enters our various streams (i.e. we'll drop it from rawhide
first, then next-devel
/next
, eventually we'll drop it from el10 and maybe never drop it from el9??
Anyway we need to set ourselves up here for success. Let's move this into a manifest that is shared with el9/10.
I should have made this easier with #3511 If you don't mind review that and test it out (both an FCOS build and downstream el9/10 build would be useful).
eba1a79
to
9737c5c
Compare
Understood, I'll rebase my PR on top of #3511 and put the post-script in |
9737c5c
to
2e42d95
Compare
We are awaiting some dracut patches to land in Fedora mirror for [1]. We could wait for it, but we'd need it also for older rhcos branches and those branches won't have the new dracut. So we decided to carry part of the dracut-ng patch [2] in a post-script. This will be backported in older rhcos branches, and will be reverted on testing-devel once new dracut 107 available in mirror. [1] coreos/fedora-coreos-tracker#1937 [2] dracut-ng/dracut-ng#1306
2e42d95
to
9d804c9
Compare
Can we add some text to the body of the 2nd commit. A nice description of why we can re-enable this test would be useful. |
CI failures here:
which is interesting. @jlebon would you have expected multipath regressions as a result of dracut-ng/dracut-ng#1306 ? |
I think it's more that the test needs to be adapted so it doesn't verify that we're on multipath based on whether the device is Aside: a lot (probably all) of those multipath tests could nowadays be moved to external tests. |
That seems reasonable, but is it possible here we're going to break people who somehow made assumptions based on what the device names were in the past? |
I would love that |
This test can be re-enable as we are applying part of the dracut-ng patch [1] in a postprocess script. This script is a workaround solution while awaiting the bump of dracut RPM NVR including [1] in Fedora, and will be removed afterward. [1] dracut-ng/dracut-ng#1306
Done in last patch set.
I proposed a PR to do so coreos/coreos-assembler#4113 (i.e still in go tests though). With my cosa PR, it's passing the verification step, but still fails while rebooting:
|
4344bd0
to
0f9e91c
Compare
@jlebon with |
The problem is that those names were not stable to begin with and so not recommended (in favour of the WWID, but the problem with that is that it's node-specific). I guess it is reliable if you only have a single multipathed device (in which case you're guaranteed I think to get That said, a really important factor is that upgrading nodes will not get this change because the That said, there's definitely an argument to be made for not backporting this change because it's too significant to change mid-release (even if it would only take effect with new boot media). I think the workaround if you don't want friendly names then on those older releases is to add an cc @bmarzins in case I got some details wrong. |
Yes, indeed. Which... I think those previously fell under:
But at the same time, in realistic scenarios you very often do have multiple multpath devices and our provisioning flows should be able to handle those cases without having to jump through hoops. I.e. those should not be treated in a special way. Anyway, arguing with myself a bit here but I think this is still the right change to make overall even though it might/likely will cause discomfort. |
so that means we need to update all the uses of
Is that correct? |
So should we kill #3506 (the backport to 4.18 that we were holding up to pull in this patch) then? Note that what we land here will slip straight into |
That depends on what you mean when you say "stable". Multipath stores the mapping between user_friendly_name and WWID in /etc/multipath/bindings, and once there is a mapping there, multipath will always use it unless there already is a device with that name. That's actually the origin of the problem that dracut-ng/dracut-ng#1306 is trying to fix. Since there is no /etc/multipath/bindings in the initramfs, the devices get their user_friendly_name there by the order that they are discovered. However once you switch-root, multipath can't always switch a device to the user_friendly_name it is supposed to have, since it might be in use by another device. In this case it will fail back to using the WWID as the name. As long as you only have one multipath device, it will always be named mpatha. If you later add another device, the original device will still always be named mpatha, unless the other device gets named mpatha in the initramfs, in which case the original device will be named by its WWID for that boot.
Since the multipath user_friendly_names are supposed to be persistent (in RHEL, we tell users that they must remake their initramfs whenever a new multipath device is added, to propagate /etc/multipath/bindings to the initramfs and avoid the name mismatch issue) users may well use that specific name in their settings. As long as you don't change /etc/multipath.conf on existing nodes, you should be safe. But having multipath devices suddenly named differently on new nodes may well confuse users, so I'm not sure that you should make this change mid-release. |
@bmarzins Thanks for the additional context! We actually do have facilities to inject back the bindings into the initramfs as well. Though I'm not sure of the payoff of trying to wire that through in all the right places. It just seems to me like user friendly names should be avoided in general anyway.
In discussing this with @dustymabe, we realized it's a little more nuanced than that. Upgrading nodes will indeed have a carried over However, before this latest push to drop the So in practice, there should be no or few systems using
Yeah, let's not backport it.
Yeah, I think doing this in 4.19+ is fine. We're fast-tracking a patch which will land in 9.6 anyway eventually. |
We are awaiting some dracut patches to land in Fedora mirror for [1]. We could wait for it, but we'd need it also for older rhcos branches and those branches won't have the new dracut. So we decided to carry part of the dracut-ng patch [2] in a post-script. This will be backported in older rhcos branches, and will be reverted on testing-devel once new dracut 107 available in mirror.
[1] coreos/fedora-coreos-tracker#1937
[2] dracut-ng/dracut-ng#1306