flake: process_linux.go:291: setting cgroup config for ready process caused #13241
@derekwaynecarr @mrunalp @sjenning @openshift/networking Have we seen something like this before? This could just be the network tests and DIND, but I'd like to know for sure. |
@dcbw: Any thoughts? Thanks. |
There's some sort of dind/cgroup problem. These errors often show up with weird nested cgroup paths. Almost every time you start a pod under dind you get one or two of these:
But usually it eventually succeeds; sometimes it doesn't. The bug never seems to happen in non-dind clusters. I'm guessing either
I tried to figure out why dind requires

Probably we should just kill off the extended_networking_minimal test, and instead make the conformance_gce test run the (minimal) extended networking tests, since it sets up a multi-node environment running the sdn. |
#18540 fixes up the tests so we'd be able to do that, if you'd like to review it. (Right now the default focus/skips assume that conformance-gce runs ovs-subnet, but that PR figures the skips out at runtime based on whichever plugin is selected, so we can then change conformance-gce to run multitenant and kill networking-minimal.) |
Investigated this for a while... I believe the issue is due to the /proc/1/cgroup paths that a container using systemd as PID 1 has. For a DIND container we get:

```
4:cpu,cpuacct:/system.slice/docker-10cdf0bdb77cc401a9ee9faba70ce36fc712cfd7399ea55439752a49bfe9d427.scope/system.slice/docker-10cdf0bdb77cc401a9ee9faba70ce36fc712cfd7399ea55439752a49bfe9d427.scope/init.scope
```

Note how the same path is duplicated. Running `docker run -it nginx bash` instead results in:

```
4:cpu,cpuacct:/system.slice/docker-df9ecf01a95338243adfb587373ad7781d5610e248e827287257463fda555c4c.scope
```

In fact, patching runc (e.g., docker) to de-duplicate the cgroup path and using that patched docker inside the "node" container appears to prevent the problem from occurring:

```diff
diff -up docker-4402c09586c72e0c32b90d72bd24304f609e2b7a/runc-1c91122c1d992cf1dc971ff14f78eddbf6fb06f5/libcontainer/cgroups/systemd/apply_systemd.go.foo docker-4402c09586c72e0c32b90d72bd24304f609e2b7a/runc-1c91122c1d992cf1dc971ff14f78eddbf6fb06f5/libcontainer/cgroups/systemd/apply_systemd.go
--- docker-4402c09586c72e0c32b90d72bd24304f609e2b7a/runc-1c91122c1d992cf1dc971ff14f78eddbf6fb06f5/libcontainer/cgroups/systemd/apply_systemd.go.foo	2018-02-28 09:44:39.060985054 -0600
+++ docker-4402c09586c72e0c32b90d72bd24304f609e2b7a/runc-1c91122c1d992cf1dc971ff14f78eddbf6fb06f5/libcontainer/cgroups/systemd/apply_systemd.go	2018-02-28 09:48:44.224528821 -0600
@@ -327,6 +327,38 @@ func ExpandSlice(slice string) (string,
 	return path, nil
 }
 
+func reduce(a string) string {
+	a = strings.TrimSuffix(a, "/")
+	alen := len(a)
+	if alen % 2 != 0 {
+		return a
+	}
+	if a[0:alen/2] == a[alen/2:] {
+		return a[alen/2:]
+	}
+	return a
+}
+
 func getSubsystemPath(c *configs.Cgroup, subsystem string) (string, error) {
 	mountpoint, err := cgroups.FindCgroupMountpoint(subsystem)
 	if err != nil {
@@ -340,6 +371,8 @@ func getSubsystemPath(c *configs.Cgroup,
 	// if pid 1 is systemd 226 or later, it will be in init.scope, not the root
 	initPath = strings.TrimSuffix(filepath.Clean(initPath), "init.scope")
 
+	initPath = reduce(initPath)
+
 	slice := "system.slice"
 	if c.Parent != "" {
 		slice = c.Parent
```

So the next step is tracking down why systemd-based PID 1 containers have these odd cgroup paths. |
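For reference, a minimal standalone sketch (mine, not actual runc code) that exercises the reduce helper from the patch above against the duplicated DIND path, to show what the de-duplication produces:

```go
// Sketch: apply the reduce() de-duplication from the patch above to the
// duplicated cgroup path observed in the DIND "node" container.
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// reduce collapses a path of the form "X/X" (or "X/X/") down to "X",
// mirroring the helper added in the runc patch above.
func reduce(a string) string {
	a = strings.TrimSuffix(a, "/")
	alen := len(a)
	if alen%2 != 0 {
		return a
	}
	if a[0:alen/2] == a[alen/2:] {
		return a[alen/2:]
	}
	return a
}

func main() {
	// Path portion of the /proc/1/cgroup line quoted above.
	id := "10cdf0bdb77cc401a9ee9faba70ce36fc712cfd7399ea55439752a49bfe9d427"
	initPath := "/system.slice/docker-" + id + ".scope" +
		"/system.slice/docker-" + id + ".scope/init.scope"

	// Same pre-processing as getSubsystemPath: strip the trailing init.scope.
	initPath = strings.TrimSuffix(filepath.Clean(initPath), "init.scope")

	fmt.Println("before:", initPath)         // .../docker-<id>.scope/system.slice/docker-<id>.scope/
	fmt.Println("after: ", reduce(initPath)) // /system.slice/docker-<id>.scope
}
```

It runs as an ordinary `go run` program; the only input is the example path from the comment, so it does not need to run inside the DIND container.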
This sounds really familiar to something that Derek tracked down previously. |
Actually, no, this was something we hit with docker-in-docker plus systemd when we tried to run docker in a pod, so it's the same use case. We didn't figure out the reason, but it's definitely broken.
|
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten |
Update: things are fine in the container before pid1/systemd migrates from the initial cgroup to the "init.scope" cgroup. That's when the duplicate paths appear. |
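One way to spot the transition described above is to watch /proc/1/cgroup from inside the "node" container. The following is a rough diagnostic sketch of my own (not something posted in this thread); it flags any subsystem whose path repeats the same segment sequence twice:

```go
// Diagnostic sketch: read /proc/1/cgroup and report subsystems whose path
// is the same sequence of segments repeated twice (the duplication that
// appears once systemd moves PID 1 into init.scope).
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/1/cgroup")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// Each line looks like "4:cpu,cpuacct:/system.slice/...".
		parts := strings.SplitN(sc.Text(), ":", 3)
		if len(parts) != 3 {
			continue
		}
		path := strings.TrimSuffix(parts[2], "/init.scope")
		if n := len(path); n > 0 && n%2 == 0 && path[:n/2] == path[n/2:] {
			fmt.Printf("duplicated cgroup path for %s: %s\n", parts[1], parts[2])
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```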
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten |
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close |
@openshift-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_networking_future/840/consoleFull#201915086156cbb9a5e4b02b88ae8c2f77