Skip to content

Add missing lock in Apply() #1668

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

vikaschoudhary16
Copy link
Contributor

@vikaschoudhary16 vikaschoudhary16 commented Dec 5, 2017

Cgroups joins and local variable 'path' updation should be
done under lock protection. Lock protection absence can lead
to race conditions and errors [1].

[1] https://ci.openshift.redhat.com/jenkins/job/test_pull_request_crio_e2e_rhel/433/consoleFull#199861745056cbb9a5e4b02b88ae8c2f77
/cc @sjenning
Signed-off-by: vikaschoudhary16 [email protected]

Cgroups joins and local variable 'path' updation should be
done under lock protection. Lock protection absence can lead
to race conditions and errors [1].

[1] https://ci.openshift.redhat.com/jenkins/job/test_pull_request_crio_e2e_rhel/433/consoleFull#199861745056cbb9a5e4b02b88ae8c2f77

Signed-off-by: vikaschoudhary16 <[email protected]>
@cyphar
Copy link
Member

cyphar commented Dec 5, 2017

Are you sure this patch fixes this bug? The error in that log is coming from the kernel (I looked at it when it was reported against cri-o) and is an ENODEV. I have a longer explanation in the relevant cri-o bug report.

@vikaschoudhary16
Copy link
Contributor Author

@cyphar I am not 100% sure that this is the fix for that bug. I saw your comment at cri-o issue. I am just guessing that it could be a fix and for verification created a test PR at cri-o:
cri-o/cri-o#1205

OTOH, Are you sure that this cant be the fix?

@cyphar
Copy link
Member

cyphar commented Dec 5, 2017

The error (ENODEV) is definitely coming from the kernel, and I'm not convinced that this patch is related. My theory is that this is a race against systemd (not in runc but against systemd proper). I'm not sure though, I'd want to see some reproducer that just uses runc.

@vikaschoudhary16
Copy link
Contributor Author

@cyphar If we go by your theory, this should be reproducible and the following should break sometime :

#! /bin/bash -e
for (( ; ; ))
do
  mkdir /sys/fs/cgroup/pids/foo
  ls /sys/fs/cgroup/pids/foo/cgroup.procs
  rmdir /sys/fs/cgroup/pids/foo
done

Right?

@cyphar
Copy link
Member

cyphar commented Dec 5, 2017

You'd have to write to cgroup.procs to get ENODEV, but possibly (you might need to do it in a multi-threaded way).

@vikaschoudhary16
Copy link
Contributor Author

@vikaschoudhary16
Copy link
Contributor Author

so this check dint help with cgroup flake.

This still needs to be verified. JJ is failing, which i shared above but from logs i see that runc is being used from origin and not from cri-o, where i have added the lock in the PR, for testing.

@Taeung
Copy link
Contributor

Taeung commented Jan 10, 2018

I thought the recent PR #1683 ( of which commit is d5b4a3e) already fixed the same problem(openshift/origin#16246),
so this PR is needed to be closed ?

@cyphar
Copy link
Member

cyphar commented Jan 10, 2018

Yes, I believe that #1683 is the correct fix for this issue. Closing.

@cyphar cyphar closed this Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants