Skip to content

[Feature:Builds] forcePull should affect pulling builder images ForcePull test case execution docker #17596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
0xmichalis opened this issue Dec 5, 2017 · 11 comments · Fixed by #18054
Assignees
Labels
kind/test-flake Categorizes issue or PR as related to test flakes. priority/P1 sig/developer-experience

Comments

@0xmichalis
Copy link
Contributor

/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/builds/forcepull.go:98
Expected error:
    <*util.ExitError | 0xc421317230>: {
        Cmd: "oc adm --config=/tmp/cluster-admin.kubeconfig --namespace=extended-test-forcepull-l9lm8-zs2kp policy remove-cluster-role-from-user system:build-strategy-custom extended-test-forcepull-l9lm8-zs2kp-user",
        StdErr: "Error from server (Conflict): Operation cannot be fulfilled on clusterrolebindings.authorization.openshift.io \"system:build-strategy-custom\": the object has been modified; please apply your changes to the latest version and try again",
        ExitError: {
            ProcessState: {
                pid: 18479,
                status: 256,
                rusage: {
                    Utime: {Sec: 0, Usec: 280068},
                    Stime: {Sec: 0, Usec: 33854},
                    Maxrss: 53748,
                    Ixrss: 0,
                    Idrss: 0,
                    Isrss: 0,
                    Minflt: 14580,
                    Majflt: 0,
                    Nswap: 0,
                    Inblock: 0,
                    Oublock: 0,
                    Msgsnd: 0,
                    Msgrcv: 0,
                    Nsignals: 0,
                    Nvcsw: 1483,
                    Nivcsw: 37,
                },
            },
            Stderr: nil,
        },
    }
    exit status 1
not to have occurred
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/builds/forcepull.go:90

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/17534/test_pull_request_origin_extended_conformance_gce/12371/

/sig developer-experience
/kind test-flake
/assign bparees

@bparees
Copy link
Contributor

bparees commented Dec 5, 2017

@jim-minter this seems likely to be a race condition since we're modifying the system wide cluster role to perform this test. We need to find a different way to do it.

@bparees
Copy link
Contributor

bparees commented Dec 5, 2017

actually I guess we're just modifying the user. A simple retry might be sufficient here, though I also kinda wonder if we should even be exposed to this or if RBAC should be handling it for us. @enj ?

we're running

oc adm --config=/tmp/cluster-admin.kubeconfig --namespace=extended-test-forcepull-l9lm8-zs2kp policy remove-cluster-role-from-user system:build-strategy-custom extended-test-forcepull-l9lm8-zs2kp-user"

and getting

"Error from server (Conflict): Operation cannot be fulfilled on clusterrolebindings.authorization.openshift.io \"system:build-strategy-custom\": the object has been modified; please apply your changes to the latest version and try again"

@bparees
Copy link
Contributor

bparees commented Dec 5, 2017

(Is the race here that our call has to update the system:build-strategy-custom object to add our user? If so, should oc adm policy be handling that via an apply instead?)

@bparees
Copy link
Contributor

bparees commented Dec 5, 2017

@deads2k ^^

@enj
Copy link
Contributor

enj commented Dec 7, 2017

@adelton @mrogers950 @simo5 any thoughts on adding some generic conflict handling to the policy commands? They already handle races with regards to creation of objects.

@simo5
Copy link
Contributor

simo5 commented Dec 7, 2017

@enj I have no objection if this is not going to cause unexpected (fail open) results

@tnozicka
Copy link
Contributor

@deads2k
Copy link
Contributor

deads2k commented Jan 8, 2018

(Is the race here that our call has to update the system:build-strategy-custom object to add our user? If so, should oc adm policy be handling that via an apply instead?)

I think we should try writing a patch first. If a patch won't work (I suspect compound merge keys don't work), then I'm ok retrying on a remove, but not on an add. For adds, we should start using the oc create clusterrolebinding

@deads2k
Copy link
Contributor

deads2k commented Jan 8, 2018

Oh, this is in a test? Don't change the command then. Go and update the test to create its own clusterrolebinding and delete its own clusterrolebinding. That's more in keeping with upstream and where we're going.

@bparees
Copy link
Contributor

bparees commented Jan 8, 2018

@deads2k it's in a test, but this behavior hurts everyone, why wouldn't we improve the behavior?

Go and update the test to create its own clusterrolebinding and delete its own clusterrolebinding. That's more in keeping with upstream and where we're going.

The test is running this command:

oc adm --config=/tmp/cluster-admin.kubeconfig --namespace=extended-test-forcepull-l9lm8-zs2kp policy remove-cluster-role-from-user system:build-strategy-custom extended-test-forcepull-l9lm8-zs2kp-user"

Switching the test to not use the commands we expect administrators to use, instead of fixing those commands so they can work in a parallel environment, seems user hostile to me.

@bparees
Copy link
Contributor

bparees commented Jan 9, 2018

then I'm ok retrying on a remove, but not on an add.

why does it matter? it's all race conditions. Whether you retry, or the admin happened to run the add slightly after someone else did the remove, either way the result is the same. If you want guarantees about who has what access in a policy, you need to either have only a single person making changes, or the changes need to be coordinated/ordered.

As soon as you have 2 different people running policy update commands independently(which is how you get the conflict in the first place), you've got no guarantee of the outcome whether or not the command retries conflicts itself or not.

Expecting a conflict error to catch the mistake you were about to make is a hail mary.

openshift-merge-robot added a commit that referenced this issue Jan 12, 2018
Automatic merge from submit-queue.

avoid conflict updating custom build role

fixes #17596

(I don't really consider this a proper fix, but apparently oc adm isn't going to get the proper fix so this is as close as we're getting).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/test-flake Categorizes issue or PR as related to test flakes. priority/P1 sig/developer-experience
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants