FailedMount and FailedSync events related to builder and deployer tokens #14383

Closed
jwforres opened this issue May 26, 2017 · 17 comments

Labels: component/kubernetes, kind/bug, priority/P2

@jwforres (Member)

We are frequently seeing errors related to mounting secrets as volumes, most commonly for the builder-token and deployer-token. Possibly this only happens on Docker for Mac. Reproduced by myself and @spadgett so far.

LASTSEEN   FIRSTSEEN   COUNT     NAME            KIND                    SUBOBJECT                     TYPE      REASON              SOURCE                        MESSAGE
1h         1h          1         test-2-build    Pod                                                   Warning   FailedMount         kubelet, localhost            Unable to mount volumes for pod "test-2-build_myproject(3dc52cca-4242-11e7-8550-ce8c6a9659b1)": timeout expired waiting for volumes to attach/mount for pod "myproject"/"test-2-build". list of unattached/unmounted volumes=[docker-socket builder-dockercfg-bsf9q-push builder-token-xpkt3]
1h         1h          1         test-2-build    Pod                                                   Warning   FailedSync          kubelet, localhost            Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "myproject"/"test-2-build". list of unattached/unmounted volumes=[docker-socket builder-dockercfg-bsf9q-push builder-token-xpkt3]
1h         1h          1         test-2-deploy   Pod                                                   Warning   FailedMount         kubelet, localhost            Unable to mount volumes for pod "test-2-deploy_myproject(7c6989e5-4242-11e7-8550-ce8c6a9659b1)": timeout expired waiting for volumes to attach/mount for pod "myproject"/"test-2-deploy". list of unattached/unmounted volumes=[deployer-token-900cs]
1h         1h          1         test-2-deploy   Pod                                                   Warning   FailedSync          kubelet, localhost            Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "myproject"/"test-2-deploy". list of unattached/unmounted volumes=[deployer-token-900cs]
@jwforres (Member, Author)

cc @smarterclayton

@smarterclayton (Contributor)

@derekwaynecarr @childsb

@pweil- added the component/kubernetes, kind/bug, and priority/P1 labels on May 30, 2017
@derekwaynecarr (Member)

@smarterclayton this looks similar to what @sjenning and I were debugging earlier this week.

@derekwaynecarr (Member)

We think the problem was introduced in the kube 1.6.1 bump; we still need to isolate the specific commit ID.

@sjenning (Contributor) commented Jul 5, 2017

b282780 is known good and 359f62c is known bad. The commits in between don't build :-/

$ git log --oneline --no-merges v3.6.0-alpha.1..359f62ccfb vendor/k8s.io/kubernetes/
359f62ccfb (HEAD) UPSTREAM: 44970: CRI: Fix StopContainer timeout
dc20aea9e9 UPSTREAM: 44730: Check for terminating Pod prior to launching successor in StatefulSet
6de438291c UPSTREAM: <drop>: disable apiserver loopback loop in generic context
566b41bffa UPSTREAM: 00000: make AsVersionedObjects default cleanly
3c064ffa17 UPSTREAM: <carry>: openapi test, patch in updated package name
e5fa6896b8 UPSTREAM: 44221: validateClusterInfo: use clientcmdapi.NewCluster()
91525e786e UPSTREAM: 44570: Explicit namespace from kubeconfig should override in-cluster config
62d77ec05c UPSTREAM: <carry>: add OpenShift resources to garbage collector ignore list
e51a861e52 UPSTREAM: 44859: e2e: handle nil ReplicaSet in checkDeploymentRevision
6b12679660 UPSTREAM: 44861: NotRegisteredErr for known kinds not registered in target GV
3adf4d3895 UPSTREAM: <drop>: Run make-update – vendor/ changes
e8d67d3ea0 bump(*): Kubernetes 1.6.1
b282780a04 UPSTREAM: 43375: Set permission for volume subPaths

@sjenning (Contributor) commented Jul 5, 2017

What is weird is that I can't reproduce this on kube 1.6.0, 1.6.1, 1.6.6, or 1.7.

@sjenning (Contributor) commented Jul 6, 2017

I found this BZ, which was supposed to fix this.

https://bugzilla.redhat.com/show_bug.cgi?id=1444096#c4

It even passed QE.

https://bugzilla.redhat.com/show_bug.cgi?id=1444096#c18

Supposed to be fixed by #14144

Either 1) it didn't, 2) there has been a regression since, or 3) this is a slightly different but related issue.

@eparis @wongma7

@sjenning (Contributor) commented Jul 6, 2017

At origin commit 22888b2:

busybox pod:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    command:
    - sleep
    - "10"
  terminationGracePeriodSeconds: 0
  restartPolicy: Never

Additional information from the node log:

I0705 21:11:16.624834    6139 reconciler.go:242] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/secret/648bdd9a-61f0-11e7-8394-7085c20cf2ab-default-token-1qdbk" (spec.Name: "default-token-1qdbk") pod "648bdd9a-61f0-11e7-8394-7085c20cf2ab" (UID: "648bdd9a-61f0-11e7-8394-7085c20cf2ab")
I0705 21:11:16.777596    6139 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/648bdd9a-61f0-11e7-8394-7085c20cf2ab-default-token-1qdbk" (spec.Name: "default-token-1qdbk") pod "648bdd9a-61f0-11e7-8394-7085c20cf2ab" (UID: "648bdd9a-61f0-11e7-8394-7085c20cf2ab").
I0705 21:11:16.825685    6139 kuberuntime_manager.go:458] Container {Name:busybox Image:busybox Command:[sleep 10] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-1qdbk ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[],Drop:[MKNOD SYS_CHROOT],},Privileged:*false,SELinuxOptions:&SELinuxOptions{User:,Role:,Type:,Level:s0:c1,c0,},RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
I0705 21:11:21.143315    6139 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/648bdd9a-61f0-11e7-8394-7085c20cf2ab-default-token-1qdbk" (spec.Name: "default-token-1qdbk") pod "648bdd9a-61f0-11e7-8394-7085c20cf2ab" (UID: "648bdd9a-61f0-11e7-8394-7085c20cf2ab").
I0705 21:11:22.148872    6139 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/648bdd9a-61f0-11e7-8394-7085c20cf2ab-default-token-1qdbk" (spec.Name: "default-token-1qdbk") pod "648bdd9a-61f0-11e7-8394-7085c20cf2ab" (UID: "648bdd9a-61f0-11e7-8394-7085c20cf2ab").

10s later, when the pod terminates:

I0705 21:11:31.171556    6139 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/648bdd9a-61f0-11e7-8394-7085c20cf2ab-default-token-1qdbk" (spec.Name: "default-token-1qdbk") pod "648bdd9a-61f0-11e7-8394-7085c20cf2ab" (UID: "648bdd9a-61f0-11e7-8394-7085c20cf2ab").
I0705 21:11:31.266705    6139 reconciler.go:201] UnmountVolume operation started for volume "kubernetes.io/secret/648bdd9a-61f0-11e7-8394-7085c20cf2ab-default-token-1qdbk" (spec.Name: "default-token-1qdbk") from pod "648bdd9a-61f0-11e7-8394-7085c20cf2ab" (UID: "648bdd9a-61f0-11e7-8394-7085c20cf2ab").
I0705 21:11:31.305820    6139 operation_generator.go:684] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/648bdd9a-61f0-11e7-8394-7085c20cf2ab-default-token-1qdbk" (OuterVolumeSpecName: "default-token-1qdbk") pod "648bdd9a-61f0-11e7-8394-7085c20cf2ab" (UID: "648bdd9a-61f0-11e7-8394-7085c20cf2ab"). InnerVolumeSpecName "default-token-1qdbk". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0705 21:11:31.367046    6139 reconciler.go:363] Detached volume "kubernetes.io/secret/648bdd9a-61f0-11e7-8394-7085c20cf2ab-default-token-1qdbk" (spec.Name: "default-token-1qdbk") devicePath: ""
W0705 21:11:32.168613    6139 pod_container_deletor.go:77] Container "7b001d050e618d2d2f186b327e143e1bba7baf6d1f9e6131cd9ff53ff433c73f" not found in pod's containers

2m later:

E0705 21:13:31.144859    6139 kubelet.go:1554] Unable to mount volumes for pod "busybox_default(648bdd9a-61f0-11e7-8394-7085c20cf2ab)": timeout expired waiting for volumes to attach/mount for pod "default"/"busybox". list of unattached/unmounted volumes=[default-token-1qdbk]; skipping pod
E0705 21:13:31.145105    6139 pod_workers.go:182] Error syncing pod 648bdd9a-61f0-11e7-8394-7085c20cf2ab ("busybox_default(648bdd9a-61f0-11e7-8394-7085c20cf2ab)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"busybox". list of unattached/unmounted volumes=[default-token-1qdbk]

The following bad events occur:

2017-07-05 21:13:31 -0500 CDT   2017-07-05 21:13:31 -0500 CDT   1         busybox   Pod                 Warning   FailedMount   kubelet, cerebellum.local.variantweb.net   Unable to mount volumes for pod "busybox_default(648bdd9a-61f0-11e7-8394-7085c20cf2ab)": timeout expired waiting for volumes to attach/mount for pod "default"/"busybox". list of unattached/unmounted volumes=[default-token-1qdbk]
2017-07-05 21:13:31 -0500 CDT   2017-07-05 21:13:31 -0500 CDT   1         busybox   Pod                 Warning   FailedSync   kubelet, cerebellum.local.variantweb.net   Error syncing pod

@wongma7 commented Jul 6, 2017

This is slightly different. My patch was for the case where, after a kubelet restart, the events would get spammed because the kubelet "forgot" what name to use for a volume. Since there are only one or two events here, it's something else.

@sjenning (Contributor) commented Jul 6, 2017

@wongma7 thanks for clarifying

@sjenning (Contributor) commented Jul 10, 2017

I captured the stray pod worker goroutine that times out after podAttachAndMountTimeout and produces the undesired events. This is a trace from after the test pod terminates but before the timeout expires.

goroutine 44251 [chan receive]:
github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.WaitFor(0xc4261b0300, 0xc43059f640, 0xc420060120, 0x4c47c40, 0xc4261b0300)
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:270 +0x5d
github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.pollInternal(0xc4261b0300, 0xc43059f640, 0xc4261b0300, 0xc43059f640)
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:187 +0x41
github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.Poll(0x11e1a300, 0x1bf08eb000, 0xc43059f640, 0xc42ef59220, 0x1)
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:183 +0x4d
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/volumemanager.(*volumeManager).WaitForAttachAndMount(0xc433da9420, 0xc42598b200, 0x0, 0x0)
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/volumemanager/volume_manager.go:357 +0x20f
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncPod(0xc43156f800, 0x0, 0xc42598b200, 0x0, 0xc428032ee0, 0x0, 0xc42616bf20, 0xc4306fb4a0)
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:1552 +0xd2f
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet.(*Kubelet).(github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet.syncPod)-fm(0x0, 0xc42598b200, 0x0, 0xc428032ee0, 0x0, 0xefbf440, 0xc428032ee0)
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:760 +0x51
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet.(*podWorkers).managePodLoop.func1(0xc427953f08, 0xc433da9490, 0xc427953ef0, 0x1, 0x0)
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/pod_workers.go:173 +0x146
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet.(*podWorkers).managePodLoop(0xc433da9490, 0xc4325ca6c0)
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/pod_workers.go:176 +0x17d
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet.(*podWorkers).UpdatePod.func1(0xc433da9490, 0xc4325ca6c0)
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/pod_workers.go:220 +0x6c
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet.(*podWorkers).UpdatePod
	/home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/pod_workers.go:221 +0x350
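
To make the timing concrete, here is a minimal, self-contained Go sketch of the mechanism (a toy model, not the vendored kubelet code): the stray pod worker blocks in a WaitForAttachAndMount-style poll, but the secret volume it is waiting on was already torn down when the pod terminated, so the poll can only expire. The interval and timeout are shortened from the real podAttachAndMountTimeout (roughly two minutes) so the example finishes quickly.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForMount is a toy stand-in for volumeManager.WaitForAttachAndMount: it polls
// a condition until it holds or the timeout expires.
func waitForMount(mounted func() bool, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if mounted() {
			return nil
		}
		time.Sleep(interval)
	}
	return errors.New("timeout expired waiting for volumes to attach/mount")
}

func main() {
	// By the time the stray pod worker reaches the wait, the pod has terminated and
	// the reconciler has already unmounted the secret volume, so the condition can
	// never become true and the worker reports the error seen in the events above.
	mounted := func() bool { return false }

	if err := waitForMount(mounted, 20*time.Millisecond, 200*time.Millisecond); err != nil {
		fmt.Println("FailedMount:", err)
	}
}
```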

@sjenning (Contributor) commented Jul 10, 2017

Added some logging:

E0710 11:14:48.294323   10862 kubelet_pods.go:728] SETH podIsTerminated status.Phase: Running
E0710 11:14:48.711388   10862 kubelet.go:1903] SETH dispatchWork syncType: sync               <-- pod terminated 
E0710 11:14:48.711417   10862 kubelet_pods.go:728] SETH podIsTerminated status.Phase: Running <-- check in dispatchWork()
E0710 11:14:48.711427   10862 kubelet.go:1914] SETH kl.podWorkers.UpdatePod
E0710 11:14:48.711547   10862 kubelet_pods.go:728] SETH podIsTerminated status.Phase: Running <-- check in syncPod()
E0710 11:14:48.711639   10862 kubelet_pods.go:728] SETH podIsTerminated status.Phase: Succeeded
E0710 11:14:48.711761   10862 kubelet.go:1552] SETH syncPod @ kl.volumeManager.WaitForAttachAndMount() <-- waiting in syncPod()
E0710 11:14:50.292323   10862 kubelet_pods.go:728] SETH podIsTerminated status.Phase: Succeeded

The pod shows as Running in the podIsTerminated() calls from dispatchWork() and syncPod().
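
For reference, the phase check these helpers reduce to (setting aside the deletion-timestamp branch) only treats Succeeded or Failed as terminated, which is why the worker is still dispatched while the cached status says Running. A minimal sketch with illustrative local types rather than the real API package:

```go
package main

import "fmt"

// PodPhase mirrors the Kubernetes pod phase strings for illustration only.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// isTerminatedPhase captures the core check that, as I read it, both
// volumehelper.IsPodTerminated and kubelet.podIsTerminated reduce to when the pod
// has no deletion timestamp: only Succeeded or Failed count as terminated.
func isTerminatedPhase(phase PodPhase) bool {
	return phase == PodSucceeded || phase == PodFailed
}

func main() {
	fmt.Println(isTerminatedPhase(PodRunning))   // false: the worker proceeds to wait on volumes
	fmt.Println(isTerminatedPhase(PodSucceeded)) // true: later checks see the pod as terminated
}
```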

@sjenning (Contributor) commented Jul 10, 2017

Mistake in my last comment: findAndAddActivePods() calls volumehelper.IsPodTerminated(), not kubelet.podIsTerminated(). They do the same thing. I amended the previous comment to avoid confusion.

@sjenning (Contributor)

Found the difference between kube and origin:

I0710 16:29:10.136351   30372 kubelet.go:1848] SyncLoop (PLEG): "busybox_default(c71acf05-65b6-11e7-84de-7085c20cf2ab)", event: &pleg.PodLifecycleEvent{ID:"c71acf05-65b6-11e7-84de-7085c20cf2ab", Type:"ContainerDied", Data:"c6c9c5db3709186875c813deb09aa03fc02a78f89d794a6f6b3a3c0e17b3102c"}
I0710 16:29:10.136503   30372 kubelet_pods.go:1169] Generating status for "busybox_default(c71acf05-65b6-11e7-84de-7085c20cf2ab)"
I0710 16:29:10.136805   30372 volume_manager.go:345] Waiting for volumes to attach and mount for pod "busybox_default(c71acf05-65b6-11e7-84de-7085c20cf2ab)"
I0710 16:29:10.140055   30372 wrap.go:42] GET /api/v1/namespaces/default/pods/busybox: (2.812685ms) 200 [[openshift/v1.6.1+5115d708d7 (linux/amd64) kubernetes/fff65cf] 10.42.10.23:59568]
I0710 16:29:10.145230   30372 wrap.go:42] PUT /api/v1/namespaces/default/pods/busybox/status: (3.972476ms) 200 [[openshift/v1.6.1+5115d708d7 (linux/amd64) kubernetes/fff65cf] 10.42.10.23:59568]
I0710 16:29:10.145704   30372 status_manager.go:444] Status for pod "busybox_default(c71acf05-65b6-11e7-84de-7085c20cf2ab)" updated successfully: {status:{Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:{Time:{sec:0 nsec:0 loc:<nil>}} LastTransitionTime:{Time:{sec:63635318935 nsec:0 loc:0xefc0440}} Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:{Time:{sec:0 nsec:0 loc:<nil>}} LastTransitionTime:{Time:{sec:63635318950 nsec:0 loc:0xefc0440}} Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:{Time:{sec:0 nsec:0 loc:<nil>}} LastTransitionTime:{Time:{sec:63635318935 nsec:0 loc:0xefc0440}} Reason: Message:}] Message: Reason: HostIP:10.42.10.23 PodIP:172.17.0.2 StartTime:0xc434215340 InitContainerStatuses:[] ContainerStatuses:[{Name:busybox State:{Waiting:<nil> Running:<nil> Terminated:0xc42bf72ee0} LastTerminationState:{Waiting:<nil> Running:<nil> Terminated:<nil>} Ready:false RestartCount:0 Image:docker.io/busybox:latest ImageID:docker-pullable://docker.io/busybox@sha256:be3c11fdba7cfe299214e46edc642e09514dbb9bbefcd0d3836c05a1e0cd0642 ContainerID:docker://c6c9c5db3709186875c813deb09aa03fc02a78f89d794a6f6b3a3c0e17b3102c}] QOSClass:BestEffort} version:3 podName:busybox podNamespace:default}
I0710 16:29:10.150088   30372 secret.go:186] Setting up volume default-token-jw0zp for pod c71acf05-65b6-11e7-84de-7085c20cf2ab at /home/sjennings/projects/go/src/github.com/openshift/origin/openshift.local.volumes/pods/c71acf05-65b6-11e7-84de-7085c20cf2ab/volumes/kubernetes.io~secret/default-token-jw0zp
I0710 16:29:10.153911   30372 wrap.go:42] GET /api/v1/namespaces/default/secrets/default-token-jw0zp: (3.093498ms) 200 [[openshift/v1.6.1+5115d708d7 (linux/amd64) kubernetes/fff65cf] 10.42.10.23:59568]
I0710 16:29:10.155250   30372 secret.go:217] Received secret default/default-token-jw0zp containing (4) pieces of data, 4109 total bytes
I0710 16:29:10.155732   30372 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/c71acf05-65b6-11e7-84de-7085c20cf2ab-default-token-jw0zp" (spec.Name: "default-token-jw0zp") pod "c71acf05-65b6-11e7-84de-7085c20cf2ab" (UID: "c71acf05-65b6-11e7-84de-7085c20cf2ab").
I0710 16:29:10.250346   30372 reconciler.go:201] UnmountVolume operation started for volume "kubernetes.io/secret/c71acf05-65b6-11e7-84de-7085c20cf2ab-default-token-jw0zp" (spec.Name: "default-token-jw0zp") from pod "c71acf05-65b6-11e7-84de-7085c20cf2ab" (UID: "c71acf05-65b6-11e7-84de-7085c20cf2ab").
I0710 16:29:10.250512   30372 util.go:367] Tearing down volume default-token-jw0zp for pod c71acf05-65b6-11e7-84de-7085c20cf2ab at /home/sjennings/projects/go/src/github.com/openshift/origin/openshift.local.volumes/pods/c71acf05-65b6-11e7-84de-7085c20cf2ab/volumes/kubernetes.io~secret/default-token-jw0zp
I0710 16:29:10.302562   30372 operation_generator.go:684] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/c71acf05-65b6-11e7-84de-7085c20cf2ab-default-token-jw0zp" (OuterVolumeSpecName: "default-token-jw0zp") pod "c71acf05-65b6-11e7-84de-7085c20cf2ab" (UID: "c71acf05-65b6-11e7-84de-7085c20cf2ab"). InnerVolumeSpecName "default-token-jw0zp". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0710 16:29:10.350655   30372 reconciler.go:363] Detached volume "kubernetes.io/secret/c71acf05-65b6-11e7-84de-7085c20cf2ab-default-token-jw0zp" (spec.Name: "default-token-jw0zp") devicePath: ""

Origin unmounts the volumes (last 4 lines) for the terminated pod while kube does not, despite pkg/kubelet/volumemanager being identical between the two.

I still don't have an answer for why one unmounts and the other doesn't. I think unmounting is the expected behavior after kubernetes/kubernetes#37228.

@sjenning (Contributor) commented Jul 26, 2017

This is coarse and very likely has side effects, but it does prevent the events from occurring:

diff --git a/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go b/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go
index e1b7512ee0..b83603821f 100644
--- a/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go
+++ b/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go
@@ -1473,6 +1473,11 @@ func (kl *Kubelet) syncPod(o syncPodOptions) error {
        // Update status in the status manager
        kl.statusManager.SetPodStatus(pod, apiPodStatus)
 
+       // Exit syncPod if pod is terminated
+       if kl.podIsTerminated(pod) {
+               return nil
+       }
+
        // Kill pod if it should not be running
        if !runnable.Admit || pod.DeletionTimestamp != nil || apiPodStatus.Phase == v1.PodFailed {
                var syncErr error

It does seem like much of the rest of syncPod() past this point should not be done for terminated pods.
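
For comparison, the upstream change referenced below ("skip WaitForAttachAndMount for terminated pods in syncPod") is understood to take a narrower approach: only the blocking volume wait is skipped for already-terminated pods, while the rest of syncPod() still runs. Here is a hedged, self-contained sketch of that shape, using toy stand-ins rather than the real kubelet types:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Toy stand-ins for the kubelet pieces involved; names are illustrative only.
type pod struct {
	name       string
	terminated bool
}

type volumeManager struct{}

// waitForAttachAndMount models the blocking call that can only time out once the
// pod's volumes have been torn down at termination.
func (volumeManager) waitForAttachAndMount(p *pod) error {
	if p.terminated {
		time.Sleep(10 * time.Millisecond) // stands in for the ~2m timeout
		return errors.New("timeout expired waiting for volumes to attach/mount")
	}
	return nil
}

// syncPod sketches the narrower guard: skip the volume wait when the pod is already
// terminated, so the remaining work in syncPod still runs and no FailedMount or
// FailedSync events are emitted.
func syncPod(vm volumeManager, p *pod) error {
	if !p.terminated {
		if err := vm.waitForAttachAndMount(p); err != nil {
			fmt.Printf("would emit FailedMount event for %s: %v\n", p.name, err)
			return err
		}
	}
	// ...remainder of syncPod...
	return nil
}

func main() {
	vm := volumeManager{}
	fmt.Println(syncPod(vm, &pod{name: "busybox", terminated: true})) // <nil>: no events
}
```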

@sjenning (Contributor) commented Jul 26, 2017

So everything about upstream kube not being affected by this was wrong. I was just running hack/local-up-cluster.sh, which defaults to --keep-terminated-pod-volumes=true, and that was masking the issue.

Upstream issue opened.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Aug 1, 2017
Automatic merge from submit-queue (batch tested with PRs 49284, 49555, 47639, 49526, 49724)

skip WaitForAttachAndMount for terminated pods in syncPod

Fixes #49663

I tried to tread lightly with a small localized change because this needs to be picked to 1.7 and 1.6 as well.

I suspect this has been an issue since we started unmounting volumes on pod termination #37228

xref openshift/origin#14383

@derekwaynecarr @eparis @smarterclayton @saad-ali @jwforres 

/release-note-none
@eparis (Member) commented Aug 1, 2017

It's spam; dropping to P2. (Seth is going to open a BZ to track in 3.6.)

openshift-merge-robot added a commit that referenced this issue Aug 2, 2017
Automatic merge from submit-queue

UPSTREAM: 49724: skip WaitForAttachAndMount for terminated pods in syncPod

Fixes #14383

xref kubernetes/kubernetes#49724

@derekwaynecarr @eparis @smarterclayton @jwforres