Troubleshooting: NoExecute taint prevents workloads from terminating

July 2, 2024

Applicable versions

All Longhorn versions.

Symptoms

Applying a NoExecute taint to a node causes all pods on that node that cannot tolerate the taint to terminate. Users may expect pods using Longhorn volumes and managed by a controller (e.g. Deployment pods) to be able to restart on different nodes after the taint is applied. However, the replacement pods remain ContainerCreating and the old pods remain Terminating indefinitely.

For example, after applying a taint to the node running this Deployment pod, the cluster remains in the following state.

eweber@laptop:~/> kubectl taint node eweber-v126-worker-9c1451b4-kgxdq k=v:NoExecute
node/eweber-v126-worker-9c1451b4-kgxdq tainted

eweber@laptop:~/website> kubectl get pod -owide
mysql-56c8c6775b-8f5gh                 0/1     Terminating         0                30m    10.42.2.58        eweber-v126-worker-9c1451b4-kgxdq   <none>           <none>
mysql-56c8c6775b-rph8k                 0/1     ContainerCreating   0                28m    <none>            eweber-v126-worker-9c1451b4-rw5hf   <none>           <none>

Describing the replacement pod reveals the immediate cause for the lack of progress.

eweber@laptop:~/> kubectl describe pod mysql-56c8c6775b-rph8k
...
Warning  FailedAttachVolume  27m   attachdetach-controller  Multi-Attach error for volume "pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9" Volume is already exclusively attached to one node and can't be attached to another

Root cause

The above Multi-Attach error is generated by Kubernetes, not Longhorn. Kubernetes will not create a VolumeAttachment asking Longhorn to attach the volume to a new node until the volume has been detached from the old one. Currently, there is only one VolumeAttachment for the volume in the cluster, and it still references the old node.

eweber@laptop:~/> kubectl get volumeattachment
csi-f1a6eae2f691ad48e39aca528454c64f7c46a3e6c446e6f9a8ecd960895bf0b6   driver.longhorn.io   pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9   eweber-v126-worker-9c1451b4-kgxdq   true       31m

In fact, Kubernetes will not attempt to delete this old VolumeAttachment until kubelet on the old node reports that all pods using the volume have been successfully torn down. As long as the old pod is stuck Terminating, no progress can be made.

The kubelet logs on the old node reveal the reason for the long termination.

Jun 28 20:40:37 eweber-v126-worker-9c1451b4-kgxdq k3s[753]: I0628 20:40:37.391775     753 reconciler_common.go:172] "operationExecutor.UnmountVolume started for volume \"mysql-volume\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9\") pod \"942ef69b-2e06-47f3-8bac-cc738c8baa00\" (UID: \"942ef69b-2e06-47f3-8bac-cc738c8baa00\") "
Jun 28 20:40:37 eweber-v126-worker-9c1451b4-kgxdq k3s[753]: E0628 20:40:37.391968     753 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9 podName:942ef69b-2e06-47f3-8bac-cc738c8baa00 nodeName:}" failed. No retries permitted until 2024-06-28 20:42:39.391919153 +0000 UTC m=+10744.364754873 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "mysql-volume" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9") pod "942ef69b-2e06-47f3-8bac-cc738c8baa00" (UID: "942ef69b-2e06-47f3-8bac-cc738c8baa00") : kubernetes.io/csi: Unmounter.TearDownAt failed to get CSI client: driver name driver.longhorn.io not found in the list of registered CSI drivers

The pod can’t be torn down because the Longhorn CSI plugin is no longer registered with kubelet. It is the plugin’s responsibility to ensure that Longhorn volumes are unmounted safely. But in this cluster, Longhorn has not been configured to tolerate the applied NoExecute taint. This has caused Longhorn pods (including the Longhorn CSI plugin pod) to terminate. No progress can be made without a running Longhorn CSI plugin on the node.

eweber@laptop:~/> kubectl -n longhorn-system get pod -owide | grep eweber-v126-worker-9c1451b4-kgxdq
# Empty...
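
The node's CSINode object provides another way to confirm this: it lists the CSI drivers kubelet currently has registered, and driver.longhorn.io is expected to be absent while the plugin pod is not running.

kubectl get csinode eweber-v126-worker-9c1451b4-kgxdq -o jsonpath='{.spec.drivers[*].name}'
# driver.longhorn.io should not appear in the output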

Best practices

If you plan to apply NoExecute taints to nodes that run Longhorn volumes, configure Longhorn to tolerate them. This is best done at install time, as tolerations cannot be updated while volumes are attached.
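
For example, when installing with Helm, the tolerations can be supplied as chart values. The sketch below assumes the chart exposes defaultSettings.taintToleration (for system-managed components) and longhornManager.tolerations (for user-deployed components); confirm the exact value names in the Taint Toleration documentation for your Longhorn version.

# Sketch only: the k=v:NoExecute taint and the chart value names are illustrative.
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace \
  --set defaultSettings.taintToleration="k=v:NoExecute" \
  --set 'longhornManager.tolerations[0].key=k' \
  --set 'longhornManager.tolerations[0].operator=Equal' \
  --set 'longhornManager.tolerations[0].value=v' \
  --set 'longhornManager.tolerations[0].effect=NoExecute'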

If you need to apply a NoExecute taint and cannot change the taint toleration setting beforehand, drain the node first and then apply the intended taint. This gives all workloads consuming Longhorn volumes the opportunity to migrate to other nodes. By the time the NoExecute taint is applied, there is no longer a need for Longhorn components to run on the tainted node.
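
For example, using the node and taint from the scenario above:

# Evict the workloads (allowing Longhorn volumes to detach cleanly), then apply the taint.
kubectl drain eweber-v126-worker-9c1451b4-kgxdq --ignore-daemonsets --delete-emptydir-data
kubectl taint node eweber-v126-worker-9c1451b4-kgxdq k=v:NoExecute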

Workaround

If you have already applied a NoExecute taint that Longhorn can't tolerate and are stuck in the situation described above, there are two options.

  1. Remove the NoExecute taint (see the command below) and wait for the situation to resolve itself. Once the Longhorn CSI plugin restarts on the old node, pod termination will complete successfully. Then, follow the best practices above.
  2. Detach all volumes and configure Longhorn to tolerate the applied taint. This is likely not the preferred solution, but it may be appropriate in some circumstances.
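
For option 1, the taint is removed by repeating its key, value, and effect with a trailing hyphen:

kubectl taint node eweber-v126-worker-9c1451b4-kgxdq k=v:NoExecute-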