Troubleshooting: NoExecute taint prevents workloads from terminating

July 2, 2024

Applicable versions

All Longhorn versions.

Symptoms

Applying a NoExecute taint to a node causes all pods on that node that cannot tolerate the taint to terminate. Users may expect pods using Longhorn volumes and managed by a controller (e.g. Deployment pods) to be able to restart on different nodes after the taint is applied. However, the replacement pods remain ContainerCreating and the old pods remain Terminating indefinitely.

For example, after applying a taint to the node running this Deployment pod, the cluster remains in the following state.

eweber@laptop:~/> kubectl taint node eweber-v126-worker-9c1451b4-kgxdq k=v:NoExecute
node/eweber-v126-worker-9c1451b4-kgxdq tainted

eweber@laptop:~/website> kubectl get pod -owide
mysql-56c8c6775b-8f5gh                 0/1     Terminating         0                30m    10.42.2.58        eweber-v126-worker-9c1451b4-kgxdq   <none>           <none>
mysql-56c8c6775b-rph8k                 0/1     ContainerCreating   0                28m    <none>            eweber-v126-worker-9c1451b4-rw5hf   <none>           <none>

Describing the replacement pod reveals the immediate cause for the lack of progress.

eweber@laptop:~/> kubectl describe pod mysql-56c8c6775b-rph8k
...
Warning  FailedAttachVolume  27m   attachdetach-controller  Multi-Attach error for volume "pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9" Volume is already exclusively attached to one node and can't be attached to another

Root cause

The above Multi-Attach error is generated by Kubernetes, not Longhorn. Kubernetes will not create a VolumeAttachment asking Longhorn to attach the volume to a new node until the volume has been detached from the old one. Currently, there is only one VolumeAttachment for the volume in the cluster, and it still references the old node.

eweber@laptop:~/> kubectl get volumeattachment
csi-f1a6eae2f691ad48e39aca528454c64f7c46a3e6c446e6f9a8ecd960895bf0b6   driver.longhorn.io   pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9   eweber-v126-worker-9c1451b4-kgxdq   true       31m

In fact, Kubernetes will not attempt to delete this old VolumeAttachment until kubelet on the old node reports that all pods using the volume have been successfully torn down. As long as the old pod is stuck Terminating, no progress can be made.

The kubelet logs on the old node reveal the reason for the long termination.

Jun 28 20:40:37 eweber-v126-worker-9c1451b4-kgxdq k3s[753]: I0628 20:40:37.391775     753 reconciler_common.go:172] "operationExecutor.UnmountVolume started for volume \"mysql-volume\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9\") pod \"942ef69b-2e06-47f3-8bac-cc738c8baa00\" (UID: \"942ef69b-2e06-47f3-8bac-cc738c8baa00\") "
Jun 28 20:40:37 eweber-v126-worker-9c1451b4-kgxdq k3s[753]: E0628 20:40:37.391968     753 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9 podName:942ef69b-2e06-47f3-8bac-cc738c8baa00 nodeName:}" failed. No retries permitted until 2024-06-28 20:42:39.391919153 +0000 UTC m=+10744.364754873 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "mysql-volume" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-b23fce3b-cada-43a9-89b8-2eb9b97e39c9") pod "942ef69b-2e06-47f3-8bac-cc738c8baa00" (UID: "942ef69b-2e06-47f3-8bac-cc738c8baa00") : kubernetes.io/csi: Unmounter.TearDownAt failed to get CSI client: driver name driver.longhorn.io not found in the list of registered CSI drivers

The pod can’t be torn down because the Longhorn CSI plugin is no longer registered with kubelet. It is the plugin’s responsibility to ensure that Longhorn volumes are unmounted safely. But in this cluster, Longhorn has not been configured to tolerate the applied NoExecute taint. This has caused Longhorn pods (including the Longhorn CSI plugin pod) to terminate. No progress can be made without a running Longhorn CSI plugin on the node.

eweber@laptop:~/> kubectl -n longhorn-system get pod -owide | grep eweber-v126-worker-9c1451b4-kgxdq
# Empty...
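
The node's CSINode object provides another way to confirm this: it lists the CSI drivers kubelet currently has registered, and driver.longhorn.io is expected to be absent while the plugin pod is not running.

kubectl get csinode eweber-v126-worker-9c1451b4-kgxdq -o jsonpath='{.spec.drivers[*].name}'
# driver.longhorn.io should not appear in the output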

Best practices

If you plan to apply NoExecute taints to nodes that run Longhorn volumes, configure Longhorn to tolerate them. This is best done at install time, as tolerations cannot be updated while volumes are attached.
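
For example, when installing with Helm, the tolerations can be supplied as chart values. The sketch below assumes the chart exposes defaultSettings.taintToleration (for system-managed components) and longhornManager.tolerations (for user-deployed components); confirm the exact value names in the Taint Toleration documentation for your Longhorn version.

# Sketch only: the k=v:NoExecute taint and the chart value names are illustrative.
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace \
  --set defaultSettings.taintToleration="k=v:NoExecute" \
  --set 'longhornManager.tolerations[0].key=k' \
  --set 'longhornManager.tolerations[0].operator=Equal' \
  --set 'longhornManager.tolerations[0].value=v' \
  --set 'longhornManager.tolerations[0].effect=NoExecute'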

If you need to apply a NoExecute taint and cannot change the taint toleration setting beforehand, drain the node first and then apply the intended taint. This gives all workloads consuming Longhorn volumes the opportunity to migrate to other nodes. By the time the NoExecute taint is applied, there is no longer a need for Longhorn components to run on the tainted node.
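
For example, using the node and taint from the scenario above:

# Evict the workloads (allowing Longhorn volumes to detach cleanly), then apply the taint.
kubectl drain eweber-v126-worker-9c1451b4-kgxdq --ignore-daemonsets --delete-emptydir-data
kubectl taint node eweber-v126-worker-9c1451b4-kgxdq k=v:NoExecute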

Workaround

If you have already applied a NoExecute taint that Longhorn can't tolerate and are stuck in the situation described above, there are two options.

  1. Remove the NoExecute taint (see the command below) and wait for the situation to resolve itself. Once the Longhorn CSI plugin restarts on the old node, pod termination will complete successfully. Then, follow the best practices above.
  2. Detach all volumes and configure Longhorn to tolerate the applied taint. This is likely not the preferred solution, but it may be appropriate in some circumstances.
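
For option 1, the taint is removed by repeating its key, value, and effect with a trailing hyphen:

kubectl taint node eweber-v126-worker-9c1451b4-kgxdq k=v:NoExecute-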