Node Failure Handling with Longhorn
When a Kubernetes node fails with the CSI driver installed (everything below is based on Kubernetes v1.12 with the default setup):
kubectl get nodes will report NotReady for the failed node.
The states of the pods on the NotReady node will change to either Unknown or NodeLost.
A pod on the failed node gets stuck in Terminating state and the attached Longhorn volumes cannot be released or reused, so the new pod that replaces it will get stuck in ContainerCreating state. That is why users need to decide whether it is safe to force delete the pod.
Pods on the unreachable node are placed in Terminating state by the pod eviction mechanism. See pod eviction timeout for details.
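As a rough aid for the decision described above, the following Python sketch lists the pods on a failed node that carry a deletion timestamp (shown as Terminating by kubectl) and prints the force-delete command for each one without running it. It assumes the input is the parsed output of `kubectl get pods --all-namespaces -o json`. The function names, the node name `worker-1`, and the sample data are hypothetical. The sketch does not judge whether force deletion is safe; that remains the user's call.

```python
import shlex


def terminating_pods_on_node(pod_list, node_name):
    """Return (namespace, name) for pods on node_name that have a deletion timestamp."""
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        on_node = pod.get("spec", {}).get("nodeName") == node_name
        # metadata.deletionTimestamp is set once a pod is evicted or deleted;
        # kubectl shows such a pod as Terminating.
        if on_node and meta.get("deletionTimestamp"):
            stuck.append((meta.get("namespace", "default"), meta["name"]))
    return stuck


def force_delete_command(namespace, name):
    """Build (but do not run) the kubectl command that force deletes one pod."""
    args = ["kubectl", "delete", "pod", name, "--namespace", namespace,
            "--grace-period=0", "--force"]
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical sample standing in for `kubectl get pods --all-namespaces -o json`.
SAMPLE_PODS = {
    "items": [
        {"metadata": {"name": "web-0", "namespace": "default",
                      "deletionTimestamp": "2019-01-01T00:05:00Z"},
         "spec": {"nodeName": "worker-1"}},
        {"metadata": {"name": "web-1", "namespace": "default"},
         "spec": {"nodeName": "worker-2"}},
    ]
}

if __name__ == "__main__":
    for ns, pod_name in terminating_pods_on_node(SAMPLE_PODS, "worker-1"):
        print(force_delete_command(ns, pod_name))
```

Printing the command instead of executing it keeps the final step manual, which matches the point that force deletion is a decision for the user.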
Then, if the failed node recovers later, Kubernetes will restart those terminating pods, detach the volumes, wait for the old VolumeAttachment cleanup, and reuse (re-attach and re-mount) the volumes. These steps typically take 1 to 7 minutes.
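To watch the VolumeAttachment cleanup mentioned above, one option is to inspect the VolumeAttachment objects that still reference the recovered node. The Python sketch below assumes the input is the parsed output of `kubectl get volumeattachments -o json`. The function name, node name, and sample data are hypothetical, and the sketch only reports state without changing anything.

```python
def attachments_on_node(volume_attachment_list, node_name):
    """Return (attachment name, PV name, attached) for VolumeAttachments on node_name."""
    found = []
    for va in volume_attachment_list.get("items", []):
        spec = va.get("spec", {})
        if spec.get("nodeName") != node_name:
            continue
        # spec.source.persistentVolumeName names the PersistentVolume being attached.
        pv_name = spec.get("source", {}).get("persistentVolumeName")
        # status.attached reports whether the volume is currently attached to the node.
        attached = bool(va.get("status", {}).get("attached", False))
        found.append((va["metadata"]["name"], pv_name, attached))
    return found


# Hypothetical sample standing in for `kubectl get volumeattachments -o json`.
SAMPLE_ATTACHMENTS = {
    "items": [
        {"metadata": {"name": "csi-aaa"},
         "spec": {"nodeName": "worker-1",
                  "source": {"persistentVolumeName": "pvc-111"}},
         "status": {"attached": True}},
        {"metadata": {"name": "csi-bbb"},
         "spec": {"nodeName": "worker-2",
                  "source": {"persistentVolumeName": "pvc-222"}},
         "status": {"attached": True}},
    ]
}

if __name__ == "__main__":
    for name, pv_name, attached in attachments_on_node(SAMPLE_ATTACHMENTS, "worker-1"):
        print(name, pv_name, "attached" if attached else "not attached")
```

An empty result for the node suggests the old attachments have been cleaned up; re-running the check during recovery shows how far the process has progressed.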
In this case, detaching and re-attaching operations are already included in the Kubernetes recovery procedures. Hence no extra operation is needed and the Longhorn volumes will be available after the above steps.
© 2019-2026 Longhorn a Series of LF Projects, LLC. Documentation Distributed under CC-BY-4.0.