Node Failure Handling with Longhorn
This section is aimed to inform users of what happens during a node failure and what is expected during the recovery.
After one minute,
kubectl get nodes will report
NotReady for the failure node.
After about five minutes, the states of all the pods on the
NotReady node will change to either
StatefulSets have a stable identity, so Kubernetes won’t force delete the pod for the user. See the official Kubernetes documentation about forcing the deletion of a StatefulSet.
Deployments don’t have a stable identity, but for the Read-Write-Once type of storage, since it cannot be attached to two nodes at the same time, the new pod created by Kubernetes won’t be able to start due to the RWO volume still attached to the old pod, on the lost node.
In both cases, Kubernetes will automatically evict the pod (set deletion timestamp for the pod) on the lost node, then try to recreate a new one with old volumes. Because the evicted pod gets stuck in
Terminating state and the attached volumes cannot be released/reused, the new pod will get stuck in
ContainerCreating state, if there is no intervene from admin or storage software.
Longhorn provides an option to help users automatically force delete terminating pods of StatefulSet/Deployment on the node that is down. After force deleting, Kubernetes will detach the Longhorn volume and spin up replacement pods on a new node.
You can find more detail about the setting options in the
Pod Deletion Policy When Node is Down in the Settings tab in the Longhorn UI or Settings reference
If the node is back online within 5 - 6 minutes of the failure, Kubernetes will restart pods, unmount, and re-mount volumes without volume re-attaching and VolumeAttachment cleanup.
Because the volume engines would be down after the node is down, this direct remount won’t work since the device no longer exists on the node.
In this case, Longhorn will detach and re-attach the volumes to recover the volume engines, so that the pods can remount/reuse the volumes safely.
If the node is not back online within 5 - 6 minutes of the failure, Kubernetes will try to delete all unreachable pods based on the pod eviction mechanism and these pods will be in a
Terminating state. See pod eviction timeout for details.
Then if the failed node is recovered later, Kubernetes will restart those terminating pods, detach the volumes, wait for the old VolumeAttachment cleanup, and reuse(re-attach & re-mount) the volumes. Typically these steps may take 1 ~ 7 minutes.
In this case, detaching and re-attaching operations are already included in the Kubernetes recovery procedures. Hence no extra operation is needed and the Longhorn volumes will be available after the above steps.
For all above recovery scenarios, Longhorn will handle those steps automatically with the association of Kubernetes.
© 2019-2023 Longhorn Authors | Documentation Distributed under CC-BY-4.0
© 2023 The Linux Foundation. All rights reserved. The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our Trademark Usage page.