Troubleshooting: Instance manager pods are restarted every hour
February 25, 2022
Applicable versions: v1.0.1 or newer
Each Longhorn volume has one engine and one or more replicas (see the Longhorn architecture documentation for more detail).
When a Longhorn volume is attached, Longhorn launches a process for each engine/replica object.
The engine process is launched inside an engine instance manager pod (the instance-manager-e-xxxxxxxx pods in the longhorn-system namespace).
The replica process is launched inside a replica instance manager pod (the instance-manager-r-xxxxxxxx pods in the longhorn-system namespace).
Symptom: the instance manager pods are restarted every hour. As a consequence, the Longhorn volumes and the workload pods that use them crash every hour.
One potential root cause is that the cluster has a default PriorityClass (i.e., a PriorityClass whose globalDefault field is set to true) but the PriorityClass setting in Longhorn is empty.
See the Kubernetes documentation on Pod Priority and Preemption for more about PriorityClass.
When Longhorn creates the instance manager pods, it does not set a PriorityClass for them because the PriorityClass setting in Longhorn is empty. Because the cluster has a default PriorityClass, Kubernetes automatically assigns it to every newly created pod that has no priorityClassName. Later, Longhorn detects that the PriorityClass actually set on the instance manager pods differs from the PriorityClass in the Longhorn setting (which is empty), so it deletes and recreates the instance manager pods. The recreated pods receive the default PriorityClass again, so the mismatch never resolves. This happens every hour because Longhorn resyncs all settings every hour.
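The feedback loop above can be illustrated with a toy model. This is not Longhorn's source code: pods are plain dicts, `admit_pod()` stands in for the Kubernetes admission step that fills in the globalDefault PriorityClass, `reconcile()` stands in for Longhorn's hourly resync, and the name `high-priority` is a hypothetical default PriorityClass. The sketch only shows why an empty setting plus a cluster default produces a restart on every resync, while a matching setting produces none.

```python
"""Toy model (hypothetical, not Longhorn code) of the hourly restart loop."""
from typing import Optional, Tuple


def admit_pod(pod: dict, cluster_default: Optional[str]) -> dict:
    """Mimic Kubernetes admission: a pod without a priorityClassName
    receives the cluster's globalDefault PriorityClass, if one exists."""
    admitted = dict(pod)
    if not admitted.get("priorityClassName") and cluster_default:
        admitted["priorityClassName"] = cluster_default
    return admitted


def create_instance_manager(setting: str, cluster_default: Optional[str]) -> dict:
    """Hypothetical pod creation: an empty Longhorn setting leaves the field unset."""
    pod = {"name": "instance-manager-e-xxxxxxxx"}
    if setting:
        pod["priorityClassName"] = setting
    return admit_pod(pod, cluster_default)


def reconcile(pod: dict, setting: str, cluster_default: Optional[str]) -> Tuple[dict, bool]:
    """Hypothetical resync: recreate the pod if its PriorityClass differs
    from the setting. Returns (pod, restarted)."""
    if pod.get("priorityClassName", "") != setting:
        return create_instance_manager(setting, cluster_default), True
    return pod, False


def count_restarts(setting: str, cluster_default: Optional[str], hours: int) -> int:
    """Run one resync per simulated hour and count pod recreations."""
    pod = create_instance_manager(setting, cluster_default)
    restarts = 0
    for _ in range(hours):
        pod, restarted = reconcile(pod, setting, cluster_default)
        if restarted:
            restarts += 1
    return restarts


print(count_restarts("", "high-priority", 24))               # 24: restarted every hour
print(count_restarts("high-priority", "high-priority", 24))  # 0: setting matches default
```

Note that the recreated pod goes through admission again, which is why the model (like the real cluster) never converges while the setting stays empty.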
Solution: set the PriorityClass setting in Longhorn to the name of the cluster's default PriorityClass.
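To find which name to enter, one option is to look for the PriorityClass whose `globalDefault` field is `true` in the output of `kubectl get priorityclass -o json`. The helper below is a hypothetical sketch (the function name is ours, not part of Longhorn or kubectl) that parses that JSON list and returns the default's name, or `None` if the cluster has no default PriorityClass.

```python
import json
from typing import Optional


def default_priority_class(priorityclass_list_json: str) -> Optional[str]:
    """Return the name of the PriorityClass with globalDefault set to true.

    Expects the JSON list printed by `kubectl get priorityclass -o json`.
    Returns None when no PriorityClass is marked as the global default.
    """
    data = json.loads(priorityclass_list_json)
    for item in data.get("items", []):
        # globalDefault is omitted (not just false) on most PriorityClasses.
        if item.get("globalDefault") is True:
            return item["metadata"]["name"]
    return None
```

For example, the output of `kubectl get priorityclass -o json` could be saved to a file and passed to this function as a string; the returned name is the value to put into the Longhorn PriorityClass setting. If it returns `None`, the cluster has no default PriorityClass and this root cause does not apply.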
© 2019-2024 Longhorn Authors | Documentation Distributed under CC-BY-4.0