Troubleshooting: Instance manager pods are restarted every hour

Phan Le | February 25, 2022

Applicable versions

v1.0.1 or newer

Background

Each Longhorn volume has one engine and one or more replicas (see the Longhorn architecture documentation for more detail). When a Longhorn volume is attached, Longhorn launches a process for each engine and replica object. Engine processes run inside the engine instance manager pods (the instance-manager-e-xxxxxxxx pods in the longhorn-system namespace). Replica processes run inside the replica instance manager pods (the instance-manager-r-xxxxxxxx pods in the longhorn-system namespace).

Symptoms

The instance manager pods restart every hour. As a consequence, Longhorn volumes and the workload pods that use them crash every hour.

Reason

One potential root cause is that the cluster has a default PriorityClass (i.e., a PriorityClass with the globalDefault field set to true) but the PriorityClass setting in Longhorn is empty. See the Kubernetes documentation for more about PriorityClass.

When Longhorn creates the instance manager pods, it does not set a PriorityClass for them because the PriorityClass setting in Longhorn is empty. Because the cluster has a default PriorityClass, Kubernetes automatically assigns it to newly created pods that do not specify a priorityClassName. Later, Longhorn detects a difference between the actual PriorityClass of the instance manager pods and the PriorityClass in the Longhorn setting, so it deletes and recreates the instance manager pods. This happens every hour because Longhorn resyncs all settings every hour.
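The loop described above can be illustrated with a small model. This is only a sketch of the reasoning, not Longhorn's or Kubernetes' actual code: the function names (admitted_priority_class, needs_recreate, simulate_resyncs) are hypothetical, and the hourly resync is reduced to a simple counter.

```python
from typing import Optional


def admitted_priority_class(requested: str, cluster_default: Optional[str]) -> str:
    """Simplified model of pod admission: an empty priorityClassName is
    filled in with the cluster's globalDefault PriorityClass, if one exists."""
    if requested:
        return requested
    return cluster_default or ""


def needs_recreate(pod_priority_class: str, longhorn_setting: str) -> bool:
    """Simplified model of the resync check: a pod is recreated when its
    actual PriorityClass differs from the Longhorn setting."""
    return pod_priority_class != longhorn_setting


def simulate_resyncs(longhorn_setting: str,
                     cluster_default: Optional[str],
                     resyncs: int = 3) -> int:
    """Count how many times the instance manager pod is recreated over a
    number of resyncs (one resync stands for one hour)."""
    restarts = 0
    pod_pc = admitted_priority_class(longhorn_setting, cluster_default)
    for _ in range(resyncs):
        if needs_recreate(pod_pc, longhorn_setting):
            restarts += 1
            # The recreated pod goes through admission again and gets the
            # same default, so the mismatch never resolves.
            pod_pc = admitted_priority_class(longhorn_setting, cluster_default)
    return restarts
```

In this model, an empty Longhorn setting combined with a cluster default produces one restart per resync, while a matching setting, or a cluster with no default PriorityClass, produces none.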

Solution

Set the PriorityClass setting in Longhorn to the name of the cluster's default PriorityClass.
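To find the name to use, one option is to inspect the output of kubectl get priorityclass -o json and look for the entry with globalDefault set to true. The helper below is a minimal sketch of that lookup. The function name find_default_priority_class and the sample data are hypothetical, and the sketch assumes the usual list output with an items array.

```python
import json
from typing import Optional


def find_default_priority_class(priorityclass_list_json: str) -> Optional[str]:
    """Return the name of the PriorityClass whose globalDefault field is
    true, or None if the cluster has no default PriorityClass."""
    data = json.loads(priorityclass_list_json)
    for item in data.get("items", []):
        if item.get("globalDefault") is True:
            return item["metadata"]["name"]
    return None


# Hypothetical sample, shaped like `kubectl get priorityclass -o json` output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "system-cluster-critical"}, "value": 2000000000},
        {"metadata": {"name": "cluster-default"}, "value": 1000,
         "globalDefault": True},
    ]
})

default_name = find_default_priority_class(SAMPLE)
```

With the sample data, default_name is the string to enter in the Longhorn PriorityClass setting. If the helper returns None, the cluster has no default PriorityClass and this root cause does not apply.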


© 2019-2022 Longhorn Authors | Documentation Distributed under CC-BY-4.0