Troubleshooting: Volumes Stuck in Attach/Detach Loop When Using Longhorn on OKD

| February 9, 2023

Applicable versions

All Longhorn versions.

Symptoms

All volumes stuck in Attach/Detach loop. By using dmesg on storage nodes you can see errors like the following:

[Sat Dec 10 18:52:01 2022] audit: type=1400 audit(1670698321.515:7214): avc:  denied  { dac_override } for  pid=231579 comm="iscsiadm" capability=1  scontext=system_u:system_r:iscsid_t:s0 tcontext=system_u:system_r:iscsid_t:s0 tclass=capability permissive=0
[Sat Dec 10 18:52:01 2022] audit: type=1300 audit(1670698321.515:7214): arch=c000003e syscall=83 success=no exit=-13 a0=55b9035185c0 a1=1f8 a2=ffffffffffffff00 a3=0 items=0 ppid=231163 pid=231579 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iscsiadm" exe="/usr/sbin/iscsiadm" subj=system_u:system_r:iscsid_t:s0 key=(null)
[Sat Dec 10 18:52:01 2022] audit: type=1327 audit(1670698321.515:7214): proctitle=697363736961646D002D6D00646973636F76657279002D740073656E6474617267657473002D700031302E3133312E312E31363

Reason

Caused by the permission issue related to the host SELinux policies which prevent iscsiadm from operating correctly. This issue is likely to happen if the open-iscsi version is before or equal to 2.1.4 and in some OKD versions

Solution

There are three ways to resolve the issue.

  1. Upgrade your OKD to a newer version which is after 4.12.0-0.okd-2022-12-10.
  2. Upgrade open-iscsi to a newer version including the fix of the permission issue if possible.
  3. If in the existing non-working environment, applying dac_override using a local CIL via MachineConfig is also a workaround. Please take a look at the below reference links.
Back to Knowledge Base

Recent articles

Troubleshooting: NoExecute taint prevents workloads from terminating
Troubleshooting: Orphan ISCSI Session Error
Failure to Attach Volumes After Upgrade to Longhorn v1.5.x
Kubernetes resource revision frequency expectations
SELinux and Longhorn
Troubleshooting: RWX Volume Fails to Be Attached Caused by `Protocol not supported`
Troubleshooting: fstrim doesn't work on old kernel
Troubleshooting: Failed RWX mount due to connection timeout
Space consumption guideline
Troubleshooting: Unexpected expansion leads to degradation or attach failure
Troubleshooting: Failure to delete orphaned Pod volume directory
Troubleshooting: Volume attachment fails due to SELinux denials in Fedora downstream distributions
Troubleshooting: Volumes Stuck in Attach/Detach Loop When Using Longhorn on OKD
Troubleshooting: Velero restores Longhorn PersistentVolumeClaim stuck in the Pending state when using the Velero CSI Plugin version before v0.4.0
Analysis: Potential Data/Filesystem Corruption
Instruction: How To Migrate Longhorn Chart Installed In Old Rancher UI To The Chart In New Rancher UI
Troubleshooting: Unable to access an NFS backup target
Troubleshooting: Pod with `volumeMode: Block` is stuck in terminating
Troubleshooting: Instance manager pods are restarted every hour
Troubleshooting: Open-iSCSI on RHEL based systems
Troubleshooting: Upgrading volume engine is stuck in deadlock
Tip: Set Longhorn To Only Use Storage On A Specific Set Of Nodes
Troubleshooting: Some old instance manager pods are still running after upgrade
Troubleshooting: Volume cannot be cleaned up after the node of the workload pod is down and recovered
Troubleshooting: DNS Resolution Failed
Troubleshooting: Generate pprof runtime profiling data
Troubleshooting: Pod stuck in creating state when Longhorn volumes filesystem is corrupted
Troubleshooting: None-standard Kubelet directory
Troubleshooting: Longhorn default settings do not persist
Troubleshooting: Recurring job does not create new jobs after detaching and attaching volume
Troubleshooting: Use Traefik 2.x as ingress controller
Troubleshooting: Create Support Bundle with cURL
Troubleshooting: Longhorn RWX shared mount ownership is shown as nobody in consumer Pod
Troubleshooting: `MountVolume.SetUp failed for volume` due to multipathd on the node
Troubleshooting: Longhorn-UI: Error during WebSocket handshake: Unexpected response code: 200 #2265
Troubleshooting: Longhorn volumes take a long time to finish mounting
Troubleshooting: `volume readonly or I/O error`
Troubleshooting: `volume pvc-xxx not scheduled`

© 2019-2024 Longhorn Authors | Documentation Distributed under CC-BY-4.0


© 2024 The Linux Foundation. All rights reserved. The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our Trademark Usage page.