Troubleshooting: Longhorn RWX shared mount ownership is shown as nobody in consumer Pod

| March 31, 2021

Applicable versions

Longhorn versions = v1.1.0

Symptoms

When Pod mounts with RWX volume, the Pod share mount directory and all of the ownership of its recurring contents are shown as nobody, but in the share-manager is shown as root.

root@ip-172-30-0-139:/home/ubuntu# kubectl exec -it rwx-test-2pml2 -- ls -l /data
total 16
drwx------ 2 nobody 42949672 16384 Mar 31 04:16 lost+found

root@ip-172-30-0-139:~# kubectl -n longhorn-system exec -it share-manager-pvc-f3775852-1e27-423f-96ab-95ccd04e4777 -- ls -l /export/pvc-f3775852-1e27-423f-96ab-95ccd04e4777
total 16
drwx------ 2 root root 16384 Mar 31 04:42 lost+found

Background

The nfs-ganesha in share-manager uses idmapd for NFSv4 ID mapping and is set to use localdomain as its export Domain.

Reason

A result of content mismatch in /etc/idmapd.conf between client(host) and server(share-manager) causes ownership to change.

Let’s look at an example:

We assume you have not modified /etc/idmapd.conf on your cluster hosts. For some OS, Domain = localdomain is commented out and it uses FQDN minus hostname by default.

When the hostname is ip-172-30-0-139 and FQDN is ip-172-30-0-139.lan, the host idmapd then uses lan as the Domain.

root@ip-172-30-0-139:/home/ubuntu# hostname
ip-172-30-0-139

root@ip-172-30-0-139:/home/ubuntu# hostname -f
ip-172-30-0-139.lan

This caused the domain mismatch between share-manager(localdomain) and cluster hosts(lan). Hence triggers file permission to change to use nobody.

[Mapping] section variables

Nobody-User
Local user name to be used when a mapping cannot be completed.
Nobody-Group
Local group name to be used when a mapping cannot be completed.

Solution

  1. Uncomment or add Domain = localdomain in /etc/idmapd.conf on all cluster hosts.
root@ip-172-30-0-139:~# cat /etc/idmapd.conf 
[General]

Verbosity = 0
Pipefs-Directory = /run/rpc_pipefs
# set your own domain here, if it differs from FQDN minus hostname
Domain = localdomain

[Mapping]

Nobody-User = nobody
Nobody-Group = nogroup
  1. Delete and recreate RWX resource stack (pvc + pod).
root@ip-172-30-0-139:/home/ubuntu# kubectl exec -it volume-test -- ls -l /data
total 16
drwx------    2 root     root         16384 Mar 31 04:42 lost+found
Back to Knowledge Base

Recent articles

Kubernetes resource revision frequency expectations
SELinux and Longhorn
Troubleshooting: RWX Volume Fails to Be Attached Caused by `Protocol not supported`
Troubleshooting: fstrim doesn't work on old kernel
Troubleshooting: Failed RWX mount due to connection timeout
Space consumption guideline
Troubleshooting: Unexpected expansion leads to degradation or attach failure
Troubleshooting: Failure to delete orphaned Pod volume directory
Troubleshooting: Volume attachment fails due to SELinux denials in Fedora downstream distributions
Troubleshooting: Volumes Stuck in Attach/Detach Loop When Using Longhorn on OKD
Troubleshooting: Velero restores Longhorn PersistentVolumeClaim stuck in the Pending state when using the Velero CSI Plugin version before v0.4.0
Analysis: Potential Data/Filesystem Corruption
Instruction: How To Migrate Longhorn Chart Installed In Old Rancher UI To The Chart In New Rancher UI
Troubleshooting: Unable to access an NFS backup target
Troubleshooting: Pod with `volumeMode: Block` is stuck in terminating
Troubleshooting: Instance manager pods are restarted every hour
Troubleshooting: Open-iSCSI on RHEL based systems
Troubleshooting: Upgrading volume engine is stuck in deadlock
Tip: Set Longhorn To Only Use Storage On A Specific Set Of Nodes
Troubleshooting: Some old instance manager pods are still running after upgrade
Troubleshooting: Volume cannot be cleaned up after the node of the workload pod is down and recovered
Troubleshooting: DNS Resolution Failed
Troubleshooting: Generate pprof runtime profiling data
Troubleshooting: Pod stuck in creating state when Longhorn volumes filesystem is corrupted
Troubleshooting: None-standard Kubelet directory
Troubleshooting: Longhorn default settings do not persist
Troubleshooting: Recurring job does not create new jobs after detaching and attaching volume
Troubleshooting: Use Traefik 2.x as ingress controller
Troubleshooting: Create Support Bundle with cURL
Troubleshooting: Longhorn RWX shared mount ownership is shown as nobody in consumer Pod
Troubleshooting: `MountVolume.SetUp failed for volume` due to multipathd on the node
Troubleshooting: Longhorn-UI: Error during WebSocket handshake: Unexpected response code: 200 #2265
Troubleshooting: Longhorn volumes take a long time to finish mounting
Troubleshooting: `volume readonly or I/O error`
Troubleshooting: `volume pvc-xxx not scheduled`

© 2019-2024 Longhorn Authors | Documentation Distributed under CC-BY-4.0


© 2024 The Linux Foundation. All rights reserved. The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our Trademark Usage page.