Troubleshooting: RWX Volume Fails to Be Attached Caused by `Protocol not supported`

Applicable versions

All Longhorn versions.

Symptoms

Attempts to attach an RWX volume are unsuccessful, and the workload using the volume is unable to start. The logs contain the following messages:

Oct 11 07:42:23 dev-worker-1 k3s[1294]: Mounting command: /usr/local/sbin/nsmounter
Oct 11 07:42:23 dev-worker-1 k3s[1294]: Mounting arguments: mount -t nfs -o vers=4.1,noresvport,intr,hard 10.43.207.185:/pvc-13538170-4278-4467-b2b0-1f1ba6f54a4c /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/185c34f566c2eca6e8c7c6a2ede2094c076d7d25ddae286dc633eeef80551af0/globalmount
Oct 11 07:42:23 dev-worker-1-autoscaled-small-19baf778f50efd8c k3s[1294]: Output: mount.nfs: Protocol not supported for 10.43.207.185:/pvc-13538170-4278-4467-b2b0-1f1ba6f54a4c on /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/185c34f566c2eca6e8c7c6a2ede2094c076d7d25ddae286dc633eeef80551af0/globalmount

The issue applies to RWX volumes on hosts running operating systems that use specific Linux kernel versions with known NFS-related bugs. Among the affected are OpenSUSE MicroOS (from 2023/10/08 to 2023/10/21) and other distributions using kernel version 6.5.6.

Reason

Longhorn RWX volumes depend on NFS when connecting multiple pods to a shared volume. However, commits to the Linux kernel can occasionally break NFS functionality. A regression in the NFS protocol is identified in this kernel commit. Because of the regression, NFS clients are unable to connect to an NFS server inside a share manager pod, and then the attachment operation fails.

Solution

The regression has been addressed in the kernel commit within the vanilla kernel version 6.5.7. Since the commit leading to the regression might be backported to older kernel versions in various distributions, to resolve the issue, it’s advisable to inspect the source code of your Linux kernel to determine if the error stems from the commit. For instance, Ubuntu users can check it within the code repository. If confirmed, you can then proceed with either of the following actions:

Upgrade the operating system to a version that uses a fixed kernel.
Downgrade the operating system to a version that uses a kernel released before the regression occurred.

Distro	Broken Version
Vanilla kernel	6.5.6
Ubuntu	5.15.0-94
Ubuntu	6.5.0-21
Ubuntu	6.5.0-1014-aws

https://github.com/longhorn/longhorn/issues/6857
https://github.com/longhorn/longhorn/issues/6887 https://github.com/longhorn/longhorn/issues/8344

Applicable versions

Symptoms

Reason

Solution

Related information