Troubleshooting: fstrim doesn't work on old kernel
October 6, 2023
Applicable versions: Longhorn >= v1.4.0, Linux kernel < v4.12
When running a filesystem trim (either from the Longhorn UI or manually with the fstrim
command on the host), the operation fails with an error similar to:
unable to trim filesystem for volume pvc-e381424a-4866-447a-a75c-3096036f7846: cannot find volume pvc-e381424a-4866-447a-a75c-3096036f7846 mount info on host: failed to execute: nsenter [--mount=/host/proc/20357/ns/mnt --net=/host/proc/20357/ns/net fstrim /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/d2df1133f3440486ddec39370380eeed7a3c71499981d63fc80e43c7ca9f4c9e/globalmount], output , stderr fstrim: /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/d2df1133f3440486ddec39370380eeed7a3c71499981d63fc80e43c7ca9f4c9e/globalmount: FITRIM ioctl failed: Input/output error\n: exit status 255
or
unable to trim filesystem for volume pvc-e381424a-4866-447a-a75c-3096036f7846: cannot find volume pvc-e381424a-4866-447a-a75c-3096036f7846 mount info on host: failed to execute: nsenter [--mount=/host/proc/20357/ns/mnt --net=/host/proc/20357/ns/net fstrim /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/d2df1133f3440486ddec39370380eeed7a3c71499981d63fc80e43c7ca9f4c9e/globalmount], output , stderr fstrim: /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/d2df1133f3440486ddec39370380eeed7a3c71499981d63fc80e43c7ca9f4c9e/globalmount: the discard operation is not supported : exit status 1
This is caused by a bug in the SCSI driver of old kernels, which sets the wrong provisioning mode for SCSI devices that advertise both the UNMAP and WRITE SAME capabilities, as detailed in this kernel patch: https://github.com/torvalds/linux/commit/bcd069bb250acf6088b60d189ab3ec3ae8dd11a5
Verify that the engine image of the volume is >= v1.4.0 on the volume detail page in the Longhorn UI. Only engine images >= v1.4.0 support the filesystem trim feature.
If the engine image was upgraded from a version < v1.4.0 to >= v1.4.0, make sure that the volume is detached and reattached first to activate the trim feature in the Longhorn volume. You can do this by manually scaling the workload deployment that uses the volume down and then back up. See more details at https://longhorn.io/docs/1.4.0/volumes-and-nodes/trim-filesystem/#prerequisites
Find the major:minor numbers of the Longhorn block device with ls -l /dev/longhorn/<volume-name>. For example:
[root@phan-v147-cloudera-pool1-f1dec634-cm59v ~]# ls -l /dev/longhorn/testvol
brw-rw----. 1 root root 8, 0 Oct 6 20:24 /dev/longhorn/testvol
In this case, the major:minor numbers are 8:0.
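As a rough sketch, the major:minor lookup can also be scripted instead of read from the ls -l output. The helper names below are hypothetical, and the sketch assumes GNU stat, which reports the major (%t) and minor (%T) numbers of a device node in hexadecimal.

```shell
#!/bin/sh
# Convert the hexadecimal major/minor pair printed by GNU stat into the
# decimal "major:minor" form shown by ls -l and lsscsi -d.
majmin_from_hex() {
    printf '%d:%d\n' "0x$1" "0x$2"
}

# Print "major:minor" for a device node (hypothetical helper name).
# Assumes GNU stat: %t is the major and %T the minor number, both in hex.
dev_majmin() {
    majmin_from_hex "$(stat -L -c '%t' "$1")" "$(stat -L -c '%T' "$1")"
}

# Usage (testvol is the example volume from this article):
#   dev_majmin /dev/longhorn/testvol
```

For the example volume above, the result should match the 8:0 read from the ls -l output.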
Find the corresponding SCSI device with lsscsi -d. For example:
[root@phan-v147-cloudera-pool1-f1dec634-cm59v ~]# lsscsi -d
[3:0:0:0] storage IET Controller 0001 -
[3:0:0:1] disk IET VIRTUAL-DISK 0001 /dev/sda [8:0]
In this case, the corresponding SCSI device's address is [3:0:0:1] because it has the same major:minor numbers, 8:0.
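The matching of major:minor numbers against the lsscsi -d output can be sketched as a small filter. The function name is hypothetical, and the sketch assumes that the [major:minor] pair is the last item on each disk line, as in the example output above.

```shell
#!/bin/sh
# Read `lsscsi -d` output on stdin and print the SCSI address (H:C:T:L)
# of every line that ends with the given [major:minor] pair.
# Hypothetical helper; assumes the pair is the last item on the line.
scsi_addr_for_majmin() {
    awk -v want="[$1]" '
        {
            sub(/[ \t]+$/, "")                 # drop trailing whitespace
            tail = substr($0, length($0) - length(want) + 1)
            if (tail == want) {
                addr = $1
                gsub(/\[|\]/, "", addr)        # "[3:0:0:1]" -> "3:0:0:1"
                print addr
            }
        }'
}

# Usage:
#   lsscsi -d | scsi_addr_for_majmin 8:0
```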
Find the provisioning mode of the SCSI device with find /sys/ -name provisioning_mode -exec grep -H . {} + | sort. For example:
[root@phan-v147-cloudera-pool1-f1dec634-cm59v ~]# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/platform/host3/session1/target3:0:0/3:0:0:1/scsi_disk/3:0:0:1/provisioning_mode:writesame_16
In this case, the device with the address 3:0:0:1 has the provisioning mode writesame_16 (in some cases, it can be disabled) instead of the correct value, unmap.
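Once the SCSI address is known, the provisioning mode can also be read for that single device rather than searched for across /sys. The following is a minimal sketch with hypothetical function names. It assumes the standard sysfs layout in which /sys/class/scsi_disk/<address>/provisioning_mode exposes the same attribute as the path found above, and the optional second argument exists only so that the sysfs root can be swapped out for testing.

```shell
#!/bin/sh
# Print the provisioning mode of the SCSI disk with the given address.
# $1: SCSI address such as 3:0:0:1
# $2: sysfs root (optional, defaults to /sys; a testing convenience)
provisioning_mode_of() (
    addr=$1
    sysroot=${2:-/sys}
    cat "$sysroot/class/scsi_disk/$addr/provisioning_mode"
)

# Succeed only when the device reports the mode that trim expects (unmap).
provisioning_mode_is_unmap() {
    [ "$(provisioning_mode_of "$@")" = "unmap" ]
}

# Usage:
#   provisioning_mode_of 3:0:0:1
#   provisioning_mode_is_unmap 3:0:0:1 || echo "unexpected provisioning mode"
```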
Check the kernel version with uname -r. If the kernel version is < v4.12, you have hit this issue.
Upgrade the kernel to a recommended version as described in the best practices page: https://longhorn.io/docs/1.5.1/best-practices/#operating-system
At the time of writing, the recommended kernel version is >= v5.8.
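The kernel version comparison from the troubleshooting steps can be sketched as a shell function. The function name is hypothetical, and the sketch assumes that the release string starts with numeric major.minor fields, as uname -r output normally does (for example 3.10.0-1160.el7.x86_64).

```shell
#!/bin/sh
# Succeed when the kernel release in $1 is older than major $2, minor $3.
# Only the leading "major.minor" part of the release string is compared.
kernel_older_than() (
    release=$1
    want_major=$2
    want_minor=$3
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -lt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -lt "$want_minor" ]; }
)

# Usage:
#   if kernel_older_than "$(uname -r)" 4 12; then
#       echo "kernel is older than v4.12 and is affected by this issue"
#   fi
```

The same function can be reused with 5 8 as arguments to compare against the recommended kernel version mentioned above.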
© 2019-2024 Longhorn Authors | Documentation Distributed under CC-BY-4.0