Troubleshooting: `volume readonly or I/O error`
January 8, 2021
Applicable versions: All Longhorn versions.
When an application writes data to existing files or creates files in the mount point of a Longhorn volume, the following message is shown:
```
/ # cd data
/data # echo test > test
sh: can't create test: I/O error
```
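To confirm that the filesystem was actually remounted read-only, you can inspect the mount flags from inside the pod. A minimal check, assuming the volume is mounted at `/data` as in the session above:

```
# A crashed volume typically shows the "ro" flag once the kernel
# remounts the filesystem read-only:
grep ' /data ' /proc/mounts
# Hypothetical output after the remount:
# /dev/sdc /data ext4 ro,relatime 0 0
```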
When running `dmesg` in the related pod or on the node host, messages like the following are shown:
```
......
[1586907.286218] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: (null)
[1587396.152106] EXT4-fs warning (device sdc): ext4_end_bio:323: I/O error 10 writing to inode 12 (offset 0 size 4096 starting block 33026)
[1587403.789877] EXT4-fs error (device sdc): ext4_find_entry:1455: inode #2: comm sh: reading directory lblock 0
[1587404.353594] EXT4-fs warning (device sdc): htree_dirblock_to_tree:994: inode #2: lblock 0: comm ls: error -5 reading directory block
[1587404.353598] EXT4-fs error (device sdc): ext4_journal_check_start:61: Detected aborted journal
[1587404.355087] EXT4-fs (sdc): Remounting filesystem read-only
......
```
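The device name (`sdc` above) varies per attachment. To narrow the kernel log down to the affected volume, you can filter `dmesg` for that device:

```
# Show only the messages for the suspect block device:
dmesg | grep sdc
```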
When checking the events using `kubectl -n longhorn-system get event | grep <volume name>`, an event like the following is shown:
```
2m26s Warning DetachedUnexpectedly volume/pvc-342edde0-d3f4-47c6-abf6-bf8eeda3c32c Engine of volume pvc-342edde0-d3f4-47c6-abf6-bf8eeda3c32c dead unexpectedly, reattach the volume
```
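As an alternative to `grep`, the event can be selected server-side by the involved object's name, using the volume name from the output above:

```
kubectl -n longhorn-system get event \
  --field-selector involvedObject.name=pvc-342edde0-d3f4-47c6-abf6-bf8eeda3c32c
```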
When checking the logs of the Longhorn manager pod on the node the workload is running on, using `kubectl -n longhorn-system logs <longhorn manager pod name> | grep <volume name>`, messages like the following are shown:
time="2021-01-05T11:20:46Z" level=debug msg="Instance handler updated instance pvc-342edde0-d3f4-47c6-abf6-bf8eeda3c32c-e-0fe2dac3 state, old state running, new state error"
time="2021-01-05T11:20:46Z" level=warning msg="Instance pvc-342edde0-d3f4-47c6-abf6-bf8eeda3c32c-e-0fe2dac3 crashed on Instance Manager instance-manager-e-a1fd54e4 at shuo-cluster-0-worker-3, try to get log"
......
time="2021-01-05T11:20:46Z" level=warning msg="Engine of volume dead unexpectedly, reattach the volume" accessMode=rwo controller=longhorn-volume frontend=blockdev node=shuo-cluster-0-worker-3 owner=shuo-cluster-0-worker-3 state=attached volume=pvc-342edde0-d3f4-47c6-abf6-bf8eeda3c32c
......
time="2021-01-05T11:20:46Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-342edde0-d3f4-47c6-abf6-bf8eeda3c32c\", UID:\"69bb0f94-da48-4d15-b861-add435f25d00\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"7466467\", FieldPath:\"\"}): type: 'Warning' reason: 'DetachedUnexpectedly' Engine of volume pvc-342edde0-d3f4-47c6-abf6-bf8eeda3c32c dead unexpectedly, reattach the volume"
Once the Longhorn volume crashes unexpectedly, its mount point becomes invalid, and there is no way to read or write data in the volume via that mount point.
An engine crash is normally caused by the engine losing its connections to every single replica. The same symptom also appears when users accidentally or manually detach the Longhorn volume while the related workload is still using it.
Longhorn versions earlier than v1.1.0 will try to remount the volume automatically, but the scenarios they can handle are limited.
Since Longhorn v1.1.0, a new setting Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly is introduced so that Longhorn automatically deletes the workload pod when it is managed by a controller (e.g. Deployment, StatefulSet, DaemonSet); the controller then recreates the pod, and the new pod remounts the volume.
If the workload is a simple pod, you can delete and re-deploy the pod. Please make sure the related PVC or PV is not removed if the reclaim policy is not `Retain`; otherwise, the Longhorn volume will be removed once the related PVC/PV is gone.
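For example, you can verify the reclaim policy before deleting anything (the names in angle brackets are placeholders):

```
# "Retain" means the PV (and the Longhorn volume behind it) survives
# even if the PVC is deleted:
kubectl get pv <pv name> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'

# Delete and re-deploy the pod; the PVC stays, so the Longhorn volume is kept:
kubectl delete pod <pod name>
kubectl apply -f <pod manifest>.yaml
```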
If the workload pod belongs to a Deployment or StatefulSet, you can restart the pod by scaling the workload replicas down and then back up.
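A minimal example for a Deployment (a StatefulSet works the same way with `kubectl scale statefulset`):

```
# Scale down so the pod terminates and the volume is detached cleanly:
kubectl scale deployment <deployment name> --replicas=0
# Once the old pod is gone, scale back up; the new pod remounts the volume:
kubectl scale deployment <deployment name> --replicas=1
```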
For Longhorn v1.1.0 or higher versions, you can instead enable the setting Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly so that this recovery happens automatically.
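The setting can be enabled on the Settings page of the Longhorn UI. It can also be changed from the command line by patching the corresponding Longhorn Setting resource; a sketch, assuming the setting ID `auto-delete-pod-when-volume-detached-unexpectedly`:

```
# Set the value to "true" to enable automatic workload pod deletion:
kubectl -n longhorn-system patch settings.longhorn.io \
  auto-delete-pod-when-volume-detached-unexpectedly \
  --type merge -p '{"value": "true"}'
```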
Related information: Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly: https://github.com/longhorn/longhorn/issues/1719