Identifying and Recovering from Data Errors
If you’ve encountered an error message like the following:
'fsck' found errors on device /dev/longhorn/pvc-6288f5ea-5eea-4524-a84f-afa14b85780d but could not correct them.
then the volume's data is corrupted. This section describes how to identify the cause and recover the volume.
To determine whether the error was caused by a failing underlying disk, follow these steps to identify the corrupted replicas.
If most of the replicas on a disk are corrupted, the disk is no longer reliable and should be replaced.
If only one replica on the disk is corrupted, the cause may be bit rot. In this case, removing that replica is sufficient.
If all the replicas are identical, the volume must be recovered from a snapshot. In that case, the bad data was most likely written by the workload the volume is attached to.
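The decision rules above can be sketched as a small comparison script. This is a minimal sketch under assumptions this section does not state: that you can read each replica's data file on its node, that corresponding files across replicas should be byte-identical while the volume is detached, and that comparing SHA-256 checksums is an adequate way to spot a diverging copy. The function names `file_sha256`, `classify_replicas`, and `disk_is_unreliable`, and the verdict labels they return, are hypothetical and not part of Longhorn.

```python
import hashlib
from collections import Counter
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a (hypothetical) replica data file in chunks so large images need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def classify_replicas(checksums: dict[str, str]) -> tuple[str, list[str]]:
    """Map replica name -> checksum to a verdict and the list of suspect replicas.

    Assumption: corresponding replica files of a healthy, detached volume are identical.
    """
    if not checksums:
        raise ValueError("no replica checksums given")
    counts = Counter(checksums.values())
    if len(counts) == 1:
        # All replicas agree: the bad data was most likely written by the workload,
        # so recovery has to come from a snapshot (or a backup).
        return "revert-to-snapshot", []
    majority, majority_count = counts.most_common(1)[0]
    if majority_count * 2 <= len(checksums):
        # No strict majority: this sketch does not guess which copy is good.
        return "inconclusive", sorted(checksums)
    # A minority of replicas diverge (e.g. bit rot): those are candidates for removal.
    suspects = sorted(name for name, c in checksums.items() if c != majority)
    return "remove-replicas", suspects


def disk_is_unreliable(bad_on_disk: int, total_on_disk: int) -> bool:
    """Heuristic for 'most replicas on the disk are corrupted': strictly more than half."""
    return total_on_disk > 0 and bad_on_disk * 2 > total_on_disk
```

For example, `classify_replicas({"r1": "a1", "r2": "ff", "r3": "a1"})` flags only `r2`, matching the bit-rot case. Treating a tie as inconclusive, rather than picking a winner, is a design choice of this sketch.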
To revert to a previous snapshot:
After reverting, mount the volume from /dev/longhorn/<volume_name> and check the volume content.
If all of the methods above fail, use a backup to recover the volume.
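Because the error quoted at the top of this section comes from `fsck`, decoding its exit status can help when re-checking a volume. The sketch below assumes the standard status bits documented in fsck(8), where 4 means "filesystem errors left uncorrected" (likely the condition behind the quoted message), and an ext4 volume, where `fsck` hands `-n` to `e2fsck` so the check opens the device read-only and changes nothing. The helper names are hypothetical, and the check is meant for a device that is attached but not mounted.

```python
import subprocess

# Exit-status bits documented in fsck(8); the returned status is the bitwise OR of these.
FSCK_STATUS_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_status(code: int) -> list[str]:
    """Decode an fsck exit status into the conditions it reports."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_STATUS_BITS.items() if code & bit]


def check_device_read_only(device: str) -> list[str]:
    """Run a no-changes check (-n, passed through to e2fsck on ext4) on an unmounted device."""
    result = subprocess.run(["fsck", "-n", device], capture_output=True, text=True)
    return describe_fsck_status(result.returncode)
```

A call such as `check_device_read_only("/dev/longhorn/<volume_name>")` (with the placeholder replaced) returns a readable list of conditions instead of a bare integer.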
© 2019-2024 Longhorn Authors | Documentation Distributed under CC-BY-4.0