Snapshot Data Integrity Check
Longhorn is capable of hashing snapshot disk files and periodically checking their integrity.
Longhorn supports volume snapshots and stores the snapshot disk files on the local disk. Previously, snapshots had no checksums, so their data integrity could not be verified. As a result, when data was corrupted, for example by bit rot in the underlying storage, there was no way to detect the corruption and repair the replicas. With this feature, Longhorn hashes snapshot disk files and periodically checks their integrity. When a snapshot disk file in one replica is corrupted, Longhorn automatically starts the rebuilding process to fix it.
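The detection idea above can be sketched as follows. This is a minimal illustration, not Longhorn's implementation: the use of SHA-256, the majority-vote comparison across replicas, and the names hash_file and find_corrupted_replicas are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    # Assumption: SHA-256 is used here for illustration only;
    # the hash function Longhorn actually uses may differ.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted_replicas(replica_files):
    """Given {replica_name: path}, return the replicas whose checksum
    differs from the most common checksum (hypothetical majority rule)."""
    checksums = {name: hash_file(path) for name, path in replica_files.items()}
    majority, _ = Counter(checksums.values()).most_common(1)[0]
    return sorted(name for name, value in checksums.items() if value != majority)
```

A replica reported by find_corrupted_replicas would be a candidate for rebuilding. With an even split between checksums the majority rule is ambiguous, so a real system needs a stronger source of truth, such as a checksum recorded at hashing time.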
snapshot-data-integrity
This setting allows users to enable or disable snapshot hashing and data integrity checking. Available options are:
snapshot-data-integrity-immediate-check-after-snapshot-creation
Hashing snapshot disk files impacts the performance of the system. To minimize the impact right after a snapshot is created, the immediate hashing and checking can be disabled.
snapshot-data-integrity-cronjob
A schedule defined using the unix-cron string format specifies when Longhorn checks the data integrity of snapshot disk files.
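To make the unix-cron string format concrete, here is a minimal sketch of how a five-field schedule selects points in time. The function names and the example schedules in the usage note are illustrative assumptions, not values taken from Longhorn, and the matcher covers only numeric fields (no month or weekday names, and no 7 for Sunday).

```python
from datetime import datetime


def _field_matches(expr, value, minimum):
    """Check one cron field; supports '*', '*/n', 'a-b', 'a-b/n' and comma lists."""
    for part in expr.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            # Steps on '*' count from the lowest legal value of the field.
            if (value - minimum) % step == 0:
                return True
        elif "-" in base:
            start, end = map(int, base.split("-"))
            if start <= value <= end and (value - start) % step == 0:
                return True
        elif int(base) == value:
            return True
    return False


def cron_matches(schedule, when):
    """Return True if `when` falls on a minute selected by the schedule."""
    minute, hour, dom, month, dow = schedule.split()
    cron_dow = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    if not (_field_matches(minute, when.minute, 0)
            and _field_matches(hour, when.hour, 0)
            and _field_matches(month, when.month, 1)):
        return False
    dom_ok = _field_matches(dom, when.day, 1)
    dow_ok = _field_matches(dow, cron_dow, 0)
    # Classic cron rule: when both day fields are restricted
    # (neither starts with '*'), a match on either one is enough.
    if not dom.startswith("*") and not dow.startswith("*"):
        return dom_ok or dow_ok
    return dom_ok and dow_ok
```

For example, cron_matches("0 0 */7 * *", datetime(2024, 1, 8, 0, 0)) is true because day 8 is seven days after day 1, while a schedule such as "30 2 * * 0" selects 02:30 every Sunday, which would suit an off-peak check.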
Warning: Hashing snapshot disk files impacts the performance of the system. It is recommended to run data integrity checks during off-peak times and to reduce the frequency of checks.
Longhorn also supports a per-volume setting, configured through Volume.Spec.SnapshotDataIntegrity. Its default value is ignored, which means the data integrity check for the volume follows the global setting snapshot-data-integrity. Volume.Spec.SnapshotDataIntegrity supports the values ignored, disabled, enabled and fast-check, so each volume can have its data integrity check setting customized.
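The relationship between the per-volume value and the global setting can be sketched as a small resolution function. The function name and the validation step are hypothetical; only the four per-volume values and the rule that ignored defers to the global setting come from the text above.

```python
# Values accepted by Volume.Spec.SnapshotDataIntegrity, as listed above.
VOLUME_OPTIONS = {"ignored", "disabled", "enabled", "fast-check"}


def effective_snapshot_data_integrity(volume_value, global_value):
    """Resolve the setting that applies to one volume (hypothetical helper).

    `volume_value` is the per-volume setting and `global_value` is the
    value of the global snapshot-data-integrity setting.
    """
    if volume_value not in VOLUME_OPTIONS:
        raise ValueError(f"unsupported per-volume value: {volume_value!r}")
    if volume_value == "ignored":
        # The volume does not override anything, so the global setting wins.
        return global_value
    return volume_value
```

With the global setting at enabled, a volume left at ignored resolves to enabled, while a volume set to disabled opts out of hashing regardless of the global value.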
To detect data corruption, Longhorn must calculate checksums of snapshot disk files. These calculations consume storage and compute resources, so storage performance is negatively impacted. To quantify the impact, we benchmarked storage performance while checksumming disk files. Read IOPS, bandwidth and latency are all negatively impacted.
Disk: 200 GiB NVMe SSD as the instance store
Disk: 200 GiB throughput optimized HDD (st1)
The feature helps detect data corruption in the snapshot disk files of volumes. However, the checksum calculation negatively impacts storage performance. To reduce the impact, the recommendations are:
Checksum snapshot disk files during off-peak hours by configuring snapshot-data-integrity-cronjob.
Disable the immediate check after snapshot creation by configuring snapshot-data-integrity-immediate-check-after-snapshot-creation.
© 2019-2024 Longhorn Authors | Documentation Distributed under CC-BY-4.0