Kubernetes resource revision frequency expectations

| February 23, 2024

Applicable versions

The information in this article was specifically researched for v1.6.0, but is largely applicable to other versions.

Background

Each time a Kubernetes resource is updated, the Kubernetes API server instructs etcd to create a new revision of its associated key. The version number of this revision is incremented by one. etcd continues to store past revisions of all keys until its key store is compacted to save space.

A user noticed that the etcd keys associated with certain Longhorn objects had greater version numbers compared to the keys of other objects in their cluster. The concern was that the greater version numbers for Volume, Engine, and Node resources indicated issues in their Longhorn installation.

"Key" : "/registry/longhorn.io/volumes/longhorn-system/<some_volume>"
"CreateRevision" : 988635856
"ModRevision" : 1137860097
"Version" : 2804569

"Key" : "/registry/longhorn.io/engines/longhorn-system/<some_volume>-e-<some_hash>"
"CreateRevision" : 988635861
"ModRevision" : 1137860096
"Version" : 2274122

"Key" : "/registry/longhorn.io/nodes/longhorn-system/<some_node>"
"CreateRevision" : 988599936
"ModRevision" : 1137859827
"Version" : 462517

Frequency of revisions during periods of churn

Longhorn may update resources very frequently to track state as it ensures volumes are ready to serve workloads. Predicting the number of revisions to be created during certain activities is difficult. In most clusters, activities such as node rebooting, replica rebuilding, and volume creation are relative rarities in between long stretches of stable workload activity.

Frequency of revisions during periods of stability

Longhorn frequently updates the statuses of Volume, Engine, and Node resources, even when nothing notable is occurring. These updates help keep track of the actual space consumed by Longhorn volumes at any given moment, particularly the following:

  • engine.status.snapshots[].size: Tracks the space consumed by each volume snapshot in real time. An engine monitor collects and then pushes the information to the Engine object every five seconds (if the consumption has changed and new blocks are being written). Longhorn does not push a status update if no changes are made.
  • volume.status.actualSize: Tracks the actual space consumed by a specific volume in real time, which is calculated based on values obtained from engine.status.snapshots[].size. This field in the Volume object is updated if new blocks are being written and engine.status.snapshots[].size is updated every five seconds. The information is displayed prominently on the Longhorn UI to help you understand your system’s space consumption on a per-volume basis.
  • node.status.diskStatus[].storageAvailable: Tracks the actual amount of space available on a physical disk in real time. A disk monitor collects and then pushes the information to the Node object every 30 seconds (if the available space has changed). Since physical disk space is in constant flux, Longhorn only updates this field if a change exceeds 100 MiB. Longhorn uses the information to make replica scheduling decisions.

Frequency estimates

Assuming new blocks are constantly being written to a Longhorn volume, approximately 17,280 revisions per day can be expected for Engine and Volume objects. The volume from the background section could have hit version 2804569 in approximately 162 days. (Many volumes might not experience this kind of constant write activity).

(12 revisions/min)(60 min/hr)(24 hr/day) = 17,280 revisions/day

Assuming at least 100 MiB of new blocks are constantly being written to all Longhorn volumes with replicas on a physical disk, approximately 2,880 revisions per day can be expected for Node objects. The node from the background section could have hit version 462517 in approximately 161 days. (Many nodes might not experience this kind of constant write activity).

(2 revisions/min)(60 min/hr)(24 hr/day) = 2,880 revisions/day
Back to Knowledge Base

Recent articles

Troubleshooting: NoExecute taint prevents workloads from terminating
Troubleshooting: Orphan ISCSI Session Error
Failure to Attach Volumes After Upgrade to Longhorn v1.5.x
Kubernetes resource revision frequency expectations
SELinux and Longhorn
Troubleshooting: RWX Volume Fails to Be Attached Caused by `Protocol not supported`
Troubleshooting: fstrim doesn't work on old kernel
Troubleshooting: Failed RWX mount due to connection timeout
Space consumption guideline
Troubleshooting: Unexpected expansion leads to degradation or attach failure
Troubleshooting: Failure to delete orphaned Pod volume directory
Troubleshooting: Volume attachment fails due to SELinux denials in Fedora downstream distributions
Troubleshooting: Volumes Stuck in Attach/Detach Loop When Using Longhorn on OKD
Troubleshooting: Velero restores Longhorn PersistentVolumeClaim stuck in the Pending state when using the Velero CSI Plugin version before v0.4.0
Analysis: Potential Data/Filesystem Corruption
Instruction: How To Migrate Longhorn Chart Installed In Old Rancher UI To The Chart In New Rancher UI
Troubleshooting: Unable to access an NFS backup target
Troubleshooting: Pod with `volumeMode: Block` is stuck in terminating
Troubleshooting: Instance manager pods are restarted every hour
Troubleshooting: Open-iSCSI on RHEL based systems
Troubleshooting: Upgrading volume engine is stuck in deadlock
Tip: Set Longhorn To Only Use Storage On A Specific Set Of Nodes
Troubleshooting: Some old instance manager pods are still running after upgrade
Troubleshooting: Volume cannot be cleaned up after the node of the workload pod is down and recovered
Troubleshooting: DNS Resolution Failed
Troubleshooting: Generate pprof runtime profiling data
Troubleshooting: Pod stuck in creating state when Longhorn volumes filesystem is corrupted
Troubleshooting: None-standard Kubelet directory
Troubleshooting: Longhorn default settings do not persist
Troubleshooting: Recurring job does not create new jobs after detaching and attaching volume
Troubleshooting: Use Traefik 2.x as ingress controller
Troubleshooting: Create Support Bundle with cURL
Troubleshooting: Longhorn RWX shared mount ownership is shown as nobody in consumer Pod
Troubleshooting: `MountVolume.SetUp failed for volume` due to multipathd on the node
Troubleshooting: Longhorn-UI: Error during WebSocket handshake: Unexpected response code: 200 #2265
Troubleshooting: Longhorn volumes take a long time to finish mounting
Troubleshooting: `volume readonly or I/O error`
Troubleshooting: `volume pvc-xxx not scheduled`

© 2019-2024 Longhorn Authors | Documentation Distributed under CC-BY-4.0


© 2024 The Linux Foundation. All rights reserved. The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our Trademark Usage page.