Scheduling
In this section, you’ll learn how Longhorn schedules replicas based on multiple factors.
Longhorn’s scheduling policy has two stages. The scheduler only goes to the next stage if the previous stage is satisfied. Otherwise, the scheduling will fail.
If the volume specifies node or disk tags, a node or disk must carry matching tags to be selected for scheduling.
The first stage is the node and zone selection stage. Longhorn filters nodes and zones based on the Replica Node Level Soft Anti-Affinity and Replica Zone Level Soft Anti-Affinity settings.
The second stage is the disk selection stage. Longhorn filters the disks that satisfy the first stage based on the Replica Disk Level Soft Anti-Affinity, Storage Minimal Available Percentage, and Storage Over Provisioning Percentage settings, along with other disk-related factors such as the requested disk space.
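The flow of these two stages can be pictured with a short sketch. The following Go code is illustrative only, not Longhorn's implementation; the `node` and `disk` types and the two filter helpers are hypothetical stand-ins for the real logic described in the rest of this section.

```go
package scheduler

import "errors"

// Hypothetical, simplified types for this sketch; Longhorn's actual
// scheduler works with far richer node and disk structures.
type disk struct{ name string }

type node struct {
	name  string
	disks []disk
}

// filterNodesAndZones stands in for stage one: node and zone level
// soft anti-affinity filtering (detailed below).
func filterNodesAndZones(nodes []node) []node { return nodes }

// filterDisks stands in for stage two: disk tag matching, disk-level
// anti-affinity, and the space conditions.
func filterDisks(disks []disk) []disk { return disks }

// scheduleReplica mirrors the two-stage policy: stage two only ever
// sees nodes that survived stage one, and scheduling fails outright
// when either stage produces no candidates.
func scheduleReplica(nodes []node) (node, disk, error) {
	for _, n := range filterNodesAndZones(nodes) {
		if candidates := filterDisks(n.disks); len(candidates) > 0 {
			return n, candidates[0], nil
		}
	}
	return node{}, disk{}, errors.New("replica scheduling failed")
}
```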
Longhorn evaluates which nodes are suitable for scheduling a new replica based on a series of criteria. The decision-making process follows a specific order to ensure optimal placement for fault tolerance.
Longhorn first checks for node selector tags on the volume. If the volume has node selector tags, only nodes with matching tags are eligible. The Allow Empty Node Selector Volume setting determines whether a volume without node selector tags can be scheduled on tagged nodes:

- `true` (default): Schedules on nodes with or without tags.
- `false`: Schedules only on nodes without tags.

The setting Disable Scheduling On Cordoned Node determines whether cordoned nodes are eligible for replica scheduling:

- `true` (default): Cordoned nodes are excluded.
- `false`: Cordoned nodes are eligible.

A sketch of these node-level checks follows.
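The Go sketch below is illustrative, not Longhorn's code; `nodeInfo`, `nodeEligible`, and the parameter names are hypothetical stand-ins for the checks described above.

```go
package scheduler

// nodeInfo is a hypothetical, simplified view of a node for this sketch.
type nodeInfo struct {
	tags     map[string]bool // node tags set by the administrator
	cordoned bool            // whether the node is cordoned
}

// nodeEligible sketches the checks above: cordoned nodes are skipped
// when disableCordoned is set, untagged volumes follow the
// allowEmptySelector behavior, and tagged volumes require every
// selector tag to be present on the node.
func nodeEligible(n nodeInfo, selector []string, allowEmptySelector, disableCordoned bool) bool {
	if disableCordoned && n.cordoned {
		return false // cordoned nodes are excluded from scheduling
	}
	if len(selector) == 0 {
		// A volume without node selector tags: with allowEmptySelector
		// it may land on any node, otherwise only on untagged nodes.
		return allowEmptySelector || len(n.tags) == 0
	}
	for _, tag := range selector {
		if !n.tags[tag] {
			return false // missing a required node tag
		}
	}
	return true
}
```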
Longhorn prioritizes spreading replicas across different nodes and zones to improve fault tolerance. A "new" node or zone is one that does not currently host any replica of the volume, while an "existing" node or zone already hosts a replica of the volume. The scheduler attempts to place the new replica in the most "isolated" location possible, following this hierarchy of preference:

1. A new node in a new zone
2. A new node in an existing zone
3. An existing node in an existing zone
The following table details the required settings for a replica to be scheduled in each scenario:
| Scenario | Replica Zone Level Soft Anti-Affinity | Replica Node Level Soft Anti-Affinity | Scheduler Action |
|---|---|---|---|
| New Node in a New Zone | false | false | Schedules the replica. |
| | Any other value | Any other value | Does not schedule the replica. |
| New Node in an Existing Zone | true | false | Schedules the replica if no new zone is available. |
| | Any other value | Any other value | Does not schedule the replica. |
| Existing Node in an Existing Zone | true | true | Schedules the replica if no other options are available. |
| | Any other value | Any other value | Does not schedule the replica. |
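One consistent reading of this hierarchy and table is sketched below. The `placement` type and `allowed` helper are hypothetical, not Longhorn's API; in this reading, the most isolated tier is always acceptable, and the two anti-affinity settings gate which fallback tiers are permitted.

```go
package scheduler

// placement records where a candidate node sits relative to the
// volume's existing replicas (hypothetical helper type).
type placement struct {
	newZone bool // the node's zone hosts no replica of the volume yet
	newNode bool // the node itself hosts no replica of the volume yet
}

// allowed reports whether a candidate placement may be used under the
// given settings. The scheduler always tries the most isolated tier
// first, so the later cases act purely as fallbacks.
func allowed(p placement, zoneSoft, nodeSoft bool) bool {
	switch {
	case p.newZone && p.newNode:
		return true // new node in a new zone: the preferred placement
	case p.newNode:
		return zoneSoft // new node in an existing zone
	default:
		return zoneSoft && nodeSoft // existing node in an existing zone
	}
}
```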
Once the node and zone stage is satisfied, Longhorn decides whether it can schedule the replica on any disk of the selected node. It checks the available disks based on matching tags, total disk space, and available disk space. It also considers the anti-affinity settings and whether another replica of the volume already exists on the disk.
Longhorn checks all available disks on the selected node to ensure they meet the following criteria:
- Disk Tag Matching: If the volume has disk selector tags, only disks with matching tags are eligible. The Allow Empty Disk Selector Volume setting determines whether a volume without disk selector tags can be scheduled on tagged disks:
  - `true` (default): Allows scheduling on disks with or without tags.
  - `false`: Only allows scheduling on disks without tags.
- Available Space Check: The disk must have enough available space to satisfy the Storage Minimal Available Percentage setting.
- Anti-Affinity Settings: The Replica Disk Level Soft Anti-Affinity setting determines whether a disk that already hosts a replica of the volume remains eligible (see the example below).
- Space Conditions: Two formulas determine whether a disk is schedulable:
  - Actual Space Usage Condition: `(Storage Available - Actual Size) > (Storage Maximum × Minimal Available Percentage) / 100`
  - Scheduling Space Condition: `(Size + Storage Scheduled) ≤ ((Storage Maximum - Storage Reserved) × Over Provisioning Percentage) / 100`

Note: During disk evaluation, since no specific replica is being scheduled yet, `Actual Size` and `Size` are temporarily treated as `0` in these formulas.
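The two conditions translate directly into code. The Go sketch below is illustrative only; the `diskSpace` type and helper names are hypothetical. It uses integer byte arithmetic and is declared in `package main` so that the worked example later on this page can exercise it.

```go
package main

// diskSpace holds the per-disk quantities used by the two space
// conditions; field names mirror the formula terms (values in bytes).
type diskSpace struct {
	storageAvailable int64 // free space currently reported on the disk
	storageMaximum   int64 // total capacity of the disk
	storageReserved  int64 // space set aside for the system and other apps
	storageScheduled int64 // space already promised to existing replicas
}

// actualSpaceUsageOK implements the Actual Space Usage Condition:
// (Storage Available - Actual Size) > (Storage Maximum × Minimal Available Percentage) / 100
func actualSpaceUsageOK(d diskSpace, actualSize, minimalAvailablePct int64) bool {
	return d.storageAvailable-actualSize > d.storageMaximum*minimalAvailablePct/100
}

// schedulingSpaceOK implements the Scheduling Space Condition:
// (Size + Storage Scheduled) ≤ ((Storage Maximum - Storage Reserved) × Over Provisioning Percentage) / 100
func schedulingSpaceOK(d diskSpace, size, overProvisioningPct int64) bool {
	return size+d.storageScheduled <= (d.storageMaximum-d.storageReserved)*overProvisioningPct/100
}
```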
If any of these conditions fails, whether a disk tag mismatch, an anti-affinity violation, or a space requirement, the disk is marked unschedulable and Longhorn will not place the replica on it.
Consider a node (Node A) with two disks:

- Disk X: `Storage Maximum` of 4 GB and `Storage Available` of 1 GB
- Disk Y: `Storage Maximum` of 8 GB, `Storage Available` of 2 GB, `Storage Reserved` of 1 GB, and `Storage Scheduled` of 2 GB
During the initial disk selection stage, Longhorn performs a basic check on all available disks. At this point, no specific replica has been selected, so `Actual Size` and `Size` are treated as `0`.
Disk X Evaluation

- `Storage Minimal Available Percentage`: 25% (default)
- Minimum required available space: (4 GB × 25) / 100 = 1 GB
- Disk X fails the Actual Space Usage Condition because its available space (1 GB) is not greater than the minimum required (1 GB). Therefore, Disk X is not schedulable unless the `Storage Minimal Available Percentage` is set to 0.

Disk Y Evaluation

- `Storage Minimal Available Percentage`: 10%
- Minimum required available space: (8 GB × 10) / 100 = 0.8 GB
- Disk Y passes the Actual Space Usage Condition because its available space (2 GB) is greater than the minimum required (0.8 GB).

Next, we check the Scheduling Space Condition:

- `Storage Reserved`: 1 GB
- `Over Provisioning Percentage`: 100% (default)
- Maximum provisionable storage: (8 GB - 1 GB) × 100 / 100 = 7 GB
- Disk Y passes the Scheduling Space Condition because the currently scheduled space (2 GB) is less than the maximum provisionable storage (7 GB).

Since Disk Y passes all conditions, it is marked as a schedulable disk candidate.
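Plugging the example's numbers into the helpers from the space-condition sketch above (assumed to be compiled into the same hypothetical `main` package) reproduces each verdict:

```go
package main

import "fmt"

func main() {
	const gb int64 = 1 << 30

	diskX := diskSpace{storageAvailable: 1 * gb, storageMaximum: 4 * gb}
	diskY := diskSpace{
		storageAvailable: 2 * gb,
		storageMaximum:   8 * gb,
		storageReserved:  1 * gb,
		storageScheduled: 2 * gb,
	}

	// Actual Size and Size are treated as 0 during initial evaluation.
	fmt.Println(actualSpaceUsageOK(diskX, 0, 25)) // false: 1 GB is not > 1 GB
	fmt.Println(actualSpaceUsageOK(diskY, 0, 10)) // true:  2 GB > 0.8 GB
	fmt.Println(schedulingSpaceOK(diskY, 0, 100)) // true:  2 GB <= 7 GB
}
```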
Let's assume both Disk X and Disk Y pass the initial space checks, and Disk X already hosts a replica of the same volume.

- Hard Anti-Affinity (`Replica Disk Level Soft Anti-Affinity` set to `false`): Disk X is excluded, and the new replica can be placed only on Disk Y.
- Soft Anti-Affinity (`Replica Disk Level Soft Anti-Affinity` set to `true`): Disk X remains eligible, but Longhorn prefers Disk Y because it does not yet host a replica of the volume.
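A minimal sketch of this disk-level selection, with a hypothetical `diskCandidate` type, could look like this:

```go
package scheduler

// diskCandidate is a hypothetical, simplified disk view for this sketch.
type diskCandidate struct {
	hostsReplica bool // the disk already holds a replica of this volume
	passesChecks bool // tag and space conditions already satisfied
}

// pickDisk applies disk-level anti-affinity: with hard anti-affinity
// (soft=false) a disk that already hosts a replica is excluded outright;
// with soft anti-affinity (soft=true) it stays eligible, but any disk
// without a replica is preferred.
func pickDisk(disks []diskCandidate, soft bool) *diskCandidate {
	var fallback *diskCandidate
	for i := range disks {
		d := &disks[i]
		if !d.passesChecks {
			continue
		}
		if !d.hostsReplica {
			return d // preferred: no replica of the volume here yet
		}
		if soft && fallback == nil {
			fallback = d // acceptable only under soft anti-affinity
		}
	}
	return fallback // nil means no disk is schedulable
}
```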
For more information on the settings that are relevant to scheduling replicas on nodes and disks, refer to the settings reference.
Longhorn relies on the `topology.kubernetes.io/zone=<Zone name of the node>` or `topology.kubernetes.io/region=<Region name of the node>` label in the Kubernetes node object to identify the zone or region, since these are well-known labels reserved and used by Kubernetes.
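For example, you can inspect or set the zone label with kubectl; the node name and zone value below are placeholders:

```shell
# Show the labels Kubernetes and Longhorn read, including zone/region.
kubectl get node <node-name> --show-labels

# Set the well-known zone label on a node.
kubectl label node <node-name> topology.kubernetes.io/zone=zone-1
```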