V2 Disk Size Aggregation
David Cheng | December 23, 2025
This article is intended for Kubernetes administrators and system engineers running Longhorn v2 on nodes with multiple local disks who want to aggregate disk capacity or improve I/O performance. It explains the available disk aggregation options, their trade-offs, and why Linux Kernel RAID is currently the recommended approach.
Modern Kubernetes nodes often include multiple local disks—NVMe, SSD, or HDD—that users want to combine into a single larger storage unit. Longhorn supports using aggregated block devices as storage backends, but the aggregation itself must be created by the user on the host node before Longhorn consumes it.
Currently, the recommended way to aggregate disks for Longhorn v2 is to use Linux Kernel RAID (mdadm). Although SPDK provides RAID and concat capabilities, their current limitations make kernel RAID the more practical choice. This article explains how to create and remove aggregated disks and, more importantly, why Longhorn does not introduce a built-in SPDK RAID layer at this time.
Longhorn v2 uses a fully SPDK-based data engine. At first glance, building disk aggregation on top of SPDK RAID appears intuitive. However, after evaluating SPDK RAID 0 and SPDK Concat, several drawbacks prevent them from being adopted as the default aggregation layer for Longhorn v2 today.
The observations below are based on internal testing.
SPDK RAID 0 requires all member disks to operate at the size of the smallest disk in the array. For example:
| Disk | Size |
|---|---|
| nvme1 | 50Gi |
| nvme2 | 100Gi |
| nvme3 | 100Gi |
The usable capacity of this SPDK RAID 0 array is: 3 × 50Gi = 150Gi.
The remaining capacity on the larger disks is unused because SPDK RAID 0 truncates all members to the smallest size. This behavior makes SPDK RAID 0 impractical for environments with disks of mixed sizes, which is common on bare-metal and cloud instances. Linux RAID 0 does not impose this limitation.
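As a concrete sketch (assuming SPDK's rpc.py script and locally attached NVMe devices; the bdev names, PCIe addresses, and exact flags are illustrative and may vary by SPDK version), the array above could be assembled like this:

```bash
# Attach each local NVMe device as an SPDK bdev (PCIe addresses are placeholders).
scripts/rpc.py bdev_nvme_attach_controller -b nvme1 -t pcie -a 0000:00:1e.0
scripts/rpc.py bdev_nvme_attach_controller -b nvme2 -t pcie -a 0000:00:1f.0
scripts/rpc.py bdev_nvme_attach_controller -b nvme3 -t pcie -a 0000:00:20.0

# Create the RAID 0 bdev with a 64K strip size across the three members.
# Each member is truncated to the smallest size (50Gi), so the usable
# capacity is 3 x 50Gi = 150Gi; the extra 50Gi on nvme2 and nvme3 is unused.
scripts/rpc.py bdev_raid_create -n raid0_agg -r raid0 -z 64 \
  -b "nvme1n1 nvme2n1 nvme3n1"
```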
From a performance perspective, SPDK RAID 0 behaves as expected: sequential throughput is very high thanks to striping, and random IOPS scale with the number of disks.

However, achieving optimal performance typically requires:

- Pinning the SPDK reactors to dedicated CPU cores
- Choosing a stripe size that matches the workload at creation time

Without careful tuning, SPDK RAID 0 often provides limited advantages over Linux RAID 0. Given Longhorn’s focus on operational simplicity, requiring users to manually tune SPDK internals is not desirable.
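As a rough sketch of what that tuning involves (the core mask and strip size below are placeholders, not recommendations, and the exact options depend on the SPDK version and deployment):

```bash
# Pin the SPDK target to dedicated CPU cores (mask 0x6 = cores 1 and 2).
./build/bin/spdk_tgt -m 0x6 &

# The strip size is fixed when the RAID 0 bdev is created and cannot be changed later.
scripts/rpc.py bdev_raid_create -n raid0_agg -r raid0 -z 512 \
  -b "nvme1n1 nvme2n1 nvme3n1"
```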
```
LVS
└── SPDK RAID 0
    ├── Bdev nvme
    │   └── /dev/nvme1
    ├── Bdev nvme
    │   └── /dev/nvme2
    └── Bdev nvme
        └── /dev/nvme3
```
SPDK Concat mode:

- Uses the full capacity of all member disks, regardless of their sizes
- Appends disks end-to-end rather than striping data across them
Because Concat does not interleave I/O across disks, it behaves similarly to a single raw device. Although Concat does not stripe data like Linux RAID 0, the stripe size still affects I/O behavior. Internally, the RAID bdev layer uses the stripe size as an optimal_io_boundary and enables split_on_optimal_io_boundary. Large sequential I/O may be split into smaller requests before reaching the RAID module. If the stripe size is too small (for example, 4K), this excessive splitting can severely reduce sequential throughput without providing any parallelism.
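To see why a small stripe size hurts sequential throughput, consider how a single large request is split at the optimal_io_boundary; the 1 MiB request size below is only for illustration.

```bash
# Number of sub-requests a 1 MiB sequential I/O is split into:
echo "4K stripe:  $((1024 / 4)) sub-requests"   # 256 splits, heavy per-request overhead
echo "64K stripe: $((1024 / 64)) sub-requests"  # 16 splits, negligible overhead
```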
In contrast, Linux RAID 0 stripes data across all disks and processes I/O in parallel, allowing both sequential and random workloads to scale with disk count. SPDK Concat performs like a single large linear device, serving I/O sequentially within each disk region, without any concurrency or bandwidth aggregation.
```
LVS
└── SPDK Concat
    ├── Bdev nvme
    │   └── /dev/nvme1
    ├── Bdev nvme
    │   └── /dev/nvme2
    └── Bdev nvme
        └── /dev/nvme3
```
Linux Kernel RAID 0 provides:

- Full use of the capacity of all member disks, even when their sizes differ
- Striped I/O, so sequential throughput and random IOPS scale with the number of disks
- Mature, kernel-native tooling (mdadm) with well-established recovery workflows
- A standard block device that Longhorn can consume out of the box

It meets Longhorn’s requirements without introducing additional complexity or performance regressions.
```
LVS
└── Bdev aio
    └── Linux Kernel RAID 0 (mdadm)
        ├── /dev/nvme1
        ├── /dev/nvme2
        └── /dev/nvme3
```
| Category | SPDK RAID 0 | SPDK Concat | Linux RAID 0 |
|---|---|---|---|
| Capacity Behavior | Limited by smallest disk; wastes capacity with mixed sizes | Uses full capacity | Uses full capacity |
| Sequential Throughput | Very high (striping) | Same as a single disk | Very high (striping) |
| Random IOPS | Scales with number of disks | Same as a single disk | Scales with number of disks |
| Latency | Low | Low | Slightly higher but still low |
| Performance Tuning | CPU pinning and stripe-size tuning often needed | No tuning | No tuning |
| Recovery and Tooling | Limited ecosystem | Limited | Excellent tooling (mdadm, recovery workflows) |
| Suitability for Mixed Disk Sizes | Poor | Good | Good |
| Kernel or Userspace | Userspace (SPDK) | Userspace (SPDK) | Kernel native |
| Integration with Longhorn | Requires SPDK-level configuration | Requires SPDK-level configuration | Works out of the box as a block device |
| Overall Recommendation (2025) | Not recommended | Not recommended for performance | Recommended |
Install mdadm using your system package manager (for example, sudo apt install mdadm -y or sudo yum install mdadm -y).
Create a RAID 0 array from the desired devices:
```bash
sudo mdadm --create /dev/md0 \
  --level=0 \
  --raid-devices=3 \
  /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
```
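Optionally, confirm that the array is assembled and healthy before handing it to Longhorn:

```bash
cat /proc/mdstat               # array state and member devices
sudo mdadm --detail /dev/md0   # detailed layout, chunk size, and status
```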
After the RAID device (for example, /dev/md0) is created, add it to the Longhorn cluster through the UI or via kubectl. Longhorn accesses this device using the AIO backend.
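For example, with kubectl the device can be added as a block-type disk on the owning Longhorn Node resource. The snippet below is a hedged sketch: the disk name md0-block is illustrative and field names may vary with the Longhorn version, so prefer the UI or the official documentation for your release.

```bash
# Add /dev/md0 as a block-type disk on the node (names are placeholders).
kubectl -n longhorn-system patch nodes.longhorn.io <node-name> --type merge -p '
spec:
  disks:
    md0-block:
      path: /dev/md0
      diskType: block
      allowScheduling: true
'
```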
Remove the aggregated disk from the Longhorn system using the UI or kubectl.
Stop the RAID device:

```bash
sudo mdadm --stop /dev/md0
```

Remove the mdadm superblock from each member disk:

```bash
sudo mdadm --zero-superblock /dev/nvme1n1
sudo mdadm --zero-superblock /dev/nvme2n1
sudo mdadm --zero-superblock /dev/nvme3n1
```

Verify that the superblocks have been removed:

```bash
sudo mdadm --examine /dev/nvme1n1
sudo mdadm --examine /dev/nvme2n1
sudo mdadm --examine /dev/nvme3n1
```

Expected output:

```
mdadm: No md superblock detected on /dev/nvme1n1.
mdadm: No md superblock detected on /dev/nvme2n1.
mdadm: No md superblock detected on /dev/nvme3n1.
```
This benchmark uses kbench to evaluate different aggregation configurations under varying replica counts.
FIO test parameters: kbench runs both sequential and random read/write workloads with a common set of FIO parameters.

Measured metrics:

- Random IOPS (read / write)
- Sequential bandwidth (read / write)
- Random latency (read / write)

Test environment: AWS EC2 c5.xlarge instances.

All bandwidth values shown below are measured in KiB/s. Stripe sizes (64K and 512K) indicate the amount of data written to one disk before continuing to the next disk in a RAID 0 array.
In summary, SPDK RAID 0 delivers strong performance but wastes capacity, SPDK Concat preserves capacity without scaling performance, and Linux RAID 0 provides the most balanced results.
Single-replica volume:

| Configuration | Random IOPS (Read / Write) | Sequential Bandwidth (Read / Write) KiB/s | Random Latency (Read / Write) ns |
|---|---|---|---|
| Baseline (Single Disk) | 3,001 / 3,525 | 128,222 / 128,225 | 638,322 / 941,680 |
| SPDK RAID Concat (4K) | 3,002 / 3,586 | 11,939 / 12,049 | 636,214 / 941,915 |
| SPDK RAID Concat (64K) | 3,002 / 3,689 | 128,229 / 128,245 | 642,153 / 951,388 |
| SPDK RAID 0 (64K) | 8,985 / 9,005 | 384,820 / 384,785 | 731,702 / 1,042,055 |
| SPDK RAID 0 (512K) | 9,004 / 8,981 | 384,643 / 384,513 | 639,568 / 945,823 |
| mdadm RAID 0 (512K) | 8,983 / 8,981 | 384,503 / 384,492 | 647,074 / 954,796 |
Multi-replica volume:

| Configuration | Random IOPS (Read / Write) | Sequential Bandwidth (Read / Write) KiB/s | Random Latency (Read / Write) ns |
|---|---|---|---|
| Baseline (Single Disk) | 9,015 / 3,476 | 384,783 / 128,265 | 637,628 / 1,071,141 |
| SPDK RAID Concat (4K) | 9,017 / 3,409 | 36,013 / 12,215 | 642,653 / 1,075,667 |
| SPDK RAID Concat (64K) | 9,004 / 3,558 | 384,831 / 128,238 | 646,068 / 1,037,210 |
| SPDK RAID 0 (64K) | 26,992 / 8,973 | 1,075,181 / 384,849 | 644,169 / 1,083,213 |
| SPDK RAID 0 (512K) | 26,936 / 9,003 | 941,377 / 380,937 | 642,769 / 1,074,282 |
| mdadm RAID 0 (512K) | 14,334 / 9,041 | 963,234 / 378,805 | 646,411 / 1,070,201 |
Minor variation is expected due to environmental or network factors.
Single-disk vs. SPDK Concat
Single-disk and SPDK Concat show similar random I/O performance since each request is served by a single underlying device. Sequential throughput should also be close to a single disk; large drops typically indicate excessive I/O splitting caused by a small configured stripe size, rather than an inherent limitation of Concat.
Single-replica volumes
For single-replica volumes, Linux Kernel RAID 0 performs similarly to SPDK RAID 0, delivering near–RAID 0 throughput without requiring SPDK-specific tuning. Both approaches provide strong sequential bandwidth and scale random IOPS with the number of disks.
Multi-replica volumes
For multi-replica volumes, SPDK RAID 0 can outperform Linux Kernel RAID 0 when the stripe size is carefully tuned (for example, 64K). In these scenarios, SPDK’s userspace datapath can reduce overhead and achieve higher sequential throughput under optimal configurations.
Overall, Linux Kernel RAID 0 provides the best balance of capacity utilization, operational simplicity, and predictable performance. In contrast, SPDK RAID 0 and SPDK Concat exhibit limitations that currently prevent them from being recommended as the primary disk aggregation layer for Longhorn v2.
Longhorn v2 prioritizes stability, predictable performance, and low operational complexity. Although SPDK provides RAID and concat capabilities, several limitations prevent these modes from being adopted as the default disk aggregation solution:
- SPDK RAID 0 wastes capacity when disk sizes differ.
- SPDK Concat preserves capacity but does not provide parallel I/O.
- SPDK RAID 0 performance requires advanced tuning, such as CPU pinning and stripe size configuration.

In practice, users can select the appropriate Linux Kernel RAID level based on their desired balance between performance and data protection:
- RAID 0 can be used when maximum performance and capacity utilization are required and data redundancy is handled at the Longhorn replica layer.
- RAID 5 can be used when additional disk-level fault tolerance is desired, at the cost of some write-performance overhead (see the example below).

It is also important to note that with this approach, block-type disks in Longhorn are intentionally exposed using the AIO disk driver.
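For instance, a RAID 5 array across the same three example devices could be created like this; it yields the usable capacity of two disks and tolerates the loss of any single member:

```bash
sudo mdadm --create /dev/md0 \
  --level=5 \
  --raid-devices=3 \
  /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
```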
For these reasons, Linux Kernel RAID remains the recommended approach for disk size aggregation in Longhorn v2, offering a flexible choice of RAID levels, proven reliability, and lower operational complexity compared to SPDK-based aggregation.
Longhorn may consider introducing a built-in RAID layer in the future if the following conditions are met:
- SPDK RAID 0 supports heterogeneous disk sizes without capacity loss.
- SPDK Concat delivers meaningful performance improvements.

Until then, Linux Kernel RAID continues to offer the best balance of:

- Capacity utilization
- Operational simplicity
- Predictable performance