Performance and Scalability Report for Longhorn v1.0
Sheng Yang | August 12, 2020
Longhorn is an official CNCF project that delivers a powerful cloud-native distributed storage platform for Kubernetes that can run anywhere. Longhorn makes the deployment of highly available persistent block storage in your Kubernetes environment easy, fast, and reliable.
Since the Longhorn v1.0.0 release, we’ve received many queries regarding the performance and scalability aspects of Longhorn. We’re glad to share some results here.
We’re using a forked version of dbench, which uses fio to benchmark Kubernetes persistent disk volumes. It collects data on read/write IOPS, bandwidth, and latency.
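Under the hood, dbench runs a series of fio jobs against a PVC mounted in a benchmark Pod. As a rough sketch (the exact job parameters in the fork may differ, and the mount path /data is an assumption), a random-read IOPS job looks like this:

```shell
# Illustrative fio job of the kind dbench runs inside the benchmark Pod.
# /data is assumed to be the mount point of the Longhorn PVC under test.
fio --name=rand-read-iops \
    --filename=/data/fio-test \
    --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k \
    --iodepth=64 --numjobs=4 \
    --size=1G --runtime=30 --time_based \
    --group_reporting
```

Using --direct=1 bypasses the page cache so the result reflects the volume itself rather than host memory; similar jobs with --rw=read/write and a larger block size measure sequential bandwidth.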
We built a Kubernetes cluster using AWS EC2 instances.
One note on the disks: we’re using the EC2 instance store for the benchmark, which lives on disks physically attached to the host machine. It can provide better performance than an EBS volume, especially in terms of IOPS.
Disk: 200 GiB NVMe SSD as the instance store.
CPU: 8 vCPUs (Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz)
Memory: 16 GB
Network: Up to 10Gbps
All nodes are both master and worker nodes.
Node OS: Ubuntu 18.04 (kernel 5.3.0-1023-aws #25~18.04.1-Ubuntu SMP)
Bandwidth
As you can see in the bandwidth diagram above:
With 1 replica, Longhorn provides the same bandwidth as the native disk.
With 3 replicas, Longhorn delivers 1.5x to more than 2x the bandwidth of a single native disk. This is because Longhorn spreads the workload’s requests across multiple replicas on different nodes and disks.
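The replica count is controlled per StorageClass. A minimal sketch, using the documented Longhorn parameter numberOfReplicas (the StorageClass name here is illustrative):

```shell
# Create a StorageClass whose volumes keep 3 replicas on different nodes/disks.
# numberOfReplicas and staleReplicaTimeout are documented Longhorn parameters.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-3-replicas    # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"        # replicas used in the 3-replica benchmark
  staleReplicaTimeout: "2880"  # minutes before a failed replica is cleaned up
EOF
```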
IOPS and Latency
As you can see in the IOPS diagram above, Longhorn provides 20% to 30% of the native disk’s IOPS.
One reason for the lower IOPS is that Longhorn is designed to be crash-consistent across the cluster. Data sent to a Longhorn volume is replicated synchronously to replicas on different nodes: Longhorn waits for confirmation that the data has been written to every replica’s disk before continuing. This ensures that if any replica is lost, the remaining replicas still hold up-to-date data.
As you can see from the latency diagram, the native disk’s IO latency is about 100 microseconds per IO operation in our benchmark. Longhorn adds another 400 to 500 microseconds on top of that, depending on how many replicas are used and whether the operation is a read or a write.
We continue to work on performance optimizations to reduce the latency introduced by the Longhorn stack.
Scalability Test
We built another Kubernetes cluster using AWS EC2 instances for the scalability benchmark.
CPU: 8 vCPUs
Memory: 32 GB
Master nodes: 3
Worker nodes: 100
Kubernetes: v1.18.6, installed using Rancher
We created 100 StatefulSets with a VolumeClaimTemplate that uses Longhorn.
Each of the 100 Nodes had one StatefulSet bound to it using a nodeSelector.
During the test, we scaled each StatefulSet to 10.
Both the total Pod count and the Longhorn volume count at the end of the test were 1,000.
Then every two minutes we checked how many Pods had been successfully started.
Every Pod contains a livenessProbe to guarantee the functionality of its Longhorn volume.
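The setup above can be sketched as follows. All names, the probe command, and the container image are illustrative assumptions, not the exact manifests used in the test:

```shell
# Sketch of one per-node StatefulSet: pinned to a single node via nodeSelector,
# claiming a Longhorn volume through volumeClaimTemplates, with a livenessProbe
# that exercises the mounted volume. One such StatefulSet exists per node.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: scale-test-node-1              # hypothetical name; one per node
spec:
  serviceName: scale-test
  replicas: 1                          # scaled to 10 during the test
  selector:
    matchLabels:
      app: scale-test-node-1
  template:
    metadata:
      labels:
        app: scale-test-node-1
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-1 # pin all Pods to one node
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "touch /data/health; while true; do sleep 3600; done"]
        volumeMounts:
        - name: data
          mountPath: /data
        livenessProbe:                 # fails if the Longhorn volume stops working
          exec:
            command: ["cat", "/data/health"]
          periodSeconds: 10
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: longhorn
      resources:
        requests:
          storage: 1Gi
EOF

# Scale the StatefulSet to 10 Pods, as done during the test:
kubectl scale statefulset scale-test-node-1 --replicas=10

# The periodic check: count Pods that have successfully started.
kubectl get pods --field-selector=status.phase=Running --no-headers | wc -l
```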
As you can see from the diagram above, except for the first 100 Pods (which need a bit more ramp-up time due to the image pull on each node), Longhorn’s scalability is near-linear until we hit about 950 Pods.
For the first 950 Pods with Longhorn volumes, Kubernetes and Longhorn took only about 1,500 seconds (25 minutes) to spin them all up. However, the remaining 50 Pods took another 1,000 seconds (~17 minutes), which means the last 5% of the Pods took about 40% of the total test time. We’re still investigating the cause and haven’t determined whether it’s a Kubernetes or a Longhorn issue.
Other Issues during the Scalability Test
We encountered a couple of Kubernetes and Longhorn issues during the scalability testing:
During a test run, we found that the cluster could not scale well after hitting 200 volumes. After digging deeper, we found that once the cluster had more than 200 volumes, it took Kubernetes minutes to tens of minutes to recognize a newly attached volume. This turned out to be a Kubernetes bug, which has been fixed in v1.17.8 and v1.18.5. See here for the full analysis.
While installing Longhorn v1.0.1 in the cluster, we encountered a Longhorn issue that blocked the installation process. The issue will likely occur if the cluster has more than 20 nodes, and it requires a manual workaround. We’re releasing a fix in the v1.0.2 release. See here for the details.