Performance and Scalability Report for Longhorn v1.0
Sheng Yang | August 12, 2020
Longhorn is an official CNCF project that delivers a powerful cloud-native distributed storage platform for Kubernetes that can run anywhere. Longhorn makes the deployment of highly available persistent block storage in your Kubernetes environment easy, fast, and reliable.
Since the Longhorn v1.0.0 release, we’ve received many queries regarding the performance and scalability aspects of Longhorn. We’re glad to share some results here.
We’re using a forked version of dbench, which uses fio to benchmark Kubernetes persistent disk volumes. It collects data on read/write IOPS, bandwidth, and latency.
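To give a concrete picture of what such a run looks like, here is a minimal sketch of an fio Job against a Longhorn-backed volume. The PVC and Job names, the container image, and the fio parameters are illustrative assumptions, not the exact dbench manifest we used:

```yaml
# Hypothetical PVC + Job: mount a Longhorn-backed volume and run a 4k random-read fio test.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-bench-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 30Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: fio-bench
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fio
        image: example.com/fio:latest   # placeholder image; dbench bundles fio similarly
        command: ["fio"]
        args: ["--name=randread", "--filename=/data/test", "--direct=1",
               "--rw=randread", "--bs=4k", "--ioengine=libaio",
               "--iodepth=16", "--size=10G", "--runtime=60", "--time_based"]
        volumeMounts:
        - name: bench
          mountPath: /data
      volumes:
      - name: bench
        persistentVolumeClaim:
          claimName: fio-bench-pvc
```

dbench essentially wraps a series of fio runs like this one (random and sequential, read and write) and reports the resulting IOPS, bandwidth, and latency figures.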
We built a Kubernetes cluster using AWS EC2 instances.
One note on the disks: we’re using the EC2 instance store for the benchmark, which lives on disks physically attached to the host machine. It can provide better performance than an EBS volume, especially in terms of IOPS.
Instance type: c5d.2xlarge
Disk: 200 GiB NVMe SSD as the instance store.
CPU: 8 vCPUs (Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz)
Memory: 16 GB
Network: Up to 10Gbps
Nodes: 3 (every node serves as both a master and a worker node)
Kubernetes: v1.17.5.
Node OS: Ubuntu 18.04 (kernel 5.3.0-1023-aws #25~18.04.1-Ubuntu SMP)
Longhorn: v1.0.1
As you can see in the bandwidth diagram above:
With 1 replica, Longhorn provides the same bandwidth as the native disk.
With 3 replicas, Longhorn provides 1.5x to more than 2x the performance of a single native disk. This is because Longhorn can use multiple replicas on different nodes and disks to serve the workload’s requests.
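For context, the replica count is the kind of setting that can be controlled through a Longhorn StorageClass. The snippet below is a minimal sketch, not the exact manifest used for these runs; the name and the staleReplicaTimeout value are illustrative:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-3-replicas    # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"        # set to "1" for the single-replica runs
  staleReplicaTimeout: "2880"  # illustrative value, in minutes
```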
As you can see in the IOPS diagram above, Longhorn provides 20% to 30% of the native disk’s IOPS.
One reason for the lower IOPS is that Longhorn is designed to be crash-consistent across the cluster. Data sent to a Longhorn volume is replicated synchronously to replicas on different nodes: Longhorn waits for confirmation that the data has been written to every replica’s disk before acknowledging the write. This ensures that if any replica is lost, the remaining replicas still hold up-to-date data.
As you can see from the latency diagram, the native disk’s IO latency is about 100 microseconds per IO operation in our benchmark. Longhorn adds another 400 to 500 microseconds on top of that, depending on the number of replicas and whether the operation is a read or a write.
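As a rough sanity check (assuming a single outstanding IO, which is a simplification of the actual benchmark), IOPS is approximately the reciprocal of per-IO latency: about 100 microseconds per IO corresponds to on the order of 10,000 IOPS, while 500 to 600 microseconds corresponds to roughly 1,700 to 2,000 IOPS, the same ballpark as the 20% to 30% figure above.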
We continue to work on performance optimizations to reduce the latency introduced by the Longhorn stack.
For the scalability benchmark, we built a separate Kubernetes cluster using AWS EC2 instances.
Instance type: m5.2xlarge
CPU: 8 vCPUs
Memory: 32 GB
Master nodes: 3
Worker nodes: 100
Kubernetes: v1.18.6, installed using Rancher
Longhorn: v1.0.1
We created 100 StatefulSets with a VolumeClaimTemplate that uses Longhorn.
Each of the 100 Nodes had one StatefulSet bound to it using a nodeSelector.
During the test, we scaled each StatefulSet to 10 replicas, so at the end of the test both the total Pod count and the total Longhorn Volume count were 1,000.
Then, every two minutes, we checked how many Pods had started successfully. Every Pod contains a livenessProbe to verify that its Longhorn Volume is working.
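The manifest below is a minimal sketch of one such per-node StatefulSet. The names, labels, node hostname, container image, and probe command are illustrative assumptions, not the exact manifest used in the test:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: scale-test-node-1
spec:
  serviceName: scale-test-node-1   # assumes a matching headless Service exists (not shown)
  replicas: 1                      # scaled to 10 during the test
  selector:
    matchLabels:
      app: scale-test-node-1
  template:
    metadata:
      labels:
        app: scale-test-node-1
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-1   # pins all Pods of this StatefulSet to one node
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "while true; do sleep 3600; done"]
        volumeMounts:
        - name: data
          mountPath: /data
        livenessProbe:             # fails if the Longhorn volume is not usable
          exec:
            command: ["sh", "-c", "touch /data/healthcheck && rm /data/healthcheck"]
          initialDelaySeconds: 10
          periodSeconds: 15
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: longhorn
      resources:
        requests:
          storage: 1Gi
```

Scaling a StatefulSet during the test can then be done with, for example, `kubectl scale statefulset scale-test-node-1 --replicas=10`.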
As you can see from the diagram above, except for the first 100 Pods (which need a bit more ramp-up time because each node has to pull the container images first), Longhorn’s scalability is near-linear until we hit about 950 Pods.
For the first 950 Pods with Longhorn Volumes, Kubernetes and Longhorn took only about 1,500 seconds (25 minutes) to spin them all up. The remaining 50 Pods, however, took another 1,000 seconds (~17 minutes), which means the last 5% of the Pods took about 40% of the total time of the scalability test. We’re still looking into the reason and haven’t yet determined whether it’s a Kubernetes or a Longhorn issue.
We encountered a couple of Kubernetes and Longhorn issues during the scalability testing.