A comparative performance study of Spark on Kubernetes


Research field: Software Development



Venue: The Journal of Supercomputing


Abstract

Kubernetes makes it easier to automate the deployment and scaling of containerized applications while achieving near-native performance. However, there is still a lack of systematic performance studies on how Spark applications perform on Kubernetes. In this paper, we first propose a model to capture the execution behavior of tasks, stages, and jobs, and present a prototype system that implements the model. The system is then used to collect and analyze various performance and system metrics, such as execution time and CPU utilization. Second, using various Spark applications, we evaluate the performance of Spark on Kubernetes by comparing it with its baseline, i.e., Spark on bare metal. Based on this comparison and using the system, we locate the stages of these applications that suffer performance loss on Kubernetes, and then reveal the root causes of the loss by analyzing their workflows, execution times, and system resource costs. Through extensive measurements, we find that Spark on Kubernetes falls behind its baseline in the range of to 83.9%. There are several root causes of the performance loss, as well as some benefits of Spark on Kubernetes. First, the deterioration of data locality caused by pods is a crucial root cause of the loss. To address the problem, we propose an approach that schedules tasks by taking both data locality and the utilization of executors into account. Experiments show that this approach increases the performance of Spark on Kubernetes by up to 32.2%. Second, the lower CPU usage of executors is another root cause of the performance loss, even when executors have an equivalent CPU configuration on both Kubernetes and bare metal. In contrast, with the same memory configuration, executors use more memory on Kubernetes than on bare metal, which contributes to the performance benefit of Spark on Kubernetes in some stages.
Our research efforts in this paper can help developers and researchers make informed decisions on deploying Spark applications on Kubernetes for better performance.


Author Profile
Changpeng Zhu

Department of Data Science and Big Data Chongqing University of Technology Pufu Avenue 401135 Chongqing China

Andorra
Author Profile
Bo Han

School of Computer Science Xi’an Jiaotong University West Xianning Road Xi’an 710049 Shaanxi China

China
Author Profile
Yinliang Zhao

School of Computer Science Xi’an Jiaotong University West Xianning Road Xi’an 710049 Shaanxi China

China

📄 Paper information

Publication year: 2022
Citations: 4
Publication country: Andorra, China
Site: Springer

Related papers (276)