ITRT (IT Research Trends)

A comparative performance study of Spark on Kubernetes

Research field: Software Development

Paper keywords: #research #researchers #developers #experiments #prototype

Venue: The Journal of Supercomputing

Abstract

Kubernetes makes it easier to automate the deployment and scaling of containerized applications while achieving near-native performance. However, there is still a lack of systematic studies of how Spark applications perform on Kubernetes. In this paper, we first propose a model that captures the execution behavior of tasks, stages, and jobs, and present a prototype system that implements the model. The system is then used to collect and analyze various performance and system metrics, such as execution time and CPU utilization. Second, using a variety of Spark applications, we evaluate the performance of Spark on Kubernetes against its baseline, Spark on bare metal. Based on this comparison, and using the prototype system, we identify which stages of these applications suffer performance loss on Kubernetes, and then reveal the root causes of the loss by analyzing their workflows, execution time, and system resource costs. Through extensive measurements, we find that Spark on Kubernetes falls behind its baseline by up to 83.9%. We identify several root causes of both the performance loss and the performance benefits of Spark on Kubernetes. First, data locality deterioration caused by pods is a crucial root cause of the loss. To address this problem, we propose an approach that schedules tasks by taking both data locality and executor utilization into account. Experiments show that this approach improves the performance of Spark on Kubernetes by up to 32.2%. Second, lower CPU usage by executors is another root cause of the performance loss, even when the executors have equivalent CPU configurations on Kubernetes and on bare metal. In contrast, with the same memory configuration, executors use more memory on Kubernetes than on bare metal, which contributes to a performance benefit of Spark on Kubernetes in some stages.
Our work can help developers and researchers make informed decisions when deploying Spark applications on Kubernetes for better performance.
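The abstract does not spell out how the locality- and utilization-aware scheduler works. As a rough illustration only, the Python sketch below scores each executor with a free slot by whether it runs on a node holding the task's input data, minus its current slot utilization, and greedily assigns the task to the best-scoring executor. Everything here is an assumption for illustration: `Executor`, `Task`, `score`, `schedule`, and the `locality_weight` parameter are hypothetical names, and this is neither the authors' implementation nor a Spark API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Executor:
    """Hypothetical executor pod: the node it runs on and its task slots."""
    name: str
    node: str
    cores: int
    running: int = 0  # tasks currently assigned to this executor

    @property
    def utilization(self) -> float:
        # Fraction of task slots in use: 0.0 = idle, 1.0 = full.
        return self.running / self.cores

    def has_free_slot(self) -> bool:
        return self.running < self.cores


@dataclass
class Task:
    """Hypothetical task plus the nodes that hold its input blocks."""
    task_id: int
    preferred_nodes: frozenset = field(default_factory=frozenset)


def score(task: Task, ex: Executor, locality_weight: float) -> float:
    # Reward node-local placement and penalize already-busy executors.
    locality = 1.0 if ex.node in task.preferred_nodes else 0.0
    return locality_weight * locality - ex.utilization


def schedule(tasks: list[Task], executors: list[Executor],
             locality_weight: float = 1.0) -> dict[int, str]:
    """Greedily place each task on the free executor with the highest score.

    Mutates `running` on the chosen executors. Tasks left over when every
    slot is taken stay unassigned (they would wait for a later round).
    """
    assignment: dict[int, str] = {}
    for task in tasks:
        free = [ex for ex in executors if ex.has_free_slot()]
        if not free:
            break
        # max() keeps the first executor on ties, so list order breaks ties.
        best = max(free, key=lambda ex: score(task, ex, locality_weight))
        best.running += 1
        assignment[task.task_id] = best.name
    return assignment


if __name__ == "__main__":
    def two_executors() -> list[Executor]:
        # One executor on the node with the data, one elsewhere.
        return [Executor("exec-1", "node-a", cores=2),
                Executor("exec-2", "node-b", cores=2)]

    tasks = [Task(i, frozenset({"node-a"})) for i in range(3)]
    print(schedule(tasks, two_executors()))
    # {0: 'exec-1', 1: 'exec-1', 2: 'exec-2'}
    print(schedule(tasks, two_executors(), locality_weight=0.4))
    # {0: 'exec-1', 1: 'exec-2', 2: 'exec-1'}
```

With the same three node-local tasks, a high `locality_weight` packs work onto the executor next to the data, while a lower weight trades locality for spreading load across executors. For comparison, Spark's built-in task scheduler handles locality differently, through delay scheduling over locality levels such as `PROCESS_LOCAL`, `NODE_LOCAL`, `RACK_LOCAL`, and `ANY`, tuned by `spark.locality.wait`.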

📄 Paper information

Publication year	2022
Citations	4
Country of publication	Andorra, China
Site	Springer

Authors: Changpeng Zhu, Bo Han, Yinliang Zhao