POSRho: Efficient Spearman’s Rho Calculation for Big Data


Research field: Databases



Conference: CCF Conference on Big Data


Abstract

The increasing volume and complexity of data in various scientific domains necessitate robust and scalable methods for statistical analysis. Spearman’s rank correlation coefficient, denoted as ρ, is a non-parametric measure that evaluates the monotonic relationship between variables. However, traditional methods for computing ρ struggle with the scalability and efficiency required for the large datasets characteristic of the big data era. This paper introduces POSRho, a novel algorithm designed for the efficient and scalable computation of Spearman’s rank correlation coefficient in big data settings. Leveraging parallel and distributed computing frameworks, POSRho addresses the primary challenges posed by big data, including high computational complexity, significant memory constraints, and data distribution and heterogeneity issues. We detail the algorithm’s design, which utilizes data partitioning, parallel rank calculation, and efficient aggregation methods to optimize computational resources and minimize execution time while maintaining the accuracy of the correlation measure. Empirical results demonstrate that POSRho significantly reduces computation time compared to conventional methods without sacrificing accuracy, thus providing a practical solution for big data analytics in applications such as genomics, finance, and social science research. The adaptability of POSRho across different computing environments and its integration into existing big data platforms underscore its utility and innovation in addressing the computational demands of modern data analysis.
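
The abstract names three ingredients (data partitioning, rank calculation, and aggregation) without giving details, so the following is only a minimal sketch of the general idea and not the POSRho algorithm itself. It relies on the standard fact that Spearman’s ρ equals the Pearson correlation of the ranks, which can be assembled from per-partition sums. The function names (`average_ranks`, `partial_sums`, `spearman_rho`) and the `n_partitions` parameter are hypothetical, ranking is done globally rather than in parallel, and a thread pool merely stands in for a distributed framework.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def partial_sums(chunk):
    """Sufficient statistics for Pearson correlation on one partition."""
    n = len(chunk)
    sx = sum(a for a, _ in chunk)
    sy = sum(b for _, b in chunk)
    sxx = sum(a * a for a, _ in chunk)
    syy = sum(b * b for _, b in chunk)
    sxy = sum(a * b for a, b in chunk)
    return (n, sx, sy, sxx, syy, sxy)


def spearman_rho(x, y, n_partitions=4):
    """Spearman's rho as the Pearson correlation of ranks (assumed layout)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    pairs = list(zip(average_ranks(x), average_ranks(y)))

    # Partition the rank pairs into roughly equal chunks.
    size = -(-len(pairs) // n_partitions)  # ceiling division
    chunks = [pairs[i:i + size] for i in range(0, len(pairs), size)]

    # Compute partial sums per partition; threads stand in for worker nodes.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_sums, chunks))

    # Aggregate: the partial sums simply add up across partitions.
    n, sx, sy, sxx, syy, sxy = (sum(col) for col in zip(*parts))
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)
```

Because the six sums are additive, each partition can be processed independently and only a small tuple per partition has to be combined at the end; the global sort inside `average_ranks` remains the costly step in this sketch.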


Authors
Xiaofei Zhao, Lanzhou Petrochemical University of Vocational Technology, Gansu, China
Fanglin Guo, Lanzhou Jiaotong University, Gansu, China

Paper information

Publication year: 2025
Citations: 0
Country of publication: China
Site: Springer
