Random Sample Partition-Based Clustering Ensemble Algorithm for Big Data


Research field: Databases



Conference: 2021 IEEE International Conference on Big Data (Big Data)


Abstract

This paper proposes a novel random sample partition-based clustering ensemble (RSP-CE) algorithm for big data clustering problems. The RSP-CE algorithm has three key components: generating base clustering results on RSP data blocks, harmonizing the base clustering results with the maximum mean discrepancy (MMD) criterion, and refining the RSP clustering results. RSP data blocks have sample distributions consistent with that of the whole data set, which makes it possible to use base clustering results on different data subsets to approximate the clustering result on the whole data set. Experimental comparisons with five other well-known clustering ensemble algorithms on four big data sets show that RSP-CE obtains better normalized mutual information (NMI) and Fowlkes-Mallows index (FMI) values with less training time, demonstrating that RSP-CE is a viable approach to big data clustering problems.
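The abstract rests on two ideas that a small example can make concrete: blocks drawn by random partitioning should look statistically like the full data set, and MMD can measure how far apart two samples are. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names (`rsp_blocks`, `rbf`, `mmd2`), the shuffle-and-deal partitioning, the Gaussian kernel, the `gamma` value, and the synthetic two-cluster data are all hypothetical choices made here; the abstract does not specify how RSP blocks are generated or how the MMD criterion is applied when harmonizing base clusterings.

```python
import math
import random


def rsp_blocks(data, n_blocks, seed=0):
    """Shuffle the records and deal them into n_blocks near-equal blocks.

    Assumption: a plain random shuffle stands in for RSP block generation.
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::n_blocks] for i in range(n_blocks)]


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


# Synthetic data: 400 two-dimensional points from two unit-variance clusters.
rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0)]
data = [tuple(rng.gauss(c, 1.0) for c in centers[i % 2]) for i in range(400)]

blocks = rsp_blocks(data, n_blocks=4)

# Two blocks of the same data set should be close in MMD ...
same = mmd2(blocks[0], blocks[1], gamma=0.5)

# ... while a block whose points are all shifted should be farther away.
moved = [(x + 3.0, y + 3.0) for x, y in blocks[1]]
shifted = mmd2(blocks[0], moved, gamma=0.5)

print(f"MMD^2 between two blocks:       {same:.4f}")
print(f"MMD^2 block vs. shifted block:  {shifted:.4f}")
```

A small MMD between blocks is what would justify treating clusterings computed on separate blocks as views of the same underlying distribution; a large value signals a mismatch. The pairwise kernel sums here cost quadratic time in the block size, so this form is only practical for small blocks.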


Authors

Xueqin Du
College of Computer Science & Software Engineering, Shenzhen University, Shenzhen, China

Yulin He
College of Computer Science & Software Engineering, Shenzhen University, Shenzhen, China

Joshua Zhexue Huang
College of Computer Science & Software Engineering, Shenzhen University, Shenzhen, China

Paper information

Publication year: 2021
Citations: 11
Country of publication: China
Site: IEEE

Related papers (229)