Research field: Databases
Conference: 2021 IEEE International Conference on Big Data (Big Data)
This paper proposes a novel random sample partition-based clustering ensemble (RSP-CE) algorithm for big data clustering problems. RSP-CE has three key components: generating base clustering results on RSP data blocks, harmonizing the base clustering results with the maximum mean discrepancy (MMD) criterion, and refining the RSP clustering results. Because RSP data blocks have sample distributions consistent with that of the whole big data set, base clustering results obtained on different data subsets can be used to approximate the clustering result on the whole data set. Experiments comparing RSP-CE with five other well-known clustering ensemble algorithms on four big data sets show that RSP-CE obtains better normalized mutual information (NMI) and Fowlkes-Mallows Index (FMI) values with less training time, demonstrating that RSP-CE is a viable approach to big data clustering.
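The abstract names the three stages of RSP-CE but not their details. The sketch below illustrates the overall flow under our own assumptions and is not the paper's algorithm: k-means stands in as the base clusterer, clusters from different blocks are matched by minimizing a kernel (RBF) estimate of squared MMD over label permutations (exhaustive search, so only practical for small k), and the refinement step is replaced by averaging the aligned centroids and assigning every point to the nearest one. The function names `rsp_partition`, `kmeans`, `mmd2_rbf`, `harmonize`, `rsp_clustering_ensemble` and the kernel parameter `gamma` are hypothetical.

```python
import itertools

import numpy as np


def rsp_partition(X, num_blocks, rng):
    """Random sample partition: shuffle rows, then cut them into disjoint blocks.

    Each block is a simple random sample of X, so its empirical distribution
    approximates that of the whole data set.
    """
    idx = rng.permutation(len(X))
    return [X[part] for part in np.array_split(idx, num_blocks)]


def kmeans(X, k, rng, iters=50):
    """Plain Lloyd's k-means with farthest-point initialization (assumed base clusterer)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
    return labels, centers


def mmd2_rbf(A, B, gamma=1.0):
    """Biased estimate of squared MMD with the kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    if len(A) == 0 or len(B) == 0:
        return np.inf  # an empty cluster cannot be matched

    def kern(P, Q):
        return np.exp(-gamma * ((P[:, None, :] - Q[None]) ** 2).sum(-1))

    return kern(A, A).mean() + kern(B, B).mean() - 2.0 * kern(A, B).mean()


def harmonize(ref_block, ref_labels, block, labels, k, gamma=1.0):
    """Relabel `labels` so that cluster j of `block` corresponds to cluster j of the reference.

    Assumption: pick the permutation that minimizes the summed MMD^2 between matched clusters.
    """
    cost = np.array([[mmd2_rbf(ref_block[ref_labels == a], block[labels == b], gamma)
                      for b in range(k)] for a in range(k)])
    best = min(itertools.permutations(range(k)),
               key=lambda p: sum(cost[a, p[a]] for a in range(k)))
    mapping = np.empty(k, dtype=int)
    for a, b in enumerate(best):  # block cluster b is matched to reference cluster a
        mapping[b] = a
    return mapping[labels]


def rsp_clustering_ensemble(X, k, num_blocks=4, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    blocks = rsp_partition(X, num_blocks, rng)
    base = [kmeans(b, k, rng) for b in blocks]  # base clustering on each RSP block
    ref_block, ref_labels = blocks[0], base[0][0]  # block 0 serves as the label reference
    aligned_centers = []
    for block, (labels, _) in zip(blocks, base):
        aligned = harmonize(ref_block, ref_labels, block, labels, k, gamma)
        aligned_centers.append(np.array([block[aligned == j].mean(axis=0) for j in range(k)]))
    # Hypothetical stand-in for refinement: average aligned centroids, then assign all points.
    final_centers = np.mean(aligned_centers, axis=0)
    return ((X[:, None, :] - final_centers[None]) ** 2).sum(-1).argmin(axis=1)


rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 2)) for c in true_centers])
labels = rsp_clustering_ensemble(X, k=3, num_blocks=4)
print(np.bincount(labels))
```

On three well-separated blobs, each block's base clustering finds the same three groups under different label orders, and the MMD-based matching makes those labels agree before the centroids are combined. Only the per-block clustering and the small k-by-k matching step touch the data blocks, which is the property that makes the partition-based approach attractive for big data.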
| Field | Value |
|---|---|
| Publication year | 2021 |
| Citations | 11 |
| Publication country | China |
| Site | IEEE |