Research field: Databases
Venue: CCF Transactions on Pervasive Computing and Interaction
In the era of the Internet of Things (IoT), the proliferation of interconnected devices and sensors has produced an unprecedented volume of data. Effective data analysis, particularly clustering, has become pivotal in handling IoT data at this scale, and clustering evaluation plays a critical role in determining the quality of clustering results. However, traditional cluster validity metrics are ill-suited to the distributed nature of IoT data. To address this gap, we introduce a novel distributed clustering evaluation metric named C4Y. It is rooted in sampling theory and is designed to evaluate the performance of clustering algorithms in distributed IoT environments. It rests on two key principles: (1) the dataset held by each distributed IoT node is treated as a sample of the entire dataset, and each sample is expected to exhibit a data distribution, including its category distribution, similar to that of the overall dataset; (2) the centers of each category across all samples are assumed to conform to a Gaussian distribution. The metric quantifies the extent to which category centers in different samples adhere to Gaussian distributions and measures the dissimilarity between these categories. Empirical results on various public datasets, spanning diverse sizes and dimensions, demonstrate that C4Y effectively assesses the performance of distributed clustering algorithms. This approach promises to advance data analytics for distributed IoT data and to underpin the development of sophisticated IoT systems.
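The abstract does not give the C4Y formula, so the following is only a rough sketch of the two stated principles, not the published metric. It treats each node's local data as a sample, computes per-node category centers, and compares how tightly those centers group around a per-category mean (the spread an isotropic Gaussian fit would report) with how far apart the categories are. The function names `node_centers` and `center_consistency_score`, the ratio used as a score, and the synthetic two-blob data are all assumptions introduced here for illustration.

```python
import math
import random
from statistics import fmean


def node_centers(points, labels):
    """Return {category: centroid} for the data held by one node."""
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    return {c: tuple(fmean(dim) for dim in zip(*ps)) for c, ps in groups.items()}


def center_consistency_score(nodes):
    """Hypothetical proxy score, NOT the published C4Y formula.

    nodes: list of (points, labels) pairs, one pair per IoT node.
    Returns separation / spread; larger suggests a better clustering.
    Requires at least two categories overall.
    """
    # Collect the per-node centers of every category.
    per_cat = {}
    for points, labels in nodes:
        for c, ctr in node_centers(points, labels).items():
            per_cat.setdefault(c, []).append(ctr)

    # Mean of the per-node centers for each category.
    global_ctr = {
        c: tuple(fmean(dim) for dim in zip(*ctrs)) for c, ctrs in per_cat.items()
    }

    # Spread: RMS distance of per-node centers from their category mean.
    sq = [
        math.dist(ctr, global_ctr[c]) ** 2
        for c, ctrs in per_cat.items()
        for ctr in ctrs
    ]
    spread = math.sqrt(fmean(sq))

    # Separation: smallest distance between two category means.
    cats = list(global_ctr)
    separation = min(
        math.dist(global_ctr[a], global_ctr[b])
        for i, a in enumerate(cats)
        for b in cats[i + 1:]
    )
    return separation / (spread + 1e-12)


# Synthetic demo: 4 nodes, each sampling the same two Gaussian blobs.
rng = random.Random(0)


def make_node(n=100):
    pts, true = [], []
    for i in range(n):
        c = i % 2
        mu = 0.0 if c == 0 else 5.0
        pts.append((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)))
        true.append(c)
    return pts, true


nodes_good = [make_node() for _ in range(4)]
# Same points, but with labels assigned at random (a poor clustering).
nodes_bad = [(pts, [rng.randrange(2) for _ in pts]) for pts, _ in nodes_good]

good = center_consistency_score(nodes_good)
bad = center_consistency_score(nodes_bad)
print(f"good labels: {good:.2f}, random labels: {bad:.2f}")
```

In this toy setting the correctly labelled nodes should score well above the randomly labelled ones, because consistent labels keep each category's per-node centers close together and far from the other category. A fuller treatment would also test how well the centers fit a Gaussian rather than only measuring their spread.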
| Publication year | 2024 |
|---|---|
| Citations | 0 |
| Country of publication | Andorra |
| Site | Springer |