Research field: Databases
Venue: Knowledge and Information Systems
Extracting valuable information from large social network sources while protecting confidentiality and preventing data disclosure is a significant challenge in big data environments. Traditional anonymization methods often cannot cope with the volume, variety, and velocity of big data, which leads to high data loss and inefficiency. This article addresses these challenges by proposing a novel anonymization method based on K-means clustering within the Spark framework, leveraging its in-memory processing capabilities. Our model uses K-means clustering to determine optimal cluster heads, significantly reducing data loss and the risk of identity disclosure. By using Spark's RDDs and the MLlib component, our method achieves faster processing times than traditional methods that rely on big data tools without in-memory processing. Performance evaluation shows that at k = 9 the cost factor reaches its minimum of 0.20, indicating the efficiency and effectiveness of our approach. The proposed method not only increases processing speed but also keeps data loss minimal, making it suitable for real-time anonymization of big data streams. This work provides a balanced solution to the need for high-speed data anonymization that preserves both data privacy and utility.
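
The clustering-based anonymization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' Spark/MLlib implementation: it is a plain-Python Lloyd's K-means on a made-up toy dataset, where each record's quasi-identifiers are replaced by the centroid of its cluster. The function names (`kmeans`, `anonymize`, `information_loss`), the toy records, and the mean-squared-distance loss are all assumptions chosen for illustration, and the paper's cost factor may be defined differently.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct records
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each record joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels

def anonymize(points, k, seed=0):
    """Release each record as its cluster centroid (hypothetical scheme)."""
    centroids, labels = kmeans(points, k, seed=seed)
    return [centroids[lab] for lab in labels]

def information_loss(original, released):
    """Mean squared distance between original and released records
    (an assumed stand-in for the paper's cost factor)."""
    return sum(sq_dist(o, r) for o, r in zip(original, released)) / len(original)

# Toy quasi-identifiers (age, weekly hours); values are invented.
records = [(23.0, 40.0), (25.0, 38.0), (24.0, 42.0),
           (51.0, 20.0), (53.0, 22.0), (52.0, 18.0)]

released_k1 = anonymize(records, k=1)
released_k2 = anonymize(records, k=2)
loss_k1 = information_loss(records, released_k1)
loss_k2 = information_loss(records, released_k2)
```

Using more clusters generally lowers the information loss but leaves fewer records per cluster, which weakens the protection against identity disclosure. Note that plain K-means does not enforce a minimum cluster size, so a sketch like this does not by itself guarantee k-anonymity.
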
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Publication countries | Andorra, India, Canada |
| Site | Springer |
| Likes | 0 |