ITRT(IT Research Trends)

Improved KD-tree based imbalanced big data classification and oversampling for MapReduce platforms

Research field: Databases

Paper keywords: #algorithm #algorithms #efficient #imbalanced #popular

Venue: Applied Intelligence

Abstract

In the era of big data, it is necessary to provide novel and efficient platforms for training machine learning models over large volumes of data. The MapReduce approach and its Apache Spark implementation are among the most popular methods that provide high-performance computing for classification algorithms. However, they require dedicated implementations that will take advantage of such architectures. Additionally, many real-world big data problems are plagued by class imbalance, posing challenges to the classifier training step. Existing solutions for alleviating skewed distributions do not work well in the MapReduce environment. In this paper, we propose a novel KD-tree based classifier, together with a variation of the SMOTE algorithm dedicated to the Spark platform. Our algorithms offer excellent predictive power and can work simultaneously with binary and multi-class imbalanced data. Exhaustive experiments conducted using the Amazon Web Service platform showcase the high efficiency and flexibility of our proposed algorithms.
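To make the oversampling idea in the abstract concrete, here is a minimal single-machine sketch of the classic SMOTE interpolation step. It is not the authors' Spark variant, which the abstract does not describe. The function names `k_nearest` and `smote`, the parameters, and the toy data are all assumptions chosen for illustration. The brute-force neighbour search stands in for the place where a KD-tree lookup would normally go.

```python
import random
from math import dist


def k_nearest(point, pool, k):
    """Return the k points in pool closest to point (brute force).

    A KD-tree query would typically replace this linear scan on large data.
    """
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: dist(point, p))[:k]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen minority point and one of its k nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy minority class with two features per sample.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
```

Each synthetic point lies on a line segment between two existing minority samples, so the new points stay inside the region the minority class already occupies. The sketch assumes at least two minority samples; a distributed version would also need to decide how neighbours are found across partitions, which is outside what this example covers.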

📄 Paper information

Publication year	2024
Citations	0
Country of publication	United States
Site	Springer

William C. Sleeman IV

Martha Roseberry

Preetam Ghosh

Alberto Cano

Bartosz Krawczyk
