Balancing the Scales: Using GANs and Class Balance for Superior Malware Detection


Research field: Safety



Conference: SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing


Abstract

Ensuring the security of network infrastructure requires precise detection and categorization of malware. While existing methods have demonstrated high accuracy, their effectiveness has predominantly been validated on a limited subset of malware families or samples. These analyses often focus on malware families with larger numbers of samples, potentially leading to biased and unrepresentative classification results. To address this gap, our study aims to enhance the accuracy and robustness of malware detection and categorization systems by investigating the impact of dataset size, class balance, and data augmentation techniques on classifier performance. We demonstrate the efficacy of our approach on a comparatively larger dataset, the Blue Hexagon Open Dataset for Malware AnalysiS, comprising 134k samples. Our analysis, using 85 malware families with at least 50 samples each, achieves a highest accuracy of 92.28% with Random Forest as the classifier on the original imbalanced dataset. However, by employing Generative Adversarial Networks to generate synthetic samples and thereby balance the class distributions, our approach improves the classifier's accuracy to 99.35%.
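The family filtering and class balancing described in the abstract can be illustrated with a minimal sketch. It only computes which families pass the 50-sample threshold and how many synthetic samples each would need; it does not implement a GAN or the authors' pipeline. The function name `balancing_plan`, the toy family names, and the choice to balance every family up to the size of the largest one are assumptions made for illustration, since the abstract does not state the balancing target.

```python
from collections import Counter


def balancing_plan(labels, min_samples=50):
    """Return (kept_counts, synthetic_needed) for a list of family labels.

    Hypothetical helper for illustration; not the authors' code.
    """
    counts = Counter(labels)
    # Keep only families with at least `min_samples` samples
    # (the 50-sample threshold comes from the abstract).
    kept = {fam: n for fam, n in counts.items() if n >= min_samples}
    if not kept:
        return {}, {}
    # Assumption: every kept family is balanced up to the largest family.
    target = max(kept.values())
    # Number of synthetic samples a generator would have to supply per family.
    needed = {fam: target - n for fam, n in kept.items()}
    return kept, needed


# Toy labels with hypothetical family names; not data from the paper.
labels = ["fam_a"] * 200 + ["fam_b"] * 60 + ["fam_c"] * 10
kept, needed = balancing_plan(labels)
print(kept)
print(needed)
```

In this toy input, `fam_c` falls below the threshold and is dropped, while `fam_b` would need synthetic samples to reach the size of `fam_a`. In a setup like the one the abstract describes, a generative model would supply those additional samples before the classifier is trained.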


Author Profile
Attaullah Bolzano Buriro

Ca' Foscari University of Venice and Faculty of Engineering, Free University of Bolzano-Bozen, Bolzano, Italy

Andorra
Author Profile
Flaminia Venice Luccio

Ca' Foscari University of Venice, Venice, Italy

Canada
Author Profile
Muhammad Azfar Yaqub

Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy

Italy

📄 Paper information

Publication year: 2025
Citations: 0
Publication countries: Italy, Andorra, Canada
Site: ACM
