Unsuccessful story about few shot malware family classification and siamese network to the rescue


Research area: Safety



Conference: ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering


Abstract

To battle ever-increasing Android malware, malware family classification, which groups malware with common features into a malware family, has been proposed as an effective malware analysis method. Several machine-learning-based approaches have been proposed for this task. Our study shows that malware families suffer from severe data imbalance, with many families containing only a small number of malware applications (referred to as few shot malware families in this work). Unfortunately, this issue has been overlooked in existing approaches. Although existing approaches achieve high classification performance at the overall level and for large malware families, our experiments show that they suffer from poor performance and generalizability for few shot malware families, and that traditional downsampling methods cannot solve the problem. To address the challenge of few shot malware family classification, we propose a novel siamese-network-based learning method, which allows us to train an effective multilayer perceptron (MLP) network for embedding malware applications into a real-valued, continuous vector space by contrasting malware applications from the same or different families. In the embedding space, the performance of malware family classification can be significantly improved for all scales of malware families, especially for few shot malware families, which also leads to a significant performance improvement at the overall level.
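The idea of contrasting same-family and different-family applications through a shared MLP can be illustrated with a minimal sketch. The abstract does not state the loss function, feature set, or network sizes, so everything below is an assumption made for illustration: the margin-based contrastive loss, the binary feature vectors, the family labels, and the helper names (`make_mlp`, `embed`, `contrastive_loss`, `make_pairs`) are all hypothetical and are not taken from the paper. The sketch shows only the forward pass and the pairwise loss, not training.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Build a small MLP as a list of (weights, biases) layers (hypothetical helper)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def embed(mlp, x):
    """Map a feature vector to the embedding space; ReLU on hidden layers only."""
    h = list(x)
    for i, (weights, biases) in enumerate(mlp):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if i < len(mlp) - 1:
            h = [max(0.0, v) for v in h]
    return h


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def contrastive_loss(mlp, x1, x2, same_family, margin=1.0):
    """Assumed margin-based contrastive loss; both inputs pass through the SAME mlp."""
    d = distance(embed(mlp, x1), embed(mlp, x2))
    if same_family:
        return d ** 2                    # pull same-family pairs together
    return max(0.0, margin - d) ** 2     # push different-family pairs apart


def make_pairs(samples):
    """Turn labeled samples into (x1, x2, same_family) training pairs."""
    pairs = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            (x1, y1), (x2, y2) = samples[i], samples[j]
            pairs.append((x1, x2, y1 == y2))
    return pairs


# Hypothetical toy data: binary feature vectors with made-up family labels.
samples = [
    ([1, 0, 1, 0], "family_a"),
    ([1, 0, 1, 1], "family_a"),
    ([0, 1, 0, 1], "family_b"),
]
mlp = make_mlp([4, 8, 2])
pairs = make_pairs(samples)
losses = [contrastive_loss(mlp, x1, x2, same) for x1, x2, same in pairs]
```

Pairing is what makes this setup attractive for small families: a family with only a few applications still contributes many same-family and different-family pairs, so the embedding network sees far more training signal than a per-class classifier would.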


Authors
Yude Bai (Tianjin University, Tianjin, China)
Zhenchang Xing (Australian National University, Australia)
Xiaohong Li (Tianjin University, Tianjin, China)

Paper information

Publication year: 2020
Citations: 20
Publication country: Australia, China
Site: ACM