Research area: Safety
Conference: IFIP International Conference on Artificial Intelligence Applications and Innovations
Surface-level malware analysis offers significant advantages over deep static and dynamic analysis by avoiding the complex and time-consuming process of reverse engineering obfuscated code and eliminating the risk of malware execution. Recent studies have shown that surface-level features alone can achieve high classification accuracy in distinguishing malware from benign software. However, an inherent challenge remains: surface-level datasets often contain an enormous number of features, hindering explainability and manual investigation. A notable example is the Ember dataset, a widely used public dataset for malware detection, which originally consists of more than ten million features. In malware detection, this issue primarily affects memory consumption and computational efficiency, which can be mitigated using techniques such as feature hashing. In contrast, malware analysis requires explainability involving manual investigation based on domain expertise, which necessitates focusing on a small subset of highly relevant features. While feature selection has been extensively studied in machine learning, existing algorithms struggle to balance scalability and selection accuracy. Recently, the authors proposed a novel feature selection algorithm, BornFS, which significantly improves this trade-off, reducing over ten million features of the Ember dataset to only 155 in under two hours while ensuring a mutual information loss below 5%. This paper presents a surface-level malware analysis method that leverages the scalable and accurate BornFS, demonstrating the effectiveness of feature-selection-based malware analysis through experiments.
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Countries of publication | Colombia, Japan |
| Site | Springer |