Research field: Safety
Venue: Cluster Computing
In recent years, various methods have been proposed to detect unknown malware using machine learning models. These models extract features from samples and classify them as benign or malicious. However, evasion attacks against machine learning-based malware detectors have been reported. Previous research on these attacks has focused on models that detect Visual Basic for Applications (VBA) malware using natural language processing techniques such as Bag of Words (BoW) and Latent Semantic Indexing (LSI). These models rely on word frequency as a feature and therefore overlook context. In addition, their evaluation used an equal number of benign and malicious samples, leaving their effectiveness in real-world scenarios unverified. To address these limitations, our study introduces a tokenizer that preserves word order during token conversion. We evaluated its accuracy on an imbalanced dataset and employed models such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks to capture the context of words. The detection rate for malicious samples exceeds 0.8, indicating sufficient performance. Moreover, by inserting words found only in benign samples as arguments to non-functional operations, we reduced the detection rate for evasion-attack samples to as low as 0.89. Finally, we confirmed that the detection rate for malicious samples remained consistent under real-world conditions, demonstrating the effectiveness of our approach.
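
The difference between frequency-based features and an order-preserving token sequence can be illustrated with a minimal sketch. This is not the authors' tokenizer: the regular expression, the function names (`tokenize`, `build_vocab`, `encode`, `bag_of_words`), and the two VBA-like snippets are all hypothetical, chosen only to show that two statements with identical word counts can still yield different sequences for an RNN or LSTM to consume.

```python
import re
from collections import Counter

# Hypothetical token pattern (an assumption, not taken from the paper):
# identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(source):
    """Split VBA-like source into tokens, keeping their original order."""
    return TOKEN_RE.findall(source)


def build_vocab(token_lists):
    """Assign integer ids in first-seen order; id 0 is reserved for unknown tokens."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(tokens, vocab):
    """Map tokens to ids without reordering, as a sequence model would need."""
    return [vocab.get(tok, 0) for tok in tokens]


def bag_of_words(tokens):
    """Frequency-only view: word order is discarded."""
    return Counter(tokens)


# Two illustrative statements built from the same tokens in a different order.
snippet_a = "x = Shell(cmd)"
snippet_b = "cmd = Shell(x)"

tokens_a = tokenize(snippet_a)
tokens_b = tokenize(snippet_b)
vocab = build_vocab([tokens_a, tokens_b])

seq_a = encode(tokens_a, vocab)
seq_b = encode(tokens_b, vocab)

# The frequency view cannot tell the snippets apart; the sequences can.
print(bag_of_words(tokens_a) == bag_of_words(tokens_b))
print(seq_a, seq_b)
```

In this sketch the two snippets are indistinguishable to a frequency-based feature extractor, while the encoded sequences differ in the positions of `x` and `cmd`, which is the kind of ordering information a context-aware model can use.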
| Publication year | 2024 |
|---|---|
| Citations | 0 |
| Country of publication | Japan |
| Site | Springer |