ITRT (IT Research Trends)

Uncovering and Mitigating the Impact of Code Obfuscation on Dataset Annotation with Antivirus Engines

Research area: Analysis

Paper keywords: #android #malware #antivirus #virustotal #indiscriminately

Venue: ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

Abstract

With the widespread application of machine learning-based Android malware detection methods, building a high-quality dataset has become increasingly important. Existing large-scale datasets are mostly annotated with VirusTotal by aggregating the decisions of antivirus engines, and most of them indiscriminately accept the decisions of all engines. In reality, however, these engines differ in their ability to detect malware, especially malware that has been obfuscated. Previous research has revealed that code obfuscation degrades the detection performance of these engines to varying degrees. This leads us to believe that using all engines indiscriminately is unreasonable for dataset annotation. Therefore, in this paper, we first conduct a data-driven evaluation to confirm the negative effects of code obfuscation on engine-based dataset annotation. To gain a deeper understanding of the reasons behind this phenomenon, we evaluate the availability, effectiveness and robustness of every engine under various code obfuscation techniques. We then categorize the engines and select a set of obfuscation-robust engines. Finally, we conduct comprehensive experiments to verify the effectiveness of the selected engines for dataset annotation. Our experiments show that when 50% obfuscated samples are mixed into the training set, using our selected engines improves the detection performance of the classic malware detectors Drebin and Malscan by 15.21% and 19.23%, respectively, compared to using all the engines.
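The difference between accepting every engine's decision and restricting annotation to a selected subset can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the engine names, the verdicts, the `annotate` helper, and the ratio-based voting rule are hypothetical and are not taken from the paper, which does not state its aggregation rule in the abstract.

```python
from typing import Dict, Iterable, Optional

# Hypothetical per-engine verdicts for one sample:
# True = flagged as malicious, False = flagged as benign,
# None = the engine returned no result for this sample.
Verdicts = Dict[str, Optional[bool]]


def annotate(verdicts: Verdicts,
             engines: Optional[Iterable[str]] = None,
             min_ratio: float = 0.5) -> str:
    """Label a sample from the share of responding engines that flag it.

    `engines` restricts the vote to a subset; None means all engines.
    The ratio rule is an illustrative assumption, not the paper's method.
    """
    chosen = list(verdicts) if engines is None else list(engines)
    # Keep only engines that actually returned a verdict.
    votes = [verdicts[name] for name in chosen
             if verdicts.get(name) is not None]
    if not votes:
        return "unlabeled"
    return "malware" if sum(votes) / len(votes) >= min_ratio else "benign"


# Made-up verdicts for an obfuscated malicious sample: only two
# engines still detect it after obfuscation.
sample: Verdicts = {
    "engine_a": True,
    "engine_b": True,
    "engine_c": False,
    "engine_d": False,
    "engine_e": False,
    "engine_f": None,
}

# Hypothetical subset standing in for obfuscation-robust engines.
robust_engines = ["engine_a", "engine_b", "engine_c"]

label_all = annotate(sample)                     # votes from all engines
label_robust = annotate(sample, robust_engines)  # votes from the subset
print(label_all, label_robust)
```

In this made-up case the engines that miss the obfuscated sample outvote the ones that catch it when all verdicts are counted, while the restricted vote labels it as malware. This is the kind of annotation error the abstract attributes to using all engines indiscriminately.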

📄 Paper information

Publication year	2024
Citations	1
Country of publication	Singapore, Andorra, China
Site	ACM

Authors: Yang Liu, Gao Cuiying, Yueming Wu, Heng Li, Wei Yuan, Haoyu Jiang, Qidan He
