ITRT(IT Research Trends)

Milo: Attacking Deep Pre-trained Model for Programming Languages Tasks with Anti-analysis Code Obfuscation

Research area: Analysis

Paper keywords: #learning #neural #javascript #snippet #graphcodebert

Conference: 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)

Abstract

Deep neural networks, especially pre-trained BERT models, have been widely applied to programming language processing tasks and have achieved promising results. Their downstream applications, such as code clone detection and code search, play a crucial role in data-driven security solutions such as vulnerability analysis. However, the resilience of these models against anti-analysis attacks remains unexplored. Therefore, we investigate whether deep neural networks can maintain the same performance under different types of code change, and what types of biases are introduced in the learning process.

We introduce a new code obfuscation tool, a Multi-programming-language Obfuscator (Milo), for programming language processing tasks. Milo can be used to generate adversarial data to verify a model's generalizability and robustness against code obfuscations. Milo supports five obfuscation methods (variable renaming, method renaming, string splitting, operation substitution, and control flow shuffling) on three mainstream programming languages: Java, Python, and JavaScript. It is designed to apply anti-analysis obfuscation techniques, which alter the syntactic and semantic features of a code snippet, across different programming languages. To better quantify the adverse effects of anti-analysis techniques on pre-trained models for programming languages, we have performed extensive experiments across several pre-trained models (BERT, CodeBERT, and GraphCodeBERT) on four downstream tasks: code documentation generation, code clone detection, code search, and code translation. Our results indicate that most pre-trained BERT models are susceptible to code obfuscations and rely heavily on the literal representations (names or strings) of the code segment.
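To make two of the obfuscation methods named in the abstract concrete, here is a minimal sketch of variable renaming and string splitting on a Python snippet. It is not Milo's implementation, which this page does not show. The function `obfuscate`, the class `_Rewriter`, and the `v0, v1, ...` naming scheme are hypothetical, chosen only for illustration. The sketch assumes Python 3.9+ (for `ast.unparse`) and a plain function-level snippet: it does not handle keyword arguments at call sites, `global`/`nonlocal` declarations, or class attributes.

```python
import ast


class _Rewriter(ast.NodeTransformer):
    """Hypothetical helper: applies a rename mapping and splits string literals."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_arg(self, node):
        # Rename function parameters.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        # Rename only names bound inside the snippet; builtins stay untouched.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_JoinedStr(self, node):
        # Leave literal parts of f-strings intact; still rewrite the {...} parts.
        node.values = [
            self.visit(v) if isinstance(v, ast.FormattedValue) else v
            for v in node.values
        ]
        return node

    def visit_Constant(self, node):
        # String splitting: "hello" becomes "he" + "llo".
        if isinstance(node.value, str) and len(node.value) > 1:
            mid = len(node.value) // 2
            return ast.BinOp(
                left=ast.Constant(value=node.value[:mid]),
                op=ast.Add(),
                right=ast.Constant(value=node.value[mid:]),
            )
        return node


def obfuscate(source: str) -> str:
    """Return a behavior-preserving but lexically different version of `source`."""
    tree = ast.parse(source)
    # Pass 1: collect parameters and assigned names so that every use is renamed
    # consistently, regardless of traversal order.
    mapping = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            mapping.setdefault(node.arg, f"v{len(mapping)}")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            mapping.setdefault(node.id, f"v{len(mapping)}")
    # Pass 2: rewrite the tree and turn it back into source code.
    return ast.unparse(_Rewriter(mapping).visit(tree))


SAMPLE = '''
def greet(name, times):
    message = "hello, " + name
    return [message for _ in range(times)]
'''

if __name__ == "__main__":
    print(obfuscate(SAMPLE))
```

The rewritten function returns the same values as the original, but the identifier `message` and the literal `"hello, "` no longer appear in the source. These are the literal representations that, according to the abstract, the pre-trained models rely on heavily.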

Leo Song

Queen’s University

No information available

Steven H.H. Ding

Queen’s University

No information available

📄 Paper information

Publication year	2023
Citations	1
Country of publication
Site	IEEE
Likes	0
