Obfuscated Clone Search in JavaScript based on Reinforcement Subsequence Learning


Research area: Analysis



Venue: ACM Transactions on Software Engineering and Methodology, Volume 34, Issue 6


Abstract

Finding similar code is important for software engineering, intellectual property protection, and security. Adversaries increasingly defeat similar-code detection through obfuscation, such as transforming the code or scattering the code they wish to hide among long sequences. Moving code parts far enough apart poses a specific challenge for solutions that rely on localized features (e.g., n-grams) or attention mechanisms, because the parts are distributed beyond the local context window. We introduce a neural network solution pattern called “Cybertron” that addresses this problem by using reinforcement learning to train a code abstraction and summarization function, which converts arbitrarily long code into fixed-length real-valued vectors in a way that is optimized for similarity search. The key to the design is the selection of important code elements, together with abstraction that preserves semantic function while minimizing syntactic feature information. We evaluated the approach on a three-challenge benchmark of obfuscated JavaScript, a scripting language that is commonly obfuscated and for which code mixing is a rising challenge. The evaluation shows that our approach identifies obfuscated code even within large scripts with an AUC of 78%, outperforming current state-of-the-art sequence models by 7–35%.
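The pipeline described in the abstract (abstract the code, select a subsequence, summarize it into a fixed-length vector, compare by similarity) can be illustrated with a toy sketch. This is not the Cybertron model. The selection step here is a fixed hand-written rule standing in for the policy that the paper trains with reinforcement learning, and hashed counts of adjacent selected tokens stand in for the neural encoder. All names (DIM, KEYWORDS, abstract_tokens, embed, cosine) are hypothetical and chosen only for illustration.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical size of the fixed-length vector
KEYWORDS = {"function", "return", "var", "let", "const",
            "if", "else", "for", "while", "new"}
# Identifiers, numbers, quoted strings, or single punctuation characters.
TOKEN_RE = re.compile(
    r"[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|\"[^\"]*\"|'[^']*'|[^\s\w]"
)


def abstract_tokens(source):
    """Abstraction: replace identifiers and literals with placeholders,
    so that renaming variables does not change the token sequence."""
    out = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            out.append(tok)
        elif re.match(r"[A-Za-z_$]", tok):
            out.append("ID")
        elif tok[0].isdigit() or tok[0] in "\"'":
            out.append("LIT")
        else:
            out.append(tok)
    return out


def embed(tokens, keep=lambda t: t not in {"(", ")", ",", ";"}):
    """Selection + summarization: keep a subsequence of tokens, then hash
    adjacent pairs of the kept tokens into a unit-length vector of size DIM.
    `keep` is a fixed rule here; in the paper the selection is learned."""
    selected = [t for t in tokens if keep(t)]
    vec = [0.0] * DIM
    for a, b in zip(selected, selected[1:]):
        digest = hashlib.md5(f"{a} {b}".encode()).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


original = "function add(a, b) { return a + b; }"
renamed = "function _0x2f(_0xa, _0xb) { return _0xa + _0xb; }"
unrelated = "while (x) { if (y) { x = new Foo(y); } }"

v_orig = embed(abstract_tokens(original))
v_renamed = embed(abstract_tokens(renamed))
v_unrelated = embed(abstract_tokens(unrelated))
```

In this toy, an identifier-renamed clone maps to the same vector as the original, while unrelated code maps to a different one. It does nothing for code scattered among long unrelated sequences, which is the case the learned selection in the paper is meant to address.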


Authors

Leo Song (Computing, Queen’s University, Kingston, Ontario, Canada)
Steven H H Ding (McGill University, Montreal, Quebec, Canada)
Yuan Tian (Queen’s University, Kingston, Ontario, Canada)

Paper information

Publication year: 2025
Citations: 0
Country of publication: Canada
Site: ACM

Related papers (44)