Bridging Clone Detection and Industrial Compliance: A Practical Pipeline for Enterprise Codebases


Research field: Verification



Conference: Australasian Conference on Information Security and Privacy


Abstract

Code duplication is common in large-scale enterprise software systems, especially in regulated domains such as banking. However, most existing clone detection tools rely on tokenizers such as javalang, which support only Java 8 and fail to handle language features introduced in Java 11 and Java 17. This limits their applicability in real-world industrial environments. In this work, we propose a practical and scalable clone detection pipeline that combines semantic similarity computed with TF-IDF and structural validation using Tree Edit Distance. Our method is lightweight and fully compatible with newer Java versions. Experimental results on the BigCloneBench dataset show that our approach achieves precision and recall comparable to those of two representative tools, Toma and Amain, while overcoming their Java version compatibility limitation. Further evaluation on synthetically generated Java 11/17 code confirms the robustness of our method in detecting Type-1 and Type-2 clones in modern codebases.
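
To make the first stage of the pipeline described in the abstract concrete, the following is a minimal sketch of TF-IDF similarity over code tokens. It is not the authors' implementation: the regex tokenizer, the smoothed IDF formula, the 0.8 threshold, and all function names are assumptions chosen for illustration. A regex tokenizer is used here only because it does not depend on a grammar tied to a particular Java version. The structural validation step with Tree Edit Distance is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical grammar-free tokenizer: identifiers/keywords, integer runs,
# or any single non-whitespace character (operators, braces, punctuation).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split a code snippet into lexical tokens without parsing it."""
    return TOKEN_RE.findall(code)


def tfidf_vectors(snippets):
    """Return one sparse TF-IDF vector (token -> weight) per snippet."""
    docs = [Counter(tokenize(s)) for s in snippets]
    n = len(docs)
    # Document frequency: number of snippets containing each token.
    df = Counter(tok for doc in docs for tok in doc)
    # Smoothed IDF (an assumed variant; the paper's weighting is not given).
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_pairs(snippets, threshold=0.8):
    """Index pairs whose similarity meets the (assumed) threshold."""
    vecs = tfidf_vectors(snippets)
    return [
        (i, j)
        for i in range(len(vecs))
        for j in range(i + 1, len(vecs))
        if cosine(vecs[i], vecs[j]) >= threshold
    ]


if __name__ == "__main__":
    # Snippets using post-Java 8 syntax (`var`, `record`).
    snippets = [
        "var total = a + b;",
        "var total = a + b;",
        "record Point(int x, int y) {}",
    ]
    print(candidate_pairs(snippets))
```

In a two-stage design like the one the abstract outlines, pairs returned by a cheap filter of this kind would then be passed to the more expensive structural check. Detecting Type-2 clones with such a filter would likely also require normalizing identifiers and literals before weighting, which this sketch omits.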


Authors
Shigang Liu, Swinburne University of Technology, Melbourne, Australia
Xiaowei Zhang, Swinburne University of Technology, Melbourne, Australia
Yang Xiang, Swinburne University of Technology, Melbourne, Australia

Paper information

Publication year: 2025
Citations: 0
Country of publication: Australia
Publisher site: Springer
