Research field: Verification
Conference: Australasian Conference on Information Security and Privacy
Code duplication is common in large-scale enterprise software systems, especially in regulated domains such as banking. However, most existing clone detection tools rely on tokenizers such as javalang, which supports only Java 8 and cannot handle the language features introduced in Java 11 and Java 17. This limits their applicability in real-world industrial environments. In this work, we propose a practical and scalable clone detection pipeline that combines semantic similarity computed via TF-IDF with structural validation using Tree Edit Distance. Our method is lightweight and fully compatible with newer Java versions. Experimental results on the BigCloneBench dataset show that our approach achieves precision and recall comparable to those of two representative tools, Toma and Amain, while overcoming their Java version compatibility limitation. Further evaluations on synthetically generated Java 11/17 code confirm the robustness of our method in detecting Type-1 and Type-2 clones in modern codebases.
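The two-stage idea in the abstract (a cheap TF-IDF similarity score followed by a structural check with Tree Edit Distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the regex tokenizer, the smoothed IDF formula, the hand-built toy syntax trees, the unit edit costs, and the thresholds passed to `is_clone` are all assumptions made for illustration, and a real pipeline would obtain tokens and trees from a Java 11/17-capable parser.

```python
import math
import re
from collections import Counter
from functools import lru_cache

# Assumption: a crude regex lexer standing in for a real Java tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokenize(code):
    """Split source text into identifiers, integer literals, and single symbols."""
    return TOKEN_RE.findall(code)

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per token list, using a smoothed IDF."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [{tok: cnt * idf[tok] for tok, cnt in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def node(label, *children):
    """A tree is (label, tuple_of_child_trees); hashable, so it can be memoized."""
    return (label, children)

def size(forest):
    """Number of nodes in an ordered forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Ordered forest edit distance with unit costs (rightmost-root recursion)."""
    if not f or not g:
        return size(f) + size(g)  # delete or insert everything that remains
    (lf, cf), (lg, cg) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + cf, g) + 1,  # delete rightmost root of f
        forest_distance(f, g[:-1] + cg) + 1,  # insert rightmost root of g
        forest_distance(cf, cg) + forest_distance(f[:-1], g[:-1]) + (lf != lg),  # match or relabel
    )

def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))

def is_clone(sim, ted, min_sim, max_ted):
    """Hypothetical decision rule: similar enough lexically and close enough structurally."""
    return sim >= min_sim and ted <= max_ted

# Toy inputs: code_b is a renamed copy of code_a; code_c is unrelated.
code_a = "int add(int a, int b) { return a + b; }"
code_b = "int sum(int x, int y) { return x + y; }"
code_c = "void log(String m) { System.out.println(m); }"
vec_a, vec_b, vec_c = tfidf_vectors([tokenize(c) for c in (code_a, code_b, code_c)])
sim_ab, sim_ac = cosine(vec_a, vec_b), cosine(vec_a, vec_c)

# Hand-built, simplified syntax trees (hypothetical node labels).
tree_a = node("Method", node("Params", node("int"), node("int")),
              node("Return", node("+", node("a"), node("b"))))
tree_b = node("Method", node("Params", node("int"), node("int")),
              node("Return", node("+", node("x"), node("y"))))
tree_c = node("Method", node("Params", node("String")),
              node("Call", node("println"), node("m")))
ted_ab = tree_edit_distance(tree_a, tree_b)
ted_ac = tree_edit_distance(tree_a, tree_c)
```

In this toy setup the renamed pair scores higher on TF-IDF similarity and lower on edit distance than the unrelated pair, which is the ordering that a filter-then-validate pipeline relies on. The memoized recursion is only suitable for small trees; a production system would more likely use a dedicated algorithm such as Zhang-Shasha.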
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Australia |
| Site | Springer |
| Likes | 0 |