Revisiting Optimization-Resilience Claims in Binary Diffing Tools: Insights from LLVM Peephole Optimization Analysis


Research area: Safety



Venue: Proceedings of the ACM on Software Engineering, Volume 2, Issue FSE


Abstract

Binary diffing techniques aim to identify differences and similarities between executable files without access to source code. Their potential applications in software security tasks such as vulnerability search, code clone detection, and malware analysis have generated a large body of literature over the past few years. A recurring theme in binary diffing research is evaluating resilience to compiler optimization, the most common source of syntactic differences in binary code. Although most binary diffing papers claim that their approaches are immune to compiler optimization, recent studies have highlighted a pressing need for the research community to revisit these optimization-resilience claims. In this paper, we investigate the impact of peephole optimization on binary diffing. Mainstream compilers implement a multitude of peephole optimization rules, which locally rewrite input programs by replacing instruction sequences within a small window (i.e., a peephole) with shorter and/or faster equivalents. Our research reveals that peephole optimization primarily affects binary code differences at the intra-procedural level, which contradicts the assumptions made by basic-block-centric comparison approaches. We customized an LLVM translation validation tool to trace the impact of peephole optimization across the overall optimization process. Our experimental results demonstrate that 1) peephole optimization modifies binary code throughout the whole optimization process, and 2) no existing basic-block-centric comparison tool can properly handle all changes caused by peephole optimization, which leads to further performance loss in downstream applications. Our study introduces a "peephole-oriented" test suite designed to isolate and measure the impact of peephole optimizations on binary code.
This suite offers a new perspective for evaluating the resilience of binary diffing tools to subtle, intra-procedural code changes and sets a new benchmark for future tool development. Our findings challenge existing assumptions in binary diffing and highlight the need for more robust analysis techniques.
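The two mechanisms the abstract contrasts, peephole rewriting over a small instruction window and basic-block-centric comparison, can be illustrated with a toy sketch. Everything below is hypothetical and chosen only for illustration: the three-address instruction format, the three rewrite rules, the convention that `t`-prefixed names are single-use temporaries, the opcode-only block fingerprint, and the exhaustive 8-bit equivalence check (a brute-force stand-in for the solver-based reasoning a translation validator performs). It is not the authors' tooling, and the rules are not LLVM's actual peephole rules.

```python
import hashlib
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical toy three-address instruction: (opcode, dest, src1[, src2]).
Instr = Tuple[str, ...]
MASK = 0xFF  # assumption: 8-bit wrap-around arithmetic


def is_temp(name: str) -> bool:
    # Assumption: names starting with "t" are single-use temporaries,
    # so a rule may drop them once their only use has been rewritten.
    return name.startswith("t")


def rewrite_one(ins: Instr) -> Optional[List[Instr]]:
    """Window-size-1 rules (illustrative only)."""
    if ins[0] == "mul" and ins[3] == "2":  # d = x * 2  ->  d = x << 1
        return [("shl", ins[1], ins[2], "1")]
    if ins[0] == "add" and ins[3] == "0":  # d = x + 0  ->  d = x
        return [("mov", ins[1], ins[2])]
    return None


def rewrite_two(a: Instr, b: Instr) -> Optional[List[Instr]]:
    """Window-size-2 rule: t = x - y; d = t + y  ->  d = x."""
    if (a[0] == "sub" and b[0] == "add" and is_temp(a[1])
            and b[2] == a[1] and b[3] == a[3]):
        return [("mov", b[1], a[2])]
    return None


def peephole(block: List[Instr]) -> List[Instr]:
    """Slide a window over the block, rewriting until a fixpoint is reached."""
    changed = True
    while changed:
        changed, out, i = False, [], 0
        while i < len(block):
            pair = rewrite_two(block[i], block[i + 1]) if i + 1 < len(block) else None
            if pair is not None:
                out.extend(pair)
                i += 2
                changed = True
                continue
            single = rewrite_one(block[i])
            out.extend(single if single is not None else [block[i]])
            changed = changed or single is not None
            i += 1
        block = out
    return block


def block_hash(block: List[Instr]) -> str:
    """Opcode-only fingerprint of a basic block (operands abstracted away)."""
    return hashlib.sha256(" ".join(ins[0] for ins in block).encode()).hexdigest()


def run(block: List[Instr], env: Dict[str, int]) -> Dict[str, int]:
    """Interpret a block over 8-bit values; returns the final environment."""
    env = dict(env)
    ops = {
        "mov": lambda a, b: a,
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "shl": lambda a, b: a << b,
    }
    for op, dest, *srcs in block:
        # Constants are digit strings; everything else is a variable lookup.
        vals = [int(s) if s.isdigit() else env[s] for s in srcs] + [0]
        env[dest] = ops[op](vals[0], vals[1]) & MASK
    return env


def equivalent(src: List[Instr], tgt: List[Instr], outs: List[str]) -> bool:
    """Compare live-out values for every 8-bit (x, y) input pair."""
    for x, y in product(range(256), repeat=2):
        s, t = run(src, {"x": x, "y": y}), run(tgt, {"x": x, "y": y})
        if any(s[v] != t[v] for v in outs):
            return False
    return True


if __name__ == "__main__":
    unopt = [
        ("mul", "a", "x", "2"),
        ("sub", "t1", "a", "y"),
        ("add", "b", "t1", "y"),
        ("add", "c", "b", "0"),
    ]
    opt = peephole(unopt)
    print(opt)  # [('shl', 'a', 'x', '1'), ('mov', 'b', 'a'), ('mov', 'c', 'b')]
    print(equivalent(unopt, opt, ["a", "b", "c"]))  # True
    print(block_hash(unopt) == block_hash(opt))  # False
```

In this toy setting the rewritten block computes the same live-out values on every 8-bit input, yet its opcode fingerprint no longer matches the original. That is the kind of purely syntactic, intra-block change a comparison keyed on per-block features would report as a difference.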


Authors
Xiaolei Ren, Macau University of Science and Technology, Taipa, Macao
Mengfei Ren, University of Alabama in Huntsville, Huntsville, USA
Yu Lei, University of Texas at Arlington, Arlington, USA

Paper information

Publication year: 2025
Publisher site: ACM
