연구 분야: Analysis
학회: Proceedings of the ACM on Software Engineering, Volume 2, Issue FSE
The current landscape of binary code summarization predominantly focuses on generating a single summary, which limits its utility and understanding for reverse engineers. Existing approaches often fail to meet the diverse needs of users, such as providing detailed insights into usage patterns, implementation nuances, and design rationale, as observed in the field of source code summarization. This highlights the need for multi-intent binary code summarization to enhance the effectiveness of reverse engineering processes. To address this gap, we propose MiSum, a novel method that leverages multi-modality heterogeneous code graph alignment and learning to integrate both assembly code and pseudo-code. MiSum introduces a unified multi-modality heterogeneous code graph (MM-HCG) that aligns assembly code graphs with pseudo-code graphs, capturing both low-level execution details and high-level structural information. We further propose multi-modality heterogeneous graph learning with heterogeneous mutual attention and message passing, which highlights important code blocks and discovers inter-dependencies across different code forms. Additionally, an intent-aware summary generator with an intent-aware attention mechanism is introduced to produce customized summaries tailored to multiple intents. Extensive experiments, including evaluations across various architectures and optimization levels, demonstrate that MiSum outperforms state-of-the-art baselines in BLEU, METEOR, and ROUGE-L metrics. Human evaluations validate its capability to effectively support reverse engineers in understanding diverse binary code intents, marking a significant advancement in binary code analysis.
| 발행 연도 | 2025년 |
|---|---|
| 인용수 | 0 |
| 출판 국가 | China |
| 사이트 | ACM |
| 좋아요 수 | 0 |