Highliner: Enhancing Binary Analysis through NLP-Based Instruction-Level Detection of C++ Inline Functions


Research field: Analysis



Venue: ACM Transactions on Privacy and Security


Abstract

The complexities introduced by compiler optimization have long been a significant obstacle in binary analysis and reverse engineering. Function inlining, in particular, complicates function recognition by replacing function calls with the entire body of the callee, mixing code from multiple functions. State-of-the-art approaches can identify inlined functions at basic-block granularity, but they cannot determine which instructions belong to each function or precisely deduce the inlined boundaries. Without this information, further analyses such as decompilation cannot be performed effectively. This paper presents Highliner, a novel approach that improves on the state of the art by identifying inline instances at instruction-level granularity. Highliner operates downstream of block-level detectors: given the basic blocks that those detectors report as belonging to a specific inlined function, it labels each instruction as Inlined or Not inlined and recovers the inlined-function boundaries. We treat the problem as a sequence tagging task of the kind common in NLP and implement a learning-based technique involving instruction embedding and recurrent neural networks. We compile a dataset of open-source projects with different optimizations and use the DWARF debug information standard to construct labeled sequences of inline instructions. We use this dataset to train, validate, and test a sequence labeling architecture in which instructions are encoded via the pre-trained assembly language transformer PalmTree and then processed by an RNN-based classifier to produce binary predictions. When evaluated as a binary classifier, Highliner achieves an F1-score of 0.94 overall. In addition, when specifically tested on recognizing function boundaries, Highliner achieves an accuracy of 0.82 on initial boundaries and 0.83 on final boundaries.
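To make the output format concrete, the following is a minimal sketch of how per-instruction binary labels could be turned into inlined-function boundaries and scored with F1. It is not the authors' implementation: the 1/0 encoding of Inlined/Not inlined, the function names `inlined_spans` and `f1_score`, and the treatment of each contiguous run of inlined instructions as one span are all assumptions made for illustration. The abstract does not describe how boundaries are actually derived or evaluated.

```python
from typing import List, Tuple


def inlined_spans(labels: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs for each run of 1s.

    Assumption: 1 means "Inlined" and 0 means "Not inlined"; a span's
    start and end stand in for the initial and final boundaries.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a new inlined run begins here
        elif label != 1 and start is not None:
            spans.append((start, i - 1))  # the run ended on the previous index
            start = None
    if start is not None:
        spans.append((start, len(labels) - 1))  # run reaches the sequence end
    return spans


def f1_score(truth: List[int], pred: List[int]) -> float:
    """Binary F1 for the positive ("Inlined") class over aligned label lists."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = [0, 1, 1, 0, 1]  # hypothetical labels for five instructions
    print(inlined_spans(predicted))
    print(f1_score([0, 1, 1, 0, 1], predicted))
```

In this sketch a boundary prediction is correct only if the first (or last) index of a span matches the ground truth, which is one plausible reading of "initial" and "final" boundaries, not a confirmed one.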


Authors

Lorenzo Dall'Aglio, Politecnico di Milano, Milan, Italy
Lorenzo Binosi, Politecnico di Milano, Milan, Italy
Michele Carminati, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy

Paper information

Publication year: 2025
Citations: 0
Country of publication: Italy
Site: ACM
