Binary Representation Embedding and Deep Learning For Binary Code Similarity Detection in Software Security Domain


Research field: Safety



Conference: SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology


Abstract

Binary Code Similarity Detection (BCSD) analyzes the binary representations of two functions, programs, or related entities and produces a quantitative similarity score for them. The task covers a wide range of applications, including the binary search problem: finding code segments within a binary file that are similar to a specified binary code segment. These capabilities enable applications across binary code analysis, such as software vulnerability detection, clone detection, and malware analysis. In this paper, we introduce BiSim-Inspector, a BCSD tool based on Deep Learning (DL). The tool uses Bytes2vec, a method we developed to transform the bytecode of binary functions into vectors, which are then fed into a Convolutional Neural Network - Gated Recurrent Unit (CNN-GRU) model. We evaluate the method in a series of experiments that compare it with existing state-of-the-art (SOTA) tools on BinaryCorp, a large-scale, well-structured, and diversified dataset for BCSD. The results show that our framework achieves a Recall rate of 89%, which is 25% higher than existing SOTA methods, without compromising training and prediction time.
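
The search-and-recall setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the `embed` function below is a hypothetical byte-frequency stand-in for a learned embedding such as Bytes2vec with a CNN-GRU model, and the names `cosine`, `search`, and `recall_at_k` are assumptions introduced only for illustration. The sketch shows how a query function is ranked against a pool by similarity score, and how a recall figure could be computed from those rankings.

```python
import math
from collections import Counter


def embed(code: bytes) -> list[float]:
    """Toy stand-in for a learned embedding: a 256-bin byte-frequency vector.

    Assumption: a real BCSD system would use a trained model here.
    """
    counts = Counter(code)
    total = len(code) or 1  # avoid division by zero for empty input
    return [counts.get(b, 0) / total for b in range(256)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query: bytes, pool: dict[str, bytes], k: int = 1) -> list[str]:
    """Return the names of the k pool functions most similar to the query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda name: cosine(q, embed(pool[name])), reverse=True)
    return ranked[:k]


def recall_at_k(queries: dict[str, bytes], pool: dict[str, bytes], k: int = 1) -> float:
    """Fraction of queries whose true match (same name) appears in the top k."""
    hits = sum(1 for name, code in queries.items() if name in search(code, pool, k))
    return hits / len(queries)
```

In this sketch a query counts as a hit when the pool function with the same name appears among its top-k results, so `recall_at_k` is the share of queries whose true match was retrieved. Whether the paper's reported Recall uses the same definition, pool size, or k is not stated in the abstract.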


Author Profile

Thinh Nguyen Hung
University of Information Technology, Vietnam National University, Viet Nam
Namibia

Hai Nguyen Phuc
University of Information Technology, Vietnam National University, Viet Nam
Namibia

Khoa Tran Dinh
University of Information Technology, Vietnam National University, Viet Nam
Namibia

Paper information

Publication year: 2023
Citations: 0
Country of publication: Namibia
Site: ACM
