Unlocking the Power of Machine Learning in Cybersecurity Forensics: Identifying Malicious Files


Research field: Safety



Conference: World Congress in Computer Science, Computer Engineering & Applied Computing


Abstract

Our research introduces a novel method for determining the originating software of digital images, which significantly advances digital forensic analysis capabilities. This method involves transforming images into their hexadecimal code representations, thereby stripping away metadata and making the files unrecognizable by conventional identification techniques. Through a meticulous analysis of these hex codes, broken down into 2-character substrings, we construct detailed feature vectors representing the frequency of each substring. Utilizing a diverse array of machine learning models, including RandomForestClassifier, LogisticRegression, and others, our approach successfully identifies the software used to create the images, such as PowerPoint, GIMP, Picasa, and the online tool Batchtools.pro, with an impressive accuracy rate between 97% and 100%. Moreover, this technique enables the detection and flagging of files containing malicious content with nearly perfect accuracy. Our approach not only enhances the understanding of a file’s digital lineage but also offers a new mechanism in digital forensics, providing a robust tool for both identifying the software used in file creation and detecting malicious alterations.
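The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the hex string is split into non-overlapping 2-character substrings (so each substring corresponds to one byte, giving 256 features) and that counts are normalized to relative frequencies; the abstract does not confirm either choice. The names `hex_pair_features` and `features_from_file` are hypothetical.

```python
from collections import Counter

# All 256 possible 2-character hex substrings: "00", "01", ..., "ff".
HEX_PAIRS = [f"{i:02x}" for i in range(256)]


def hex_pair_features(data: bytes) -> list[float]:
    """Return the relative frequency of each 2-character hex substring.

    Assumption: the hex string is split into non-overlapping pairs,
    so each pair corresponds to exactly one byte of the file.
    """
    hex_string = data.hex()  # e.g. b"\x89PNG" -> "89504e47"
    pairs = [hex_string[i:i + 2] for i in range(0, len(hex_string), 2)]
    counts = Counter(pairs)
    total = len(pairs) or 1  # avoid division by zero for empty input
    return [counts[p] / total for p in HEX_PAIRS]


def features_from_file(path: str) -> list[float]:
    """Read a file as raw bytes and build its feature vector (hypothetical helper)."""
    with open(path, "rb") as fh:
        return hex_pair_features(fh.read())
```

Vectors produced this way could be stacked into a matrix, one row per file, and passed to a classifier such as scikit-learn's RandomForestClassifier or LogisticRegression, both of which the abstract names. The training setup, labels, and any further preprocessing are not described in the abstract and are left out here.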


Author Profile
Lei Chen
Department of Information Technology, Georgia Southern University, Statesboro, GA, USA

Cemil Emre Yavas
Department of Information Technology, Georgia Southern University, Statesboro, GA, USA

Jiban Krishna Das
Department of Information Technology, Georgia Southern University, Statesboro, GA, USA

Paper Information

Publication year: 2025
Citations: 0
Publication country: Gabon
Site: Springer
