Identifying Open-Source Threat Detection Resources on GitHub: A Scalable Machine Learning Approach


Research area: Safety



Venue: International Journal of Information Security


Abstract

Many businesses rely on open-source software modules to build their technology stacks. However, those who lack domain expertise may struggle to find the right software due to unfamiliar terminology and specific names. As a consequence, search engines and other platforms often cannot be utilized effectively to discover appropriate solutions. There is thus a need for a more applicable approach to assist non-domain experts in navigating the vastness of available repositories, enabling them to efficiently discover and select the right solution for their business needs. To overcome these gaps, we introduce an approach that supports finding unpopular yet important open-source software repositories on GitHub using advanced machine learning techniques. For this purpose, we propose novel strategies for information gathering and data pre-processing that resolve scalability issues of existing solutions and enable clustering of repositories even when topics, descriptions, or repository names are unclear or absent. For our evaluation, we gathered a dataset of 221,971 repositories using GitHub search and keywords related to incident detection. We show that our approach is able to separate threat detection repositories from others with an F1-score of 0.93.


Authors

Manuel Kern, AIT Austrian Institute of Technology, Giefinggasse 4, 1210 Vienna, Austria
Max Landauer, AIT Austrian Institute of Technology, Giefinggasse 4, 1210 Vienna, Austria
Florian Skopik, AIT Austrian Institute of Technology, Giefinggasse 4, 1210 Vienna, Austria

Paper information

Publication year: 2025
Citations: 0
Country of publication: Austria
Publisher site: Springer

Related papers (65)