ITRT (IT Research Trends)

Adaptive Web Crawling for Threat Intelligence Using a Reinforcement Learning-Enhanced Large Language Model

Research field: Safety

Paper keywords: #cyber #cybersecurity #internet #websites #crawling

Conference: International Conference on Cyberspace Simulation and Evaluation

Abstract

In the digital age, cyber threats are evolving rapidly, becoming increasingly complex and widespread, which necessitates robust systems for dynamically gathering threat intelligence. Traditional methods of threat intelligence collection often rely on rule-based static systems that fail to extract domain-specific data from a comprehensive web perspective. While web crawlers can automatically browse the internet to collect relevant data and have been employed for threat intelligence gathering, these systems are often limited by their inability to adapt to new or unforeseen web structures and content. Moreover, they lack the ability to automatically explore links and dynamically decide when to continue crawling or stop, which further limits their effectiveness. To address these limitations, this paper proposes an innovative approach to developing an open-source, large-scale threat intelligence extraction system using reinforcement learning (RL). The proposed system, Intelligent Threat Intelligence Extraction System (ITIES), utilizes RL to dynamically adjust its web crawling strategy, allowing it to accurately extract relevant information such as IP addresses and domains from various websites. Unlike traditional web crawlers that rely on static, rule-based methods, ITIES adapts to the constantly changing web environment by learning from interactions with different webpages. By integrating Q-learning with the powerful large language model capabilities of ScrapeGraphAI, the system effectively balances exploration of new links and exploitation of known valuable sources, optimizing its crawling path to maximize the extraction of actionable intelligence. Experimental results demonstrate that ITIES significantly enhances the efficiency and accuracy of threat intelligence extraction, with F1 score improvements of up to +4.30%. This research not only showcases the potential of RL in adaptive web crawling but also contributes to the field of cybersecurity by providing an intelligent, automated solution for threat intelligence gathering.
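The abstract says ITIES uses Q-learning to balance exploring new links against exploiting known valuable sources, and to decide when to stop crawling. It does not give the details. The sketch below is a minimal, hypothetical illustration of that idea and is not the ITIES implementation.

It makes the following assumptions:

- It uses tabular Q-learning with epsilon-greedy exploration over a toy in-memory link graph (`WEB`).
- The state is the current page and the actions are "follow an outgoing link" or `STOP`.
- The reward is the number of newly found IP/domain indicators minus a per-page cost (`STEP_COST`).
- Simple regexes stand in for the LLM-based extraction the paper performs with ScrapeGraphAI.

All names are hypothetical: `WEB`, `STEP_COST`, `extract_indicators`, `run_episode`, `train`, and `greedy_crawl`.

```python
import random
import re
from collections import defaultdict

# ASSUMPTION: regex extraction is a stand-in for the paper's LLM-based extraction
# (ScrapeGraphAI); it only recognizes IPv4 addresses and domain names.
IPV4_RE = re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE)


def extract_indicators(text):
    """Return the set of IPv4 addresses and (lower-cased) domain names in text."""
    ips = set(IPV4_RE.findall(text))
    domains = {d.lower() for d in DOMAIN_RE.findall(text)}
    return ips | domains


# HYPOTHETICAL in-memory "web": url -> (page text, outgoing links).
WEB = {
    "seed": ("Threat feed index", ["blog", "report"]),
    "blog": ("Nothing useful here", ["seed"]),
    "report": ("C2 at 203.0.113.7 and evil-update.example.net", ["archive"]),
    "archive": ("Old IOC: 198.51.100.23", []),
}

STOP = "<stop>"   # action that ends the crawl episode
STEP_COST = 0.1   # HYPOTHETICAL per-page cost, so that stopping can be the best choice


def actions(url):
    """Actions available on a page: follow any outgoing link, or stop."""
    return WEB[url][1] + [STOP]


def run_episode(q, seed, alpha, gamma, epsilon, max_steps, rng):
    """One crawl episode with epsilon-greedy action choice and Q-learning updates."""
    state, seen = seed, set()
    for _ in range(max_steps):
        acts = actions(state)
        if rng.random() < epsilon:
            action = rng.choice(acts)                        # explore
        else:
            action = max(acts, key=lambda a: q[(state, a)])  # exploit
        if action == STOP:
            # Terminal transition: reward 0 and no bootstrapped future value.
            q[(state, action)] += alpha * (0.0 - q[(state, action)])
            break
        new = extract_indicators(WEB[action][0]) - seen
        seen |= new
        reward = len(new) - STEP_COST  # newly found indicators minus page cost
        best_next = max(q[(action, a)] for a in actions(action))
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = action
    return seen


def train(seed="seed", episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=10, rng_seed=0):
    """Learn a Q-table over (page, action) pairs from repeated crawl episodes."""
    rng = random.Random(rng_seed)
    q = defaultdict(float)
    for _ in range(episodes):
        run_episode(q, seed, alpha, gamma, epsilon, max_steps, rng)
    return q


def greedy_crawl(q, seed="seed", max_steps=10):
    """Follow the learned policy without exploration; return the visited pages."""
    path, state = [seed], seed
    for _ in range(max_steps):
        action = max(actions(state), key=lambda a: q[(state, a)])
        if action == STOP:
            break
        path.append(action)
        state = action
    return path
```

Usage: `greedy_crawl(train())` returns `['seed', 'report', 'archive']` for this toy graph. The agent learns to skip the indicator-free `blog` page and to stop once no links remain.

Two design choices deserve a note. The `STOP` action combined with a per-page cost is one simple way to model the continue-or-stop decision the abstract mentions. Treating the current URL alone as the state is a simplification; a real system would likely use richer page features.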

📄 Paper Information

Publication year	2025
Citations	0
Country of publication	China
Site	Springer
Likes	0

Authors: Xiayu Xiang, Zhaoquan Gu, Huchen Zhou, Ke Zhou