Approach to Forming Vulnerability Datasets for Fine-Tuning AI Agents


Research field: Strategies



Conference: 2025 International Russian Smart Industry Conference (SmartIndustryCon)


Abstract

This study addresses the problem of identifying vulnerabilities in open-source code. It surveys existing methods of code analysis and evaluates the feasibility of automating security assessments with large language models (LLMs). We propose an approach for constructing high-quality datasets to fine-tune LLMs for software security analysis. The methodology involves collecting and processing vulnerability data, filtering and curating security-related code changes, and structuring the resulting datasets for model fine-tuning. We present an algorithm that aggregates vulnerability data sources and constructs a dataset specifically for training security-focused LLMs. To validate the approach, we fine-tune models from the Qwen family to detect software vulnerabilities in Python codebases during development and testing. Our findings indicate that the proposed method enables the development of intelligent, continuously adaptable AI agents capable of identifying and analyzing emerging zero-day vulnerabilities, not only in Python but also in other structurally similar programming languages.
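The three stages named in the abstract (aggregating vulnerability sources, filtering and curating code changes, and structuring examples for fine-tuning) might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the authors' algorithm: the `VulnRecord` fields, the deduplication key, the filtering rules, and the prompt/completion layout are all hypothetical choices made for the example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnRecord:
    """One security-related code change. All field names are hypothetical."""
    vuln_id: str       # identifier shared across sources (assumed to exist)
    language: str      # programming language of the snippet
    code_before: str   # vulnerable version of the code
    code_after: str    # patched version of the code
    description: str   # short human-readable description of the flaw


def aggregate(*sources):
    """Merge records from several sources, keeping the first record per vuln_id."""
    merged = {}
    for source in sources:
        for rec in source:
            merged.setdefault(rec.vuln_id, rec)
    return list(merged.values())


def curate(records, language="python"):
    """Keep target-language records whose patch actually changes the code."""
    return [
        r for r in records
        if r.language.lower() == language
        and r.code_before.strip()
        and r.code_before.strip() != r.code_after.strip()
    ]


def to_example(rec):
    """Structure one record as a prompt/completion pair (layout is an assumption)."""
    return {
        "prompt": "Review this code for vulnerabilities:\n" + rec.code_before,
        "completion": rec.description + "\nSuggested fix:\n" + rec.code_after,
    }


def build_dataset(*sources, language="python"):
    """Run aggregation, curation, and structuring in sequence."""
    return [to_example(r) for r in curate(aggregate(*sources), language)]


def to_jsonl(examples):
    """Serialize examples as JSON Lines, one example per line."""
    return "\n".join(json.dumps(e) for e in examples)
```

Keeping each stage as a separate function means a new data source or a stricter curation rule can be swapped in without touching the rest, which fits the abstract's emphasis on continuously adaptable agents.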


Alexander A. Zakharov
Information Security Department, Tyumen State University, Tyumen, Russia

Kirill Gladkikh
Information Security Department, Tyumen State University, Tyumen, Russia

Paper information

Publication year: 2025
Citations: 70
Country of publication: Russia
Site: IEEE