Research field: Strategies
Conference: 2025 International Conference on Electronics, AI and Computing (EAIC)
The integration of Large Language Models (LLMs) into cybersecurity operations, particularly Vulnerability Assessment and Penetration Testing (VAPT), has shown significant promise. However, comprehensive benchmarks for evaluating LLMs in the VAPT domain remain scarce, especially for small, open-source models suitable for local deployment. This paper introduces VAP-6, a novel benchmark comprising six distinct datasets designed to evaluate LLM capabilities across crucial VAPT knowledge domains: Common Vulnerabilities and Exposures (CVE) and Common Weakness Enumeration (CWE) identification, Common Vulnerability Scoring System (CVSS) prediction, scenario-based reasoning aligned with the Certified Ethical Hacker (CEH) v12 and CompTIA PenTest+ PT0-002 certification exams, VAPT tools proficiency, and CVE-to-Metasploit module mapping. We describe the VAP-6 methodology, which covers dataset creation from authoritative sources such as MITRE's CVE and CWE listings, Exploit-DB, and GitHub, followed by refinement with ChatGPT and manual verification. The benchmark was applied to selected open-source LLMs with 2 to 3 billion parameters (Qwen 2.5, Gemma 2, Llama 3.2), run through Ollama with Q4 quantization to keep local inference computationally efficient. This research establishes a standardized framework for benchmarking and comparing such LLMs, facilitating the development of more robust, private, and computationally efficient AI tools for VAPT professionals.
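The evaluation loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' harness: the item schema, the function names (`build_prompt`, `extract_letter`, `accuracy`, `ollama_ask`), the prompt wording, and the use of plain accuracy as the metric are all assumptions, since the abstract does not specify them. The only external piece relied on is Ollama's local `/api/generate` REST endpoint.

```python
import json
import re
import urllib.request
from typing import Callable, Iterable, Optional

# Hypothetical item format (the paper's real dataset schema is not given here):
# {"question": str, "choices": {"A": str, "B": str, ...}, "answer": "A"}
Item = dict


def build_prompt(item: Item) -> str:
    """Render one multiple-choice item as a plain-text prompt."""
    options = "\n".join(f"{k}. {v}" for k, v in sorted(item["choices"].items()))
    return f"{item['question']}\n{options}\nAnswer with a single letter."


def extract_letter(reply: str) -> Optional[str]:
    """Return the first standalone letter A-D in the reply, if any.

    Deliberately naive: a reply that starts with the article "A" would be
    misread, so a real harness would need stricter parsing.
    """
    match = re.search(r"\b([A-D])\b", reply.upper())
    return match.group(1) if match else None


def accuracy(items: Iterable[Item], ask: Callable[[str], str]) -> float:
    """Fraction of items whose extracted letter equals the gold answer."""
    items = list(items)
    if not items:
        return 0.0
    correct = sum(
        extract_letter(ask(build_prompt(item))) == item["answer"] for item in items
    )
    return correct / len(items)


def ollama_ask(model: str, host: str = "http://localhost:11434") -> Callable[[str], str]:
    """Build an `ask` function backed by a locally running Ollama server."""

    def ask(prompt: str) -> str:
        body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
        request = urllib.request.Request(
            f"{host}/api/generate",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["response"]

    return ask
```

With an Ollama server running and a model already pulled, a call such as `accuracy(items, ollama_ask("llama3.2:3b"))` would score one model on one dataset. Passing the model as a plain callable keeps the scoring logic testable without a server.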
| Publication year | 2025 |
|---|---|
| Citations | 15 |
| Country of publication | India |
| Site | IEEE |