Research field: Analysis
Conference: 2025 IEEE International Conference on Cyber Security and Resilience (CSR)
Large Language Models (LLMs) are widely used in software development, yet the security of AI-generated code remains a critical concern. This research examines security vulnerabilities in JavaScript code generated by six LLMs: ChatGPT-4o, Claude v3.5 Sonnet, DeepSeek R1 70B, Llama 3.1 405B, Mistral Large 2, and Nova Pro. We propose a new approach to assessing security vulnerabilities. Using 100 identical complex prompts, we systematically assessed vulnerabilities in AI-generated code and analyzed the most frequently occurring Common Weakness Enumeration (CWE) categories across the LLMs. Our results show that 275 of the 600 generated JavaScript code snippets contain vulnerabilities. In total, we identify 602 vulnerabilities from 28 CWEs. While all LLMs introduce vulnerabilities into generated code, their CWE and severity distributions vary significantly. To ensure a fair comparison of LLM capabilities, we also introduce two evaluation metrics: Vulnerabilities per Line of Code (V/LoC) and Weighted Security Risk per Line of Code (WSR/LoC). Our research highlights the need for security measures to mitigate the security risks in AI-generated code.
| Publication year | 2025 |
|---|---|
| Citations | 10 |
| Publication country | Andorra |
| Site | IEEE |
| Likes | 0 |