Research field: Software Development
Conference: International Conference on Computer Safety, Reliability, and Security
This study evaluates the security of web application code generated by Large Language Models by analyzing 2,500 PHP websites generated by GPT-4. The sites were deployed in Docker containers and tested for vulnerabilities using a hybrid approach that combines Burp Suite active scanning, static analysis, and manual review. Our investigation focuses on identifying Insecure File Upload, SQL Injection, Stored XSS, and Reflected XSS in the GPT-4-generated PHP code. The analysis highlights potential security risks and the implications of deploying such code in real-world scenarios. Overall, our analysis found 2,440 vulnerable parameters. According to Burp's scan, 11.56% of the sites can be compromised outright. When the static analysis results are included, 26% of the sites had at least one vulnerability that can be exploited through web interaction. Certain coding scenarios, such as file upload functionality, are insecure 78% of the time, underscoring significant risks to software safety and security. The main contribution of our research is a methodology that allows LLMs to be benchmarked on deployable PHP code generation, where the property under investigation is secure code generation. We have made the source code and a detailed vulnerability record for each GPT-4-generated sample publicly available to support further research: https://github.com/Beckyntosh/ChatPHP. This study emphasizes the crucial need for thorough testing and evaluation when generative AI technologies are used in software development.
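To make two of the studied vulnerability classes concrete, here is a minimal illustrative sketch. It is written in Python rather than PHP and is not taken from the ChatPHP repository or the authors' methodology; the `users` table, its columns, and all function names are hypothetical. It contrasts a query built by string interpolation (open to SQL Injection) with a parameterized query, and unescaped HTML output (open to XSS) with escaped output.

```python
import html
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a hypothetical in-memory users table with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_vulnerable(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is spliced into the SQL text, so a quote
    # character in `name` can change the structure of the query.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safe: the value is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


def render_comment_vulnerable(comment: str) -> str:
    # Vulnerable: user-supplied markup is echoed back verbatim.
    return f"<p>{comment}</p>"


def render_comment_safe(comment: str) -> str:
    # Safe: HTML metacharacters are escaped before output.
    return f"<p>{html.escape(comment)}</p>"


if __name__ == "__main__":
    conn = setup()
    payload = "' OR '1'='1"
    print(len(find_user_vulnerable(conn, payload)))  # 2: every row leaks
    print(len(find_user_safe(conn, payload)))        # 0: treated as a literal name
    print(render_comment_safe("<script>alert(1)</script>"))
```

With the payload `' OR '1'='1`, the interpolated query becomes `... WHERE name = '' OR '1'='1'`, which matches every row, while the parameterized version looks for a user literally named `' OR '1'='1` and finds none. The PHP equivalents would typically be prepared statements (for example via PDO) and `htmlspecialchars`.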
| Field | Value |
|---|---|
| Publication year | 2024 |
| Citations | 0 |
| Publication country | Norway |
| Source site | Springer |