ITRT(IT Research Trends)

How secure is AI-generated code: a large-scale comparison of large language models

Research field: Strategies

Paper keywords: #google #331 #turbo #112 #gemini

Venue: Empirical Software Engineering

Abstract

This study compares state-of-the-art Large Language Models (LLMs) on their tendency to generate vulnerabilities when writing C programs using a neutral zero-shot prompt. Tihanyi et al. introduced the FormAI dataset at PROMISE ’23, featuring 112,000 C programs generated by GPT-3.5-turbo, of which at least 51.24% were identified as vulnerable. We extended that research with a large-scale study involving nine state-of-the-art models, including OpenAI’s GPT-4o-mini, Google’s Gemini Pro 1.0, TII’s 180 billion-parameter Falcon, Meta’s 13 billion-parameter Code Llama, and several other compact models. Additionally, we introduce the FormAI-v2 dataset, which comprises 331,000 compilable C programs generated by these LLMs. Each program in the dataset is labeled based on the vulnerabilities detected in its source code through formal verification, using the Efficient SMT-based Context-Bounded Model Checker (ESBMC). This technique minimizes false positives by providing a counterexample for each specific vulnerability and reduces false negatives by running the verification process to completion. Our study reveals that at least 62.07% of the generated programs are vulnerable. The differences between the models are minor, as they all show similar coding errors with slight variations. Our research highlights that while LLMs offer promising capabilities for code generation, deploying their output in a production environment requires proper risk assessment and validation.
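To give a sense of scale, the percentages quoted in the abstract can be turned into approximate program counts. The sketch below is only illustrative arithmetic: the helper name `approx_count` is hypothetical, the dataset sizes are the rounded figures from the abstract, and because the abstract states the percentages as lower bounds, the results should be read as rough minimums rather than exact counts reported by the authors.

```python
def approx_count(total: int, percent: float) -> int:
    """Approximate number of programs that a given percentage of a dataset represents."""
    return round(total * percent / 100)


# Figures taken from the abstract; sizes are rounded there, so results are estimates.
formai_v1_vulnerable = approx_count(112_000, 51.24)  # FormAI (GPT-3.5-turbo only)
formai_v2_vulnerable = approx_count(331_000, 62.07)  # FormAI-v2 (nine models)

print(f"FormAI:    roughly {formai_v1_vulnerable:,} of 112,000 programs flagged")
print(f"FormAI-v2: roughly {formai_v2_vulnerable:,} of 331,000 programs flagged")
```

Under these assumptions, the estimate is on the order of 57,000 flagged programs in FormAI and 205,000 in FormAI-v2.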

📄 Paper information

Publication year	2024
Citations	0
Country of publication	Hungary, Norway, Algeria
Site	Springer

Norbert Tihanyi

Tamas Bisztray

Mohamed Amine Ferrag

Ridhi Jain

Lucas C. Cordeiro
