Adding extra security layer to chatGPT: machine learning based model to detect malicious cybersecurity prompts


연구 분야: Verification



학회: Cluster Computing


초록

The rapid advancement of Artificial Intelligence (AI) has positioned Large Language Models (LLMs) such as ChatGPT at the forefront of innovation, showcasing unprecedented capabilities in text understanding and generation. However, the flexibility of LLMs also introduces vulnerabilities, particularly in generating malicious content, posing significant cybersecurity threats. Despite OpenAI’s implementation of security systems to mitigate misuse, adversarial techniques like "jailbreak prompts" continue to bypass these safeguards, underscoring the need for more advanced detection systems. This paper presents a novel machine-learning model designed to detect malicious cybersecurity prompts. The study involved the creation of a comprehensive dataset representing diverse cybersecurity threats across seven domains, including malware, phishing, and social engineering. The dataset was developed through a three-phased approach: generating malicious prompts using ChatGPT, collecting benign prompts from reputable cybersecurity websites via web scraping, and crafting advanced prompts to simulate sophisticated jailbreak attacks. The resulting dataset contains 3,354 samples, offering a diverse and realistic representation of potential LLM exploitations. Two machine-learning experiments were conducted using various Natural Language Processing (NLP) techniques for binary and multiclass classification tasks. The binary classification achieved a 97% accuracy rate using a Support Vector Machine (SVM), while the multiclass classification achieved 99% accuracy with a sophisticated Voting technique. These results demonstrate the model’s efficacy in detecting malicious prompts and contribute significantly to the cybersecurity community by establishing a robust foundation for defending against LLM-based threats.


Author Profile
Ibrahim Obeidat

Department of Information Technology Faculty of Prince Al-Hussien Bin Abdullah 2 for IT The Hashemite University PO Box 330127 Zarqa 13133 Jordan

Albania
Author Profile
Rabee Alquran

Department of Information Technology Faculty of Prince Al-Hussien Bin Abdullah 2 for IT The Hashemite University PO Box 330127 Zarqa 13133 Jordan

Albania
Author Profile
Alla Mughaid

Department of Information Technology Faculty of Prince Al-Hussien Bin Abdullah 2 for IT The Hashemite University PO Box 330127 Zarqa 13133 Jordan

Albania

📄 논문 정보

발행 연도 2025년
인용수 0
출판 국가 Malaysia, Albania
사이트 Springer
좋아요 수 0

연관 논문 목록 (61건)