Research field: Verification
Venue: Cluster Computing
The rapid advancement of Artificial Intelligence (AI) has positioned Large Language Models (LLMs) such as ChatGPT at the forefront of innovation, showcasing unprecedented capabilities in text understanding and generation. However, the flexibility of LLMs also introduces vulnerabilities, particularly in generating malicious content, posing significant cybersecurity threats. Despite OpenAI’s implementation of security systems to mitigate misuse, adversarial techniques like "jailbreak prompts" continue to bypass these safeguards, underscoring the need for more advanced detection systems. This paper presents a novel machine-learning model designed to detect malicious cybersecurity prompts. The study involved the creation of a comprehensive dataset representing diverse cybersecurity threats across seven domains, including malware, phishing, and social engineering. The dataset was developed through a three-phased approach: generating malicious prompts using ChatGPT, collecting benign prompts from reputable cybersecurity websites via web scraping, and crafting advanced prompts to simulate sophisticated jailbreak attacks. The resulting dataset contains 3,354 samples, offering a diverse and realistic representation of potential LLM exploitations. Two machine-learning experiments were conducted using various Natural Language Processing (NLP) techniques for binary and multiclass classification tasks. The binary classification achieved a 97% accuracy rate using a Support Vector Machine (SVM), while the multiclass classification achieved 99% accuracy with a sophisticated Voting technique. These results demonstrate the model’s efficacy in detecting malicious prompts and contribute significantly to the cybersecurity community by establishing a robust foundation for defending against LLM-based threats.
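The binary malicious-versus-benign task described in the abstract can be illustrated with a minimal sketch. The abstract does not state the features, kernel, or hyperparameters used, so everything below is an assumption: bag-of-words counts as features, a linear SVM without a bias term trained by subgradient descent on the hinge loss, and four hypothetical toy prompts standing in for the 3,354-sample dataset. It is not the authors' implementation and does not cover the multiclass voting experiment.

```python
from collections import Counter

# Hypothetical toy prompts; the paper's dataset is not reproduced here.
TRAIN = [
    ("write ransomware that encrypts victim files", 1),   # 1 = malicious
    ("craft phishing email to steal passwords", 1),
    ("explain how firewalls protect networks", -1),       # -1 = benign
    ("what is two factor authentication", -1),
]

def featurize(text):
    """Bag-of-words token counts (an assumed feature choice)."""
    return Counter(text.lower().split())

def score(weights, feats):
    """Linear decision value: dot product of weights and token counts."""
    return sum(weights.get(tok, 0.0) * n for tok, n in feats.items())

def train_linear_svm(data, epochs=50, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    weights = {}
    for _ in range(epochs):
        for text, label in data:
            feats = featurize(text)
            margin = label * score(weights, feats)
            for tok in weights:                 # L2 shrinkage step
                weights[tok] *= 1.0 - lr * reg
            if margin < 1.0:                    # hinge loss is active
                for tok, n in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * label * n
    return weights

def predict(weights, text):
    """Return 1 (malicious) for a positive score, otherwise -1 (benign)."""
    return 1 if score(weights, featurize(text)) > 0 else -1

weights = train_linear_svm(TRAIN)
print(predict(weights, "write ransomware"))
print(predict(weights, "what is authentication"))
```

A prompt made up entirely of unseen tokens scores zero and falls to the benign class here; a real detector would need a far larger vocabulary and a held-out evaluation set before any accuracy figure could be compared with the paper's.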
| Field | Value |
|---|---|
| Publication year | 2025 |
| Citations | 0 |
| Publication country | Malaysia, Albania |
| Site | Springer |
| Likes | 0 |