PIDRCMPP: Rapid Multi-Strategy Hierarchical Jailbreak Attacks on LLMs


Research field: Strategies



Conference: International Conference on Intelligent Computing


Abstract

With the widespread application of LLMs in NLP tasks, their security has become a focal point of research. Although various defense mechanisms, such as alignment techniques and content filtering, have been employed to prevent models from generating harmful content, LLMs remain vulnerable to security threats such as jailbreaking attacks and prompt injection. To further explore the potential vulnerabilities of LLMs and advance adversarial research, we propose a novel automated jailbreaking attack method, PIDRCMPP, which combines Pre-Interference (PI), Disguise Reconstruction (DR), Conceal Manipulation (CM), and Program Penetration (PP) strategies to enhance the stealth and success rate of attacks. PI reduces the model’s sensitivity to dangerous inputs by stacking multiple instructions in advance. DR uses Word Reconstruction and Sentence Reconstruction strategies to bypass security detection. CM applies Simplification and Reverse Guidance in parallel to further enhance the stealth of the attack without reducing the toxicity of the prompt. PP exploits the fact that LLMs are less sensitive to harmful content during program comprehension, guiding them to generate inappropriate content when producing program outputs. The experimental results demonstrate that PIDRCMPP achieves high attack success rates with the shortest time overhead across multiple mainstream LLMs.


Author Profile
Jinyang Wang

School of Software Henan University Kaifeng China

China
Author Profile
Shuai Zhang

College of Earth and Planetary Sciences Chengdu University of Technology Chengdu China

Andorra
Author Profile
Chang Liu

Department of Hospitality and Business Management The Technological and Higher Education Institute of Hong Kong Chai Wan China

Andorra

📄 Paper Information

Publication year: 2025
Citations: 0
Publication country: Andorra, China
Site: Springer

Related papers (300)