ITRT(IT Research Trends)

PIDRCMPP: Rapid Multi-Strategy Hierarchical Jailbreak Attacks on LLMs

Research field: Strategies

Keywords: #reconstruction #security #threats #jailbreaking #stealth

Conference: International Conference on Intelligent Computing

Abstract

With the widespread application of LLMs in NLP tasks, their security issues have gradually become a focal point of research. Although various defense mechanisms, such as alignment techniques and content filtering, have been employed to prevent models from generating harmful content, LLMs remain vulnerable to security threats like jailbreaking attacks and prompt injection. To further explore the potential vulnerabilities of LLMs and advance adversarial research, we propose a novel automated jailbreaking attack method: PIDRCMPP, which combines Pre-Interference (PI), Disguise Reconstruction (DR), Conceal Manipulation (CM), and Program Penetration (PP) strategies to enhance the stealth and success rate of attacks. PI reduces the model’s sensitivity to dangerous inputs by stacking multiple instructions in advance. DR utilizes Word Reconstruction and Sentence Reconstruction strategies to bypass security detection. CM utilizes parallel Simplification and Reverse Guidance to further enhance the stealth of the attack without reducing the toxicity of the prompt. PP exploits the characteristic of LLMs being less sensitive to harmful content during program comprehension, guiding them to generate inappropriate content when producing program outputs. The experimental results demonstrate that PIDRCMPP exhibits advanced attack success rates and the shortest time overhead across multiple mainstream LLMs.

📄 Paper information

Publication year	2025
Citations	0
Country of publication	Andorra, China
Site	Springer

Jinyang Wang

Shuai Zhang

Chang Liu

Tiehan Cui

Yanxu Mao

Datao You
