Exposing the Achilles' Heel: Black-box Reverse Engineering of Commercial LLM Plugins' Hidden Prompts


Research field: Analysis



Conference: FSE Companion '25: Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering


Abstract

Large language models (LLMs) demonstrate powerful natural-language semantic understanding capabilities and are widely integrated into applications. OpenAI provides a platform for developers to construct custom applications, extending ChatGPT's functions and integrating external tools. Since its release in November 2023, over 3 million custom applications have been created. However, such a vast ecosystem also conceals security and privacy threats. For developers, instruction-leaking attacks threaten the intellectual property of instructions in LLM applications through carefully crafted adversarial prompts. To systematically evaluate the scope of threats in real-world LLM applications, we develop an inception prompt hijacking attack, namely IPH, targeting LLM applications. Our experiments on 5,000 real-world LLM applications reveal that over 95.1% of applications are vulnerable to instruction-leaking attacks via one or more adversarial prompts. Our findings raise awareness among LLM application developers about the importance of integrating specific defensive strategies in their instructions.


Authors

Wenying Wei
The Hong Kong Polytechnic University, Hong Kong

Kaifa Zhao
The Hong Kong Polytechnic University, Hong Kong

Hao Zhou
The Hong Kong Polytechnic University, Hong Kong

Paper information

Publication year: 2025
Citations: 0
Country of publication: Hong Kong
Site: ACM

Related papers (72)