Research field: Artificial Intelligence
Venue: Applied Intelligence
Recent advances in machine learning (ML) have facilitated the deployment of ML models across a wide range of real-world applications. However, these models remain exposed to a variety of security threats. In this paper, we propose a novel out-of-distribution attack: leveraging pre-trained generative adversarial networks (GANs), an adversary aims to fool an ML model into misclassifying a GAN-generated sample as a pre-specified target class. Our attack is based on the insight that ML models do not know when they do not know, so they can unexpectedly recognize a completely unrelated sample (e.g., a cartoon face) as a certain class (e.g., airplane) with high confidence. Specifically, we introduce a GAN-based targeted attack framework for both white-box and black-box scenarios. The framework casts the attack as an optimization problem, and we develop a family of attack methods within it. Extensive experimental results show that our methods achieve competitive performance, even compared with several state-of-the-art adversarial example attacks. Furthermore, our methods can evade several defenses, including widely used and recently proposed ones. We also analyze in detail the factors that affect attack performance. Our work provides a supplementary test for comprehensively evaluating the robustness of ML systems.
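
To make the white-box setting concrete, the following is a minimal sketch of the general idea of searching a generator's latent space for a sample that a classifier assigns to a chosen target class. It is not the paper's method: the abstract does not specify the objective, models, or optimizer, so everything here is an assumption for illustration. The "generator" is a toy linear map `G`, the "classifier" is a toy softmax layer (`W`, `b`), the loss is plain cross-entropy toward the target class, and the name `latent_targeted_attack` is hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def latent_targeted_attack(G, W, b, target, steps=300, lr=0.5, seed=0):
    """Gradient descent on a latent vector z (hypothetical toy setup).

    G: (data_dim, latent_dim) toy linear stand-in for a pre-trained generator.
    W, b: weights and bias of a toy softmax classifier over data_dim inputs.
    target: index of the class the attacker wants the classifier to output.
    Returns the final latent z, the generated sample x, and class probabilities.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=G.shape[1])       # random starting latent
    onehot = np.eye(W.shape[0])[target]   # one-hot encoding of the target class
    for _ in range(steps):
        x = G @ z                         # "generate" a sample from the latent
        p = softmax(W @ x + b)            # classifier probabilities for x
        # Gradient of the cross-entropy -log p[target] w.r.t. the logits is
        # (p - onehot); the chain rule through W and G gives the gradient in z.
        grad_z = G.T @ (W.T @ (p - onehot))
        z -= lr * grad_z                  # only z is updated, never G or W
    x = G @ z
    return z, x, softmax(W @ x + b)


# Usage with randomly drawn toy weights (illustrative only).
rng = np.random.default_rng(1)
G = rng.normal(size=(16, 8)) / np.sqrt(8)
W = rng.normal(size=(4, 16)) * 0.25
b = np.zeros(4)
z, x, probs = latent_targeted_attack(G, W, b, target=2)
print("predicted class:", int(np.argmax(probs)))
```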
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Andorra |
| Site | Springer |