Fooling machine learning models: a novel out-of-distribution attack through generative adversarial networks


Research field: Artificial Intelligence



Venue: Applied Intelligence


Abstract

Recent advances in machine learning (ML) have facilitated the deployment of ML models across various real-world applications. However, these models are exposed to a range of security threats. In this paper, we propose a novel out-of-distribution attack: leveraging pre-trained generative adversarial networks (GANs), an adversary aims to fool an ML model into misclassifying a GAN-generated sample as a pre-specified target class. Our attack is based on the insight that ML models do not know when they do not know, so they can unexpectedly recognize a completely different sample (e.g., a cartoon face) as a certain class (e.g., airplane) with high confidence. Specifically, we introduce a GAN-based targeted attack framework for white-box and black-box scenarios. The framework casts the attack as an optimization problem, from which we develop a family of attack methods. Extensive experimental results show that our methods achieve competitive performance, even compared with several state-of-the-art adversarial example attacks. Furthermore, our methods can evade several widely used and recent defenses. We also analyze in detail the factors that affect attack performance. Our work provides a supplementary test for comprehensively evaluating the robustness of ML systems.
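
The abstract does not give the objective function, so the following is only a minimal sketch of the white-box idea under stated assumptions: a fixed random linear map `A` stands in for a pre-trained GAN generator, a random linear softmax classifier `(W, b)` stands in for the victim model, and the attack searches the latent space for a `z` whose generated sample is assigned to the target class by minimizing cross-entropy. All names (`targeted_latent_attack`, `A`, `W`, `b`, `z0`) are hypothetical and not taken from the paper; the black-box setting is not covered.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    shifted = logits - logits.max()
    e = np.exp(shifted)
    return e / e.sum()

def targeted_latent_attack(A, W, b, target, z0, lr=0.01, steps=5000):
    """Gradient descent on the latent vector z (white-box access assumed).

    Toy generator:  x = A @ z               (stand-in for a GAN generator)
    Toy classifier: p = softmax(W @ x + b)  (stand-in for the victim model)
    Loss:           -log p[target]          (cross-entropy to the target class)
    """
    z = z0.copy()
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(steps):
        x = A @ z
        p = softmax(W @ x + b)
        grad_logits = p - onehot            # d(loss)/d(logits)
        grad_z = A.T @ (W.T @ grad_logits)  # chain rule back to the latent space
        z -= lr * grad_z
    return z, softmax(W @ (A @ z) + b)

rng = np.random.default_rng(0)
latent_dim, data_dim, num_classes = 16, 32, 3
A = rng.normal(size=(data_dim, latent_dim)) / np.sqrt(latent_dim)
W = rng.normal(size=(num_classes, data_dim))
b = np.zeros(num_classes)
z0 = rng.normal(size=latent_dim)
target = 2

p_before = softmax(W @ (A @ z0) + b)
z_adv, p_after = targeted_latent_attack(A, W, b, target, z0)
print("target confidence before:", p_before[target])
print("target confidence after: ", p_after[target])
```

With a real GAN and a deep classifier the same loop would rely on automatic differentiation rather than the hand-derived gradient, and the loss would no longer be convex in `z`, so convergence is not guaranteed there.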


Author Profile
Hailong Hu

National Research Base of Intelligent Manufacturing Service Chongqing Technology and Business University Chongqing 400067 China

Andorra
Author Profile
Jun Pang

Interdisciplinary Centre for Security Reliability and Trust University of Luxembourg Esch-sur-Alzette 4365 Luxembourg

Andorra

📄 Paper Information

Publication year: 2025
Citations: 0
Country of publication: Andorra
Site: Springer