Adversarial perturbation denoising utilizing common characteristics in deep feature space


Research field: Verification



Venue: Applied Intelligence


Abstract

Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples (AEs). Denoising through input pre-processing is one defense against adversarial attacks. However, it is hard to remove multiple kinds of adversarial perturbation, especially in the presence of evolving attacks. To address this challenge, we attempt to extract the commonality of adversarial perturbations. Because adversarial perturbations are imperceptible in the input space, we conduct the extraction in the deep feature space, where the perturbations become more apparent. Using the extracted common characteristics, we craft common adversarial examples (CAEs) to train the denoiser. Furthermore, to prevent image distortion while removing as much of the adversarial perturbation as possible, we propose a hybrid loss function that guides the training process at both the pixel level and the deep feature space. Our experiments show that our defense method can eliminate multiple adversarial perturbations, significantly enhancing adversarial robustness compared to previous state-of-the-art methods. Moreover, it works as a plug-and-play module for various classification models, which demonstrates the generalizability of our defense method.
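
The hybrid loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function names, the weighting factor `lam`, the use of mean squared error for both terms, and the one-layer `toy_features` extractor that stands in for a real classifier's deep features.

```python
import random

def toy_features(image, weights):
    """Hypothetical stand-in for a deep feature extractor: one linear layer + ReLU."""
    return [max(0.0, sum(w * p for w, p in zip(row, image))) for row in weights]

def mean_squared_error(a, b):
    """Average squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def hybrid_loss(denoised, clean, weights, lam=1.0):
    """Pixel-level term plus a feature-space term, weighted by lam (assumed form)."""
    pixel_term = mean_squared_error(denoised, clean)
    feature_term = mean_squared_error(
        toy_features(denoised, weights), toy_features(clean, weights)
    )
    return pixel_term + lam * feature_term

random.seed(0)
n_pixels, n_features = 16, 4
clean = [random.random() for _ in range(n_pixels)]  # flattened toy "image"
weights = [[random.gauss(0, 1) for _ in range(n_pixels)] for _ in range(n_features)]

# A denoiser output that still carries a small residual perturbation.
residual = [p + 0.05 for p in clean]

loss_perfect = hybrid_loss(clean, clean, weights)      # perfect reconstruction
loss_residual = hybrid_loss(residual, clean, weights)  # imperfect reconstruction
```

In this sketch the pixel term penalizes visible distortion of the image, while the feature term penalizes any perturbation that still shifts the deep representation. A perfect reconstruction gives zero loss, and any leftover perturbation raises it.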


Author Profile
Jianchang Huang

School of Science, Zhejiang University of Science and Technology, Hangzhou 3210023, Zhejiang, China

Andorra
Author Profile
Yinyao Dai

School of Science, Zhejiang University of Science and Technology, Hangzhou 3210023, Zhejiang, China

Andorra
Author Profile
Fang Lu

School of Science, Zhejiang University of Science and Technology, Hangzhou 3210023, Zhejiang, China

Andorra

📄 Paper information

Publication year 2024
Citations 0
Publication country Andorra, China
Site Springer

Related papers (78)