Obfuscation for Deep Neural Networks Against Model Extraction: Attack Taxonomy and Defense Optimization


Research field: Analysis



Conference: International Conference on Applied Cryptography and Network Security


Abstract

Well-trained deep neural networks (DNNs), including large language models (LLMs), are valuable intellectual property assets. To defend against model extraction attacks, one major idea proposed in a large body of previous research is obfuscation: splitting the original DNN and storing the components separately. However, systematically analyzing these methods' security against various attacks and optimizing the efficiency of the defenses remain challenging. In this paper, we propose a taxonomy of model-based extraction attacks, which enables us to identify vulnerabilities in several existing obfuscation methods. We also propose an extremely efficient model obfuscation method that uses a trusted execution environment (TEE). The secrets stored in the TEE have constant size, i.e., independent of the model size. Although the method relies on a pseudo-random function to provide a quantifiable guarantee for protection and noise compression, it does not need any complicated training or filtering of the weights. Our comprehensive experiments show that the method can mitigate norm-clipping and fine-tuning attacks. Even for small noise, the accuracy of the obfuscated model is close to random guessing, and the tested attacks cannot extract a model with comparable accuracy. In addition, the empirical results shed light on the relation between DP parameters in obfuscation and the risks of concrete extraction attacks.
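
The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea it describes: noise derived from a pseudo-random function lets the secret stay at a fixed size, because the noise can be regenerated from a short key instead of being stored per weight. Everything concrete here is an assumption for illustration and not the authors' method: HMAC-SHA256 stands in for the pseudo-random function, the noise is uniform, the weights are a flat list of floats, and the names `prf_noise`, `obfuscate`, and `deobfuscate` are hypothetical.

```python
import hashlib
import hmac
import struct


def prf_noise(key: bytes, index: int, scale: float) -> float:
    """Map (key, index) to a pseudo-random value of magnitude at most scale.

    Assumption: HMAC-SHA256 is used as the PRF; the paper may use another.
    """
    digest = hmac.new(key, struct.pack(">Q", index), hashlib.sha256).digest()
    # Interpret the first 8 bytes as an integer and normalize to about [0, 1].
    u = int.from_bytes(digest[:8], "big") / 2**64
    return (2.0 * u - 1.0) * scale


def obfuscate(weights, key: bytes, scale: float):
    """Add key-derived noise to every weight (illustrative only)."""
    return [w + prf_noise(key, i, scale) for i, w in enumerate(weights)]


def deobfuscate(noisy, key: bytes, scale: float):
    """Regenerate the same noise from the key and subtract it."""
    return [w - prf_noise(key, i, scale) for i, w in enumerate(noisy)]


if __name__ == "__main__":
    secret_key = b"\x01" * 32          # the only secret, fixed at 32 bytes
    weights = [0.5, -1.25, 3.0, 0.0]   # stand-in for model parameters
    noisy = obfuscate(weights, secret_key, scale=10.0)
    restored = deobfuscate(noisy, secret_key, scale=10.0)
    print(noisy)
    print(restored)
```

In this sketch the key is 32 bytes regardless of how many weights are obfuscated, which is the property the abstract attributes to the secrets kept in the TEE. Restoration is exact only up to floating-point rounding.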


Authors

Yulian Sun, Data Protection Technology Lab, Huawei Technologies Düsseldorf, Düsseldorf, Germany
Vedant Bonde, Ruhr University Bochum, Bochum, Germany
Li Duan, Data Protection Technology Lab, Huawei Technologies Düsseldorf, Düsseldorf, Germany

Paper information

Publication year: 2025
Citations: 0
Country of publication: Germany
Site: Springer