Towards an Understanding of Deep Neural Network Resiliency to Hardware Faults


Research area: Verification



Conference: 2025 20th European Dependable Computing Conference (EDCC)


Abstract

Many companies are designing dedicated Deep Neural Network (DNN) accelerators, either to combine tens of thousands of them into supercomputers or to use them in safety-critical applications such as autonomous driving. Both applications require consideration of hardware (HW) faults: at the scale of supercomputers, these DNN accelerators are prone to suffering several HW faults a day. Likewise, for car model fleet sizes in the millions, dozens of drivers will experience HW faults every day. A HW protection scheme that detects almost all faults in an execution unit costs more than 10% of silicon area, so significant performance may be gained by investigating the actual level of protection required for the intended DNN applications. To this end, various statistical experiments using fault injection have been published, demonstrating that DNNs are relatively resilient to HW faults. Evidently, DNNs possess general properties that cause this resiliency. In the present work, we carry out a stochastic analysis of typical DNN operations to formally identify these resiliency properties. In turn, the properties may become best practices for dependability stakeholders and DNN designers to follow. Furthermore, the stochastic tools we develop in the process may be used for formal verification of DNN resiliency to HW faults.


Authors

Patrik Omland
Department of Informatics, Technical University Munich, Garching, Germany

Michael Paulitsch
Intel Labs, Intel Corporation, Munich, Germany

Gereon Hinz
Department of Informatics, Technical University Munich, Garching, Germany

Paper information

Publication year: 2025
Citations: 5
Country of publication: Germany
Site: IEEE

Related papers (45)