Comparative analysis of soft-error sensitivity in LU decomposition algorithms on diverse GPUs


Research field: Analysis



Venue: The Journal of Supercomputing


Abstract

Graphics processing units (GPUs) have become integral to embedded systems and supercomputing centres due to their large memory, cutting-edge technology and high performance per watt. However, their susceptibility to transient errors requires a comprehensive analysis of error sensitivity, as well as the development of error mitigation techniques and fault-tolerant algorithms. This study focuses on evaluating the soft-error sensitivity of two distinct versions of LU decomposition algorithms implemented on two very different GPUs—a low-power SoC embedded GPU and a high-performance massively parallel GPU. Through extensive fault injection campaigns on both GPUs, we examine the vulnerability of the algorithms, identify error causes, and determine critical code components requiring enhanced protection. The experiments reveal that most single bit flip fault injections in the instruction results lead to erroneous outcomes or unrecoverable errors. Notably, efficient GPU resource utilisation can increase the number of masked errors, thereby enhancing error resilience. Additionally, while different parts of the code exhibit similar error occurrence types and rates, the propagation of errors to elements within the result matrix differs significantly.


Author Profile
German Leon

Depto. de Ingeniería y Ciencia de Computadores, Universitat Jaume I de Castelló, Avda. Sos Baynat s/n, 12071 Castellón, Spain

Author Profile
Jose M. Badia

Depto. de Ingeniería y Ciencia de Computadores, Universitat Jaume I de Castelló, Avda. Sos Baynat s/n, 12071 Castellón, Spain

Author Profile
Jose A. Belloch

Depto. de Tecnología Electrónica, Universidad Carlos III de Madrid, Avda. Universidad 30, 28911 Leganés, Madrid, Spain


📄 Paper information

Publication year: 2024
Citations: 0
Publication country: Germany
Site: Springer

Related papers (91)