Research field: Artificial Intelligence
Venue: Neural Computing and Applications
Deep neural networks (DNNs) are crucial in autonomous driving systems (ADSs) for tasks like steering control, but model inaccuracies, biased training data, and incorrect runtime parameters can compromise their reliability. Metamorphic testing (MT) improves reliability by generating follow-up tests from mutated source inputs and flagging inconsistencies between source and follow-up outputs as defects. Existing MT techniques for ADSs rely on generative or transfer models, neuron-based coverage maximization, and adaptive test selection. Despite these efforts, significant challenges remain: the unclear correlation between neuron coverage and misbehaviour detection, a lack of focus on DNN critical pathways, inadequate use of search-based methods, and the absence of an integrated method that both selects sources and generates follow-ups effectively. This paper addresses these challenges by introducing DeepDomain, a grey-box multi-objective test generation approach for DNN models. It adaptively selects diverse source inputs and generates domain-oriented follow-up tests. These follow-ups explore critical pathways, extracted by neuron contribution, with broader coverage than their source tests (inter-behavioural domain), and attain high neural boundary coverage of the misbehaviour regions detected in previous follow-ups (intra-behavioural domain). An empirical evaluation on three DNN models used in the Udacity self-driving car challenge and 18 different metamorphic relations (MRs) demonstrates that behavioural domain adequacy is a more reliable indicator than coverage criteria for guiding the testing of DNNs. Additionally, DeepDomain significantly outperforms selected baselines in misbehaviour detection by up to 94 times, fault-revealing capability by up to 79%, output diversity by 71%, corner-case detection by up to 187 times, identification of robustness subdomains of MRs by up to 33 percentage points, and naturalness by a factor of two.
The results confirm that state-of-the-art coverage metrics are inadequate for guiding the generation of misbehaviour-inducing tests. Furthermore, black-box diversity-based test generation is less effective than the grey-box approach.
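The basic metamorphic check that the abstract builds on can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not part of DeepDomain: `toy_steering_model` is a hypothetical stand-in for a steering DNN, `brighten` is one example transformation behind a metamorphic relation, and the `tolerance` value is arbitrary. The sketch shows only the source/follow-up consistency check, not the paper's source selection or domain-oriented follow-up generation.

```python
import numpy as np


def toy_steering_model(image: np.ndarray) -> float:
    """Hypothetical stand-in for a steering DNN.

    Maps a grayscale image with values in [0, 1] to an angle in [-1, 1]
    by comparing the mean intensity of the right and left halves.
    """
    _, width = image.shape
    left = image[:, : width // 2].mean()
    right = image[:, width // 2 :].mean()
    return float(np.tanh(4.0 * (right - left)))


def brighten(image: np.ndarray, delta: float = 0.1) -> np.ndarray:
    """Example transformation: a uniform brightness shift, clipped to [0, 1]."""
    return np.clip(image + delta, 0.0, 1.0)


def violates_mr(model, source: np.ndarray, transform, tolerance: float = 0.1) -> bool:
    """Return True if the follow-up output deviates from the source output.

    The assumed relation is that the transformation should leave the
    steering angle unchanged, up to `tolerance`.
    """
    follow_up = transform(source)
    return abs(model(source) - model(follow_up)) > tolerance


# Generate random source inputs and check the relation on each of them.
rng = np.random.default_rng(0)
sources = [rng.uniform(0.0, 1.0, size=(8, 8)) for _ in range(20)]
violations = [violates_mr(toy_steering_model, s, brighten) for s in sources]
print(f"{sum(violations)} of {len(sources)} follow-ups violated the relation")
```

In a real setting the model would be a trained network and the tolerance would be chosen per relation. The sketch keeps the relation as a plain function so that other transformations can be passed in without changing the check.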
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Andorra |
| Site | Springer |