Research field: Infrastructure
Conference: World Congress in Computer Science, Computer Engineering & Applied Computing
This paper studies the feature contribution values of software code tokens in the task of multi-class vulnerability classification by Common Weakness Enumeration (CWE) type. Sibling CWEs under the same parent CWE category differ only subtly, which makes it challenging to learn the correct types. How the semantics of the code relate to feature attention values and feature contribution values therefore requires a systematic assessment. We devise an assessment framework that integrates eXplainable AI (XAI) techniques and measurements to examine the importance of several factors, including token length, separators, token attention values, and abstract syntax tree meta constructs, and their effects on learning performance. We apply the framework to three open-source datasets covering Java and C++, using three transformer models and two XAI algorithms. The results highlight three observations: (1) tokens with higher attention values tend to have higher feature contribution values; (2) attention values alone may not distinguish the subtle differences among closely related CWE types; and (3) increasing the input token length has more impact on tokens with higher contribution values.
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Canada |
| Site | Springer |