When CodeBERT Embraces Code Embedding: An Empirical SW Reliability Folktale


Research field: Verification



Conference: 2024 IEEE 21st India Council International Conference (INDICON)


Abstract

Software (SW) development is crucial for simulation and automation, driving technological innovation and economic growth. However, a significant portion of SW contains faults that emerge over time, raising reliability concerns, as failures can lead to financial losses and safety risks. In this paper, we propose a novel approach that leverages CodeBERT and code embedding schemes to build SW fault prediction models, training a wide range of machine learning (ML) models and addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). We conduct a comparative analysis of the original sampling versus the SMOTE-based class-balanced approach across four key performance metrics: Accuracy, AUC, F-mean, and G-mean. The results show that SMOTE consistently improves the average values, highlighting its significant positive impact on class balancing and on reducing fault prediction bias. We also propose a novel cost-analysis framework to evaluate the trade-offs between predictive performance and testing efficiency. Empirical analysis reveals that logistic regression and ensemble classifiers such as Extra Trees consistently achieve superior performance across the key metrics. The cost-analysis results indicate that, for low, medium, and high testing efficiency, the fault prediction model is better suited to projects whose percentage of faulty classes is below the thresholds of 53.52%, 40.98%, and 28.00%, respectively, revealing the trade-offs between predictive capacity and the implicit cost factors.
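To make the class-balancing step and the G-mean metric concrete, the following is a minimal pure-Python sketch, not the authors' pipeline. It assumes the common definition of G-mean as the geometric mean of sensitivity and specificity, which the abstract does not spell out. The function names `smote_like` and `g_mean` are hypothetical, and the toy 2-D vectors merely stand in for CodeBERT embeddings of faulty classes.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples in the spirit of SMOTE:
    pick a minority sample, pick one of its k nearest minority
    neighbours, and interpolate at a random point between the two."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest neighbours of base by Euclidean distance
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (assumed definition).
    Label 1 marks a faulty class, label 0 a non-faulty one."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy stand-ins for embeddings of faulty (minority) classes; illustrative only.
faulty = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
extra_faulty = smote_like(faulty, n_new=6)

# One of two faulty classes detected, no false alarms.
score = g_mean([1, 1, 0, 0], [1, 0, 0, 0])
```

Unlike accuracy, G-mean drops to zero when a model predicts only the majority class, which is why it is a natural metric to pair with a class-balancing step. In practice one would more likely rely on an established SMOTE implementation than on this sketch.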


Authors
Lov Kumar, National Institute of Technology Kurukshetra, India
Vikram Singh, National Institute of Technology Kurukshetra, India
Pratyush Mishra, National Institute of Technology Kurukshetra, India

📄 Paper information

Publication year: 2024
Citations: 24
Country of publication: India
Site: IEEE

Related papers (52)