An extensive study of the effects of different deep learning models on code vulnerability detection in Python code


Research field: Infrastructure



Venue: Automated Software Engineering


Abstract

Deep learning has made great progress in automated code vulnerability detection, and several deep-learning-based vulnerability detection approaches have been proposed. However, few studies have empirically examined how different deep learning models affect vulnerability detection in Python code. To address this gap, we cover a broader range of code representation learning models and classification models for vulnerability detection. We design and conduct an empirical study that evaluates, in total, eighteen deep learning architectures derived from combining three representation learning models (Word2Vec, fastText, and CodeBERT) with six classification models (random forest, XGBoost, Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU)). We also empirically compare two machine learning mechanisms: attention and bi-directionality. Statistical significance and effect size analyses between the models are also conducted. In terms of precision, recall, and F-score, Word2Vec outperforms both CodeBERT, a Bidirectional Encoder Representations from Transformers (BERT) style model pre-trained on code, and fastText. Likewise, LSTM and GRU are superior to the other classification models studied. The bi-directional LSTM and the bi-directional GRU with attention, both using Word2Vec, are the two best-performing models for vulnerability detection in Python code. Moreover, compared with LSTM and GRU models that use only one of the two mechanisms, they show medium or large effect sizes. Both the choice of representation learning model and the choice of classification model substantially influence vulnerability detection in Python code, and the bi-directional and attention mechanisms also affect detection performance.
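The evaluation design summarized in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it enumerates the 3 × 6 grid of representation and classification models, computes binary precision, recall, and F-score with label 1 assumed to mean "vulnerable", and measures effect size with Cliff's delta. The abstract does not name its effect-size statistic, so Cliff's delta and the commonly cited magnitude thresholds (negligible below 0.147, small below 0.33, medium below 0.474, large otherwise) are assumptions, as are all names introduced here (`ARCHITECTURES`, `precision_recall_f1`, `cliffs_delta`, `magnitude`).

```python
from itertools import product
from typing import Sequence, Tuple

# The 3 x 6 grid of representation and classification models named in the abstract.
REPRESENTATIONS = ("Word2Vec", "fastText", "CodeBERT")
CLASSIFIERS = ("RandomForest", "XGBoost", "MLP", "CNN", "LSTM", "GRU")
ARCHITECTURES = list(product(REPRESENTATIONS, CLASSIFIERS))  # 18 (representation, classifier) pairs


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall, and F-score.

    Assumption: label 1 marks a vulnerable sample, label 0 a non-vulnerable one.
    Any ratio whose denominator is zero is reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # vulnerable, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # vulnerable, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cliffs_delta(a: Sequence[float], b: Sequence[float]) -> float:
    """Cliff's delta: P(x > y) - P(x < y) over all pairs x in a, y in b.

    Assumption: used here only for illustration; the abstract does not
    state which effect-size measure the study applied.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


def magnitude(delta: float) -> str:
    """Map |delta| to a label using commonly cited (assumed) thresholds."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

With toy inputs, `cliffs_delta([3, 4], [1, 2])` returns 1.0, which `magnitude` labels "large". Cliff's delta is rank-based, so it makes no normality assumption about the per-run scores being compared.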


Authors

Rongcun Wang
School of Computer Science and Technology, China University of Mining and Technology, No. 1 Daxue Road, Xuzhou 221116, Jiangsu, China

Senlei Xu
School of Computer Science and Technology, China University of Mining and Technology, No. 1 Daxue Road, Xuzhou 221116, Jiangsu, China

Xingyu Ji
School of Computer Science and Technology, China University of Mining and Technology, No. 1 Daxue Road, Xuzhou 221116, Jiangsu, China

📄 Paper information

Publication year: 2024
Citations: 0
Publication country: Andorra, Canada
Source: Springer