Key-based data augmentation with curriculum learning for few-shot code search


Research field: Databases



Journal: Neural Computing and Applications


Abstract

Given a natural language query, code search aims to find matching code snippets in a codebase. Recent work mainly targets mainstream programming languages that have large amounts of training data. However, code search is also needed for domain-specific programming languages, which have far less training data, and labeling a large training set for each domain-specific language is a heavy burden. To this end, we propose DAFCS, a data augmentation framework with curriculum learning for few-shot code search. Specifically, we first collect unlabeled code in the same programming language as the original code, which provides additional semantic signals for the original code. Second, we employ an occlusion-based method to identify key statements in code fragments. Third, we design a set of new key-based augmentation operations for the original code. Finally, we use curriculum learning to schedule the augmented samples so as to train well-performing models. Retrieval experiments on a public dataset show that DAFCS surpasses state-of-the-art methods by 5.42% and 5.05% on the Solidity and SQL domain-specific languages, respectively. Our study shows that DAFCS, which combines data augmentation with curriculum learning, achieves promising performance on few-shot code search tasks.
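The abstract names three mechanisms (occlusion-based key-statement identification, key-based augmentation, and curriculum scheduling of augmented samples) without giving their details. The following is a minimal sketch of how such a pipeline might fit together, assuming a toy token-overlap scorer in place of a trained code search model, a single hypothetical augmentation operation that deletes non-key statements, and a linear "competence" curriculum schedule. All names (`overlap_score`, `occlusion_importance`, `key_statements`, `drop_non_key`, `curriculum_pool`, `c0`) are hypothetical illustrations, not the DAFCS implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: maps (query, code statements) to a relevance score.
Scorer = Callable[[str, Sequence[str]], float]


def overlap_score(query: str, statements: Sequence[str]) -> float:
    """Toy stand-in for a trained code search model (an assumption, not DAFCS):
    fraction of query words that appear as tokens in the code."""
    q = set(query.lower().split())
    code = " ".join(statements).lower().replace("(", " ").replace(")", " ")
    return len(q & set(code.split())) / max(len(q), 1)


def occlusion_importance(query: str, statements: Sequence[str], score: Scorer) -> List[float]:
    """Importance of statement i = drop in score when statement i is occluded (removed)."""
    stmts = list(statements)
    base = score(query, stmts)
    return [base - score(query, stmts[:i] + stmts[i + 1:]) for i in range(len(stmts))]


def key_statements(query: str, statements: Sequence[str], score: Scorer, k: int) -> List[int]:
    """Indices of the k most important statements, returned in source order."""
    imp = occlusion_importance(query, statements, score)
    ranked = sorted(range(len(imp)), key=lambda i: imp[i], reverse=True)
    return sorted(ranked[:k])


def drop_non_key(statements: Sequence[str], keys: Sequence[int], n_drop: int,
                 rng: random.Random) -> Tuple[List[str], float]:
    """Hypothetical key-based augmentation: delete up to n_drop non-key statements,
    never touching key ones. Difficulty = fraction of statements removed."""
    key_set = set(keys)
    non_key = [i for i in range(len(statements)) if i not in key_set]
    drop = set(rng.sample(non_key, min(n_drop, len(non_key))))
    kept = [s for i, s in enumerate(statements) if i not in drop]
    return kept, len(drop) / max(len(statements), 1)


def curriculum_pool(samples: Sequence[Tuple[object, float]], epoch: int,
                    total_epochs: int, c0: float = 0.3) -> List[object]:
    """Assumed linear competence schedule: at each epoch, train only on the easiest
    fraction c(t) of samples, where c(t) grows linearly from c0 to 1."""
    ordered = sorted(samples, key=lambda s: s[1])
    progress = epoch / max(total_epochs - 1, 1)
    competence = min(1.0, c0 + (1.0 - c0) * progress)
    n = max(1, math.ceil(competence * len(ordered)))
    return [sample for sample, _ in ordered[:n]]


if __name__ == "__main__":
    code = ["x = load(path)", "log(x)", "return sort(x)"]
    query = "sort load"
    keys = key_statements(query, code, overlap_score, k=2)
    print(keys)  # [0, 2]: occluding log(x) leaves the score unchanged
    aug, diff = drop_non_key(code, keys, n_drop=1, rng=random.Random(0))
    print(aug, round(diff, 3))
    pool = [("orig", 0.0), ("aug", diff)]
    for epoch in range(3):
        print(epoch, curriculum_pool(pool, epoch, total_epochs=3))
```

In this sketch, treating the fraction of removed statements as difficulty means lightly perturbed samples enter training first and heavier perturbations are added as competence grows. This is one plausible reading of "reasonably schedule augmented samples", not necessarily the criterion the authors use.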


Authors

Fan Zhang
College of Computer Science and Electronic Engineering, Hunan University, Lushan South Road, Changsha 410082, Hunan Province, China

Manman Peng
Hunan Provincial Key Laboratory of Blockchain Infrastructure and Application, Hunan University, Lushan South Road, Changsha 410082, Hunan Province, China

Qiang Wu
College of Computer Science and Electronic Engineering, Hunan University, Lushan South Road, Changsha 410082, Hunan Province, China

📄 Paper information

Publication year: 2024
Citations: 0
Publisher: Springer
