Harnessing pre-trained generalist agents for software engineering tasks


Research area: Verification



Venue: Empirical Software Engineering


Abstract

Artificial Intelligence (AI) is increasingly being adopted to develop techniques that improve the reliability, effectiveness, and overall quality of software systems. Deep reinforcement learning (DRL) has recently been used successfully to automate complex tasks such as game testing and job-shop scheduling, and to learn efficient, cost-effective behaviors in various environments. However, these specialist DRL agents, trained from scratch on specific tasks, generalize poorly to other tasks and require substantial time to develop and retrain. Recently, DRL researchers have begun to develop generalist agents that learn a policy from various environments (often Atari game environments) and achieve performance similar to or better than specialist agents on new tasks. In the Natural Language Processing and Computer Vision domains, generalist agents show promising adaptation to previously unseen tasks, achieving high performance after a light fine-tuning phase. To the best of our knowledge, no study has investigated the applicability of these generalist agents to software engineering (SE) tasks. This paper investigates the potential of generalist agents for solving SE tasks. Specifically, we conduct an empirical study on three increasingly used SE tasks: playtesting in games (two games), bug localization in software projects (six software projects), and makespan minimization in cloud task scheduling (two instances). Our results show that generalist agents outperform specialist agents with very little fine-tuning effort, achieving a 20% reduction in makespan relative to specialist agents on task scheduling. In game testing, some generalist agent configurations find bugs 3-8% faster than the specialist agents. Finally, in bug localization, generalist agents perform at least 9% better than specialist agents in terms of Mean Reciprocal Rank. Building on our analysis, we provide recommendations for researchers and practitioners looking to select generalist agents for SE tasks, to ensure that they perform effectively.


Authors
Paulina Stevia Nouwou Mindom, Polytechnique Montréal, Québec, Canada
Amin Nikanjam, Polytechnique Montréal, Québec, Canada
Foutse Khomh, Polytechnique Montréal, Québec, Canada

Paper information

Publication year: 2024
Citations: 0
Country of publication: Canada
Site: Springer
