Research field: Artificial Intelligence
Conference: 2023 China Automation Congress (CAC)
Reinforcement learning (RL) methods that train agents in simulation are well suited to behavioral decision-making problems. However, complex simulation platforms with slow processing speeds make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Because simulation data based on expert experience are valuable, this paper proposes a new algorithm, derived from the behavioral cloning (BC) method, to generate a suitable initial model for further RL. The proposed TD-BC algorithm is specifically designed to train a policy network and a value network simultaneously from expert experience. We update the policy network by training the model output to be as consistent as possible with the action given by the expert. The difference between the value network's outputs for the next state and the current state is then used as the TD error to update the value network. Finally, the subsequent training tasks can be completed through simple fine-tuning, which reduces the time needed to accumulate online learning data and improves the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated on both single-agent and multi-agent cases. In the simulation, we use behavior trees derived from expert experience to generate historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerates the RL process.
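The two updates described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes tiny linear policy and value heads instead of neural networks, a cross-entropy loss as the measure of consistency with the expert action, and a standard TD(0) target with a reward and a discount factor `gamma`, neither of which the abstract specifies. All names (`LinearActorCritic`, `td_bc_step`, `train_on_expert_data`, `EXPERT_DATA`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearActorCritic:
    """Hypothetical stand-in for the policy and value networks (linear heads)."""

    def __init__(self, n_features, n_actions):
        self.policy_w = [[0.0] * n_features for _ in range(n_actions)]
        self.value_w = [0.0] * n_features

    def action_probs(self, state):
        logits = [sum(w * x for w, x in zip(row, state)) for row in self.policy_w]
        return softmax(logits)

    def value(self, state):
        return sum(w * x for w, x in zip(self.value_w, state))


def td_bc_step(model, state, expert_action, reward, next_state, done,
               lr=0.1, gamma=0.9):
    """One joint update from a single expert transition; returns the TD error."""
    # Policy head: gradient step on cross-entropy against the expert action
    # (assumed form of "as consistent as possible with the expert").
    probs = model.action_probs(state)
    for a, row in enumerate(model.policy_w):
        grad_logit = probs[a] - (1.0 if a == expert_action else 0.0)
        for i, x in enumerate(state):
            row[i] -= lr * grad_logit * x

    # Value head: semi-gradient TD(0). The reward and gamma are assumptions.
    bootstrap = 0.0 if done else gamma * model.value(next_state)
    td_error = reward + bootstrap - model.value(state)
    for i, x in enumerate(state):
        model.value_w[i] += lr * td_error * x
    return td_error


# Toy expert data: (state, expert_action, reward, next_state, done).
# Two one-hot states; the expert picks action 0 in the first, action 1 in the second.
EXPERT_DATA = [
    ([1.0, 0.0], 0, 0.0, [0.0, 1.0], False),
    ([0.0, 1.0], 1, 1.0, [0.0, 1.0], True),
]


def train_on_expert_data(transitions, epochs=500):
    """Offline pre-training over a fixed expert dataset."""
    model = LinearActorCritic(n_features=2, n_actions=2)
    for _ in range(epochs):
        for state, action, reward, next_state, done in transitions:
            td_bc_step(model, state, action, reward, next_state, done)
    return model


if __name__ == "__main__":
    pretrained = train_on_expert_data(EXPERT_DATA)
    print(pretrained.action_probs([1.0, 0.0]), pretrained.value([1.0, 0.0]))
```

In this sketch both heads are updated from the same expert transition, so the pre-trained model would hand an online RL algorithm an initialized critic as well as an initialized policy, which is the motivation the abstract gives for training the two networks together.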
| Publication year | 2023 |
|---|---|
| Citations | 1 |
| Publication country | China |
| Site | IEEE |