Research field: Artificial Intelligence
Venue: Neural Computing and Applications
In multi-agent deep reinforcement learning (MADRL), agents can learn to communicate to broaden their view and understanding of the environment and their teammates. Previous works on communication in MADRL mainly rely on centralized or independent value functions for learning communication, which cannot differentiate how communicating agents individually contribute to the overall learning process. Moreover, continuous environments that incorporate continuous state/action spaces have received limited attention in previous research. In this paper, we propose a novel architecture for communicating agents and apply centralized but factorized value functions to differentiate how each agent contributes to learning during communication, along with gradient backpropagation. Additionally, to address the complexity introduced by communication, we investigate the use of an attention mechanism that aggregates messages, enabling policies to maintain a fixed input length. We then present a new policy gradient method termed communication with factorized policy gradients (CFPG), featuring full backpropagation from factorized value functions to communicating agents’ architecture. We demonstrate that CFPG can enhance performance and accelerate learning in continuous predator–prey scenarios and multi-agent MuJoCo, when compared to other learning communication methods.
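
The abstract says that an attention mechanism aggregates incoming messages so that each policy keeps a fixed input length, but it does not say which form of attention is used. The sketch below assumes plain scaled dot-product attention and is only an illustration of the idea, not the CFPG architecture itself. The function names (`aggregate_messages`, `matvec`), the projection matrices `W_K` and `W_V`, and all dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Return vec @ mat, where mat is a row-major list of lists."""
    n_cols = len(mat[0])
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(n_cols)]


def aggregate_messages(query, messages, w_k, w_v):
    """Pool any number of messages into one fixed-length vector.

    Assumed form: scaled dot-product attention. `query` comes from the
    receiving agent; `w_k` and `w_v` project each message to a key and a value.
    """
    keys = [matvec(m, w_k) for m in messages]
    values = [matvec(m, w_v) for m in messages]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    out_dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]


# Hypothetical sizes, chosen only for the demonstration.
MSG_DIM, KEY_DIM, OUT_DIM = 4, 8, 6
random.seed(0)
W_K = [[random.gauss(0, 1) for _ in range(KEY_DIM)] for _ in range(MSG_DIM)]
W_V = [[random.gauss(0, 1) for _ in range(OUT_DIM)] for _ in range(MSG_DIM)]
QUERY = [random.gauss(0, 1) for _ in range(KEY_DIM)]

for n_senders in (2, 5):
    msgs = [[random.gauss(0, 1) for _ in range(MSG_DIM)] for _ in range(n_senders)]
    pooled = aggregate_messages(QUERY, msgs, W_K, W_V)
    # The pooled vector has OUT_DIM entries regardless of n_senders.
    print(n_senders, len(pooled))
```

Because the attention weights sum to one and the pooled vector is a weighted sum of values, the output size depends only on the value projection, not on how many teammates sent a message, and the result does not depend on the order in which messages arrive.
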
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Andorra |
| Site | Springer |