연구 분야: Artificial Intelligence
학회: AAMAS '25: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems
Simulated Annealing (SA) is a stochastic optimization algorithm widely employed to approximate the global optimum of an energy function in both discrete and continuous problem domains. As an extension of conventional gradient descent methods, SA probabilistically accepts worse solutions to escape local optima, thereby enhancing the exploration of the solution space. SA's performance is highly contingent upon specific components, notably the neighbor proposal distribution and the temperature annealing schedule. Recent advancements such as Neural SA have improved upon traditional SA by adopting a reinforcement learning perspective, interpreting the neighbor proposal distribution as a learnable policy. Neural SA outperforms vanilla SA algorithms across various combinatorial optimization benchmarks and exhibits scalability and computational efficiency for larger problems. However, its performance remains inferior to standard commercial solvers, and it is not very generalizable across continuous problems. In this work, we introduce Reinforcement Learning Based Simulated Annealing (RL Based SA), a significant enhancement over Neural SA in terms of performance and generalizability. RL Based SA modifies the state parameters to include the change in energy from SA. It also replaces the multilayer perceptron neural networks trained using proximal policy optimization (PPO) with long short-term memory (LSTM) neural networks. This substitution enables the processing of time-series inputs of variable lengths, allowing the utilization of the entire SA rollout as input. We demonstrate that RL Based SA achieves superior results over Neural SA, vanilla SA, and adaptive SA, while attaining performance comparable to standard solvers in terms of solution quality and runtime across a spectrum of discrete and continuous problems. The benchmarks evaluated include the Knapsack, Bin Packing, and Traveling Salesperson problems, as well as continuous optimization functions such as Rosenbrock, Ackley, and Eggholder functions, and we presented training and convergence time comparisons on each function to highlight the computational trade-offs of our approach. Additionally, we show that RL Based SA is generalizable across different continuous problems, robustly scalable with respect to problem size, and computationally efficient.
| 발행 연도 | 2025년 |
|---|---|
| 인용수 | 0 |
| 출판 국가 | United States |
| 사이트 | ACM |
| 좋아요 수 | 0 |