Combining deep reinforcement learning with heuristics to solve the traveling salesman problem

Citations: 0

Authors:

Hong, Li [1]

Liu, Yu [1]

Xu, Mengqiao [2]

Deng, Wenhui [2]

Affiliations:

[1] Dalian Univ Technol, Sch Software Technol, Dalian 116620, Peoples R China

[2] Dalian Univ Technol, Sch Econ & Management, Dalian 116024, Peoples R China

Source:

CHINESE PHYSICS B | 2025 / Vol. 34 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

traveling salesman problem; deep reinforcement learning; simulated annealing algorithm; transformer model; whale optimization algorithm; 87.55.kd; 87.55.de; 07.05.Mh; ALGORITHM;

DOI:

10.1088/1674-1056/ad95f1

Chinese Library Classification (CLC) number:

O4 [Physics];

Subject classification code:

0702;

Abstract:

Recent studies employing deep learning to solve the traveling salesman problem (TSP) have mainly focused on learning construction heuristics. Such methods can improve TSP solutions, but still depend on additional programs. However, methods that focus on learning improvement heuristics to iteratively refine solutions remain insufficient. Traditional improvement heuristics are guided by a manually designed search strategy and may only achieve limited improvements. This paper proposes a novel framework for learning improvement heuristics, which automatically discovers better improvement policies for heuristics to iteratively solve the TSP. Our framework first designs a new architecture based on a transformer model to make the policy network parameterized, which introduces an action-dropout layer to prevent action selection from overfitting. It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism (named RL-SA) to learn the pairwise selected policy, aiming to improve the 2-opt algorithm's performance. The RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses the Gaussian perturbation strategy to tackle the sparse reward problem of reinforcement learning. The experiment results show that the proposed approach is significantly superior to the state-of-the-art learning-based methods, and further reduces the gap between learning-based methods and highly optimized solvers in the benchmark datasets. Moreover, our pre-trained model M can be applied to guide the SA algorithm (named M-SA (ours)), which performs better than existing deep models in small-, medium-, and large-scale TSPLIB datasets. Additionally, the M-SA (ours) achieves excellent generalization performance in a real-world dataset on global liner shipping routes, with the optimization percentages in distance reduction ranging from 3.52% to 17.99%.
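As a rough illustration of the improvement-heuristic loop the abstract refers to, the sketch below applies 2-opt moves under a simulated annealing acceptance rule. It is not the authors' RL-SA method: the index pair is drawn uniformly at random where the paper uses a learned policy, the initial tour is a random shuffle rather than a whale optimization result, and no Gaussian perturbation is applied. The function names and the parameters `steps`, `t_start`, `cooling`, and `seed` are assumptions made for this example.

```python
import math
import random


def tour_length(tour, coords):
    """Total Euclidean length of a closed tour over 2-D points."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt_move(tour, i, j):
    """Return a new tour with the segment tour[i..j] reversed (a 2-opt move)."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def sa_two_opt(coords, steps=20000, t_start=1.0, cooling=0.9995, seed=0):
    """Simulated annealing over 2-opt moves (illustrative sketch only).

    Assumption: index pairs are sampled uniformly at random; this stands in
    for the learned pair-selection policy described in the abstract.
    """
    rng = random.Random(seed)
    n = len(coords)
    current = list(range(n))
    rng.shuffle(current)  # placeholder for a better initial solution
    current_len = tour_length(current, coords)
    best, best_len = current, current_len
    temp = t_start
    for _ in range(steps):
        i, j = sorted(rng.sample(range(n), 2))
        candidate = two_opt_move(current, i, j)
        cand_len = tour_length(candidate, coords)
        delta = cand_len - current_len
        # Metropolis rule: always accept improvements, and accept a worse
        # tour with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_len = candidate, cand_len
            if current_len < best_len:
                best, best_len = current, current_len
        temp *= cooling  # geometric cooling schedule
    return best, best_len
```

Calling `sa_two_opt(coords)` with a list of `(x, y)` tuples returns the best tour found and its length. Replacing the uniform `rng.sample` call with a trained pair-selection model is the point at which a learned policy of the kind the abstract describes would enter.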


Pages: 11

50 items in total

[41] Application of the agamogenetic algorithm to solve the traveling salesman problem
Zhang, Yinghui
Wang, Zhiwei
Zeng, Qinghua
Yang, Haolei
Wang, Zhihua
BIO-INSPIRED COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2007, 4688 : 135 - 143
[42] GEOMETRIC APPROACHES TO SOLVE THE CHEBYSHEV TRAVELING SALESMAN PROBLEM
BOZER, YA
SCHORN, EC
SHARP, GP
IIE TRANSACTIONS, 1990, 22 (03) : 238 - 254
[43] Artificial Fish Algorithm to solve Traveling Salesman Problem
Wang, Jian-Ping
Liu, Yan-Pei
Huang, Yong
FRONTIERS OF MANUFACTURING AND DESIGN SCIENCE II, PTS 1-6, 2012, 121-126 : 4410 - 4414
[44] Hard to solve instances of the Euclidean Traveling Salesman Problem
Stefan Hougardy
Xianghui Zhong
Mathematical Programming Computation, 2021, 13 : 51 - 74
[45] Comparison of Heuristics for Resolving the Traveling Salesman Problem with Information Technology
Gong, Wei
Li, Mei
ADVANCED RESEARCH ON MATERIAL SCIENCE, ENVIROMENT SCIENCE AND COMPUTER SCIENCE III, 2014, 886 : 593 - +
[46] G-DGANet: Gated deep graph attention network with reinforcement learning for solving traveling salesman problem
Fellek, Getu
Farid, Ahmed
Fujimura, Shigeru
Yoshie, Osamu
Gebreyesus, Goytom
Neurocomputing, 2024, 579
[48] The Dynamic Traveling Salesman Problem with Time-Dependent and Stochastic travel times: A deep reinforcement learning approach
Chen, Dawei
Imdahl, Christina
Lai, David
Van Woensel, Tom
TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2025, 172
[49] NeuroLKH: Combining Deep Learning Model with Lin-Kernighan-Helsgaun Heuristic for Solving the Traveling Salesman Problem
Xin, Liang
Song, Wen
Cao, Zhiguang
Zhang, Jie
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[50] Modified Local Search Heuristics for the Symmetric Traveling Salesman Problem
Misevicius, Alfonsas
Blazinskas, Andrius
Lenkevicius, Antanas
INFORMATION TECHNOLOGY AND CONTROL, 2013, 42 (03) : 217 - 230