Learning to multi-vehicle cooperative bin packing problem via sequence-to-sequence policy network with deep reinforcement learning model

Cited by: 7

Authors:

Tian, Ran [1]
Kang, Chunming [1]
Bi, Jiaming [1]
Ma, Zhongyu [1]
Liu, Yanxing [1]
Yang, Saisai [1]
Li, Fangfang [1]

Affiliations:

[1] Northwest Normal Univ, Dept Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China

Source:

COMPUTERS & INDUSTRIAL ENGINEERING | 2023 / Volume 177

Funding:

National Natural Science Foundation of China;

Keywords:

Deep Reinforcement Learning; 3D Bin Packing Policy; Position Sequence; Logistics Packing; SEARCH ALGORITHM; LOCAL SEARCH; SUPPLY CHAIN; OPTIMIZATION;

DOI:

10.1016/j.cie.2023.108998

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

In logistics bin packing scenarios where vehicles have only rear doors, the packing sequence of items determines the utilization of vehicle packing space, yet there is relatively little research on optimizing that sequence. This article therefore focuses on the packing sequence problem within the multi-vehicle cooperative bin packing problem (MVCBPP) and proposes S2SDRL, a deep reinforcement learning model based on a sequence-to-sequence policy network. First, a sequence-to-sequence neural network is constructed to determine the packing probability of every item; its encoder and decoder are built by combining a bidirectional LSTM model with an attention module. Second, the bin packing strategy for the items is obtained through the constructed reinforcement learning packing framework. Finally, the Seq2Seq policy network is updated and optimized by the policy gradient method with a baseline to obtain the current optimal packing strategy. In several bin packing scenarios, S2SDRL improves the average vehicle space utilization by more than 4.0% compared with the traditional packing algorithm, and the forward computation time of the model is much shorter than that of the traditional heuristic algorithm, which gives the model greater practical value. Ablation experiments also confirm the effectiveness of the modules in the S2SDRL model for optimizing the packing order. The sensitivity analysis shows that the model retains a degree of stability when the input data change.
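
The "policy gradient method with a baseline" named in the abstract can be illustrated with a minimal sketch. It is not the S2SDRL model: a plain score vector stands in for the bidirectional LSTM and attention Seq2Seq network, `toy_reward` is a hypothetical stand-in for vehicle space utilization, and the learning rate, step count, and moving-average baseline are assumptions chosen only for illustration.

```python
import math
import random

def sample_order(scores, rng):
    """Sample a packing order item by item (without replacement).

    Returns the order, its log-probability, and d(log-prob)/d(scores).
    """
    remaining = list(range(len(scores)))
    order, log_prob = [], 0.0
    grad = [0.0] * len(scores)
    while remaining:
        # Softmax over the items that are still unpacked.
        top = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - top) for i in remaining]
        total = sum(weights)
        probs = [w / total for w in weights]
        pick = rng.choices(range(len(remaining)), weights=probs)[0]
        log_prob += math.log(probs[pick])
        # Gradient of log softmax: indicator(chosen) minus probability.
        for k, item in enumerate(remaining):
            grad[item] += (1.0 if k == pick else 0.0) - probs[k]
        order.append(remaining.pop(pick))
    return order, log_prob, grad

def toy_reward(order, volumes):
    """Hypothetical reward (not from the paper): favors packing large items early."""
    n = len(order)
    weighted = sum(volumes[item] * (n - pos) for pos, item in enumerate(order))
    return weighted / (n * sum(volumes))

def train(volumes, steps=500, lr=0.5, seed=0):
    """REINFORCE with an exponential-moving-average baseline (assumed form)."""
    rng = random.Random(seed)
    scores = [0.0] * len(volumes)
    baseline = 0.0
    for _ in range(steps):
        order, _, grad = sample_order(scores, rng)
        reward = toy_reward(order, volumes)
        advantage = reward - baseline  # subtracting the baseline reduces variance
        scores = [s + lr * advantage * g for s, g in zip(scores, grad)]
        baseline = 0.9 * baseline + 0.1 * reward
    return scores
```

Calling `train([1.0, 2.0, 3.0])` returns one learned score per item; orders sampled from those scores are the "packing sequences" of this toy setting. In a full model the scores would be produced step by step by the decoder rather than stored in a fixed vector.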


Pages: 13

50 items in total

[41] Deep Reinforcement Learning for Solving Multi-objective Vehicle Routing Problem
Zhang, Jian
Hu, Rong
Wang, Yi-Jun
Yang, Yuan-Yuan
Qian, Bin
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT I, 2023, 14086 : 146 - 155
[42] Integrated Architecture for Smart Grid Energy Management: Deep Attention-Enhanced Sequence-to-Sequence Model with Energy-Aware Optimized Reinforcement Learning for Demand Response
K. R. Deepa
N. Thillaiarasu
SN Computer Science, 5 (8)
[43] Deep learning model predictive control of a high-density polyethylene reactor with a physics-guided sequence-to-sequence model with memory
Jiang, Zhen-Feng
Wei, Xi-Zhan
Kang, Jia-Lin
Wong, David Shan-Hill
Yao, Yuan
Chuang, Yao-Chen
Jang, Shi-Shang
Ou, John Di-Yi
COMPUTERS & CHEMICAL ENGINEERING, 2024, 189
[44] Adaptive disassembly sequence planning for VR maintenance training via deep reinforcement learning
Mao, Haoyang
Liu, Zhenyu
Qiu, Chan
INTERNATIONAL JOURNAL OF ADVANCED MANUFACTURING TECHNOLOGY, 2023, 124 (09) : 3039 - 3048
[45] Sequence generation for multi-task scheduling in cloud manufacturing with deep reinforcement learning
Ping, Yaoyao
Liu, Yongkui
Zhang, Lin
Wang, Lihui
Xu, Xun
JOURNAL OF MANUFACTURING SYSTEMS, 2023, 67 : 315 - 337
[46] Accelerating deep reinforcement learning via knowledge-guided policy network
Yu, Yuanqiang
Zhang, Peng
Zhao, Kai
Zheng, Yan
Hao, Jianye
AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2023, 37 (01)
[47] Adaptive disassembly sequence planning for VR maintenance training via deep reinforcement learning
Haoyang Mao
Zhenyu Liu
Chan Qiu
The International Journal of Advanced Manufacturing Technology, 2023, 124 : 3039 - 3048
[48] Accelerating deep reinforcement learning via knowledge-guided policy network
Yuanqiang Yu
Peng Zhang
Kai Zhao
Yan Zheng
Jianye Hao
Autonomous Agents and Multi-Agent Systems, 2023, 37
[49] HSMH: A Hierarchical Sequence Multi-Hop Reasoning Model With Reinforcement Learning
Wang, Dan
Li, Bo
Song, Bin
Chen, Chen
Yu, F. Richard
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (04) : 1638 - 1649
[50] Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning
Zhang, Yao
Jarrett, Daniel
van der Schaar, Mihaela
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 2304 - 2313
