Improvement of Reinforcement Learning With Supermodularity

Times cited: 0
Authors
Meng, Ying [1, 2]
Shi, Fengyuan [3, 4]
Tang, Lixin [5]
Sun, Defeng [6, 7]
Affiliations
[1] Northeastern Univ, Natl Frontiers Sci Ctr Ind Intelligence & Syst Opt, Minist Educ, Shenyang, Peoples R China
[2] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Shenyang, Peoples R China
[3] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Minist Educ, Shenyang 110819, Peoples R China
[4] Northeastern Univ, Liaoning Engn Lab Data Analyt & Optimizat Smart In, Shenyang 110819, Peoples R China
[5] Northeastern Univ, Natl Frontiers Sci Ctr Ind Intelligence & Syst Opt, Shenyang 110819, Peoples R China
[6] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Minist Educ, Shenyang 110819, Peoples R China
[7] Northeastern Univ, Liaoning Key Lab Mfg Syst & Logist Optimizat, Shenyang 110819, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Optimization; Dynamic programming; Industries; Heuristic algorithms; Data analysis; Approximation algorithms; Sufficient conditions; Dynamic parameter; monotone comparative statics; optimization; reinforcement learning (RL); supermodularity; BIN PACKING; ALGORITHM;
DOI
10.1109/TNNLS.2023.3244024
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Reinforcement learning (RL) is a promising approach to tackling learning and decision-making problems in dynamic environments. Most studies on RL focus on improving state evaluation or action evaluation. In this article, we investigate how to reduce the action space by using supermodularity. We consider the decision tasks in a multistage decision process as a collection of parameterized optimization problems, where the state parameters vary dynamically with time or stage. The optimal solutions of these parameterized optimization problems correspond to the optimal actions in RL. For a given Markov decision process (MDP) with supermodularity, the monotonicity of the optimal action set and of the optimal selection with respect to the state parameters can be obtained by using monotone comparative statics. Accordingly, we propose a monotonicity cut to remove unpromising actions from the action space. Taking the bin packing problem (BPP) as an example, we show how supermodularity and the monotonicity cut work in RL. Finally, we evaluate the monotonicity cut on benchmark datasets reported in the literature and compare the proposed RL approach with several popular baseline algorithms. The results show that the monotonicity cut can effectively improve the performance of RL.
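The pruning idea rests on a classical result from monotone comparative statics (Topkis's monotonicity theorem): if an objective f(a, s) has increasing differences in the action a and the state parameter s, then the greatest optimal action is nondecreasing in s, so for any larger state the actions below the current optimum can be skipped. The Python sketch below illustrates that general principle on a toy one-step objective over integer grids. It is not the authors' algorithm, MDP, or BPP formulation; the names `toy_objective`, `greatest_argmax`, `solve_all_states`, and `has_increasing_differences` and the action/state grids are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Objective = Callable[[int, int], int]


def greatest_argmax(f: Objective, actions: List[int], s: int) -> Tuple[int, int]:
    """Return (largest maximizer of f(., s) over ascending `actions`, number of evaluations)."""
    best_a, best_v = actions[0], f(actions[0], s)
    for a in actions[1:]:
        v = f(a, s)
        if v >= best_v:  # ">=" moves to the larger action on ties, giving the greatest selection
            best_a, best_v = a, v
    return best_a, len(actions)


def has_increasing_differences(f: Objective, actions: Iterable[int], states: Iterable[int]) -> bool:
    """Brute-force check that f(a2, s) - f(a, s) is nondecreasing in s whenever a2 > a."""
    acts, sts = sorted(actions), sorted(states)
    for i, a in enumerate(acts):
        for a2 in acts[i + 1:]:
            diffs = [f(a2, s) - f(a, s) for s in sts]
            if any(d1 > d2 for d1, d2 in zip(diffs, diffs[1:])):
                return False
    return True


def solve_all_states(f: Objective, actions: Iterable[int], states: Iterable[int],
                     use_cut: bool) -> Tuple[Dict[int, int], int]:
    """Greatest optimal action for each state, visiting states in ascending order.

    With use_cut=True, actions below the previous state's optimal action are skipped.
    This pruning is only valid when f has increasing differences in (a, s).
    """
    acts = sorted(actions)
    policy: Dict[int, int] = {}
    evaluations = 0
    lower = 0  # index of the smallest action still worth evaluating
    for s in sorted(states):
        candidates = acts[lower:] if use_cut else acts
        a_star, n = greatest_argmax(f, candidates, s)
        policy[s] = a_star
        evaluations += n
        if use_cut:
            lower = acts.index(a_star)  # drop every action below a*(s) for larger states
    return policy, evaluations


def toy_objective(a: int, s: int) -> int:
    """Hypothetical one-step payoff; its cross difference is (a2 - a) * (s2 - s) >= 0."""
    return a * s - a * a


if __name__ == "__main__":
    ACTIONS, STATES = range(21), range(11)
    print(has_increasing_differences(toy_objective, ACTIONS, STATES))
    full_policy, full_evals = solve_all_states(toy_objective, ACTIONS, STATES, use_cut=False)
    cut_policy, cut_evals = solve_all_states(toy_objective, ACTIONS, STATES, use_cut=True)
    print(full_policy == cut_policy, full_evals, cut_evals)
```

Running the script prints `True` and then `True 231 206`: both runs return the same greatest optimal action for every state, but starting each search at the previous optimum evaluates 206 action-state pairs instead of 231 on this toy grid. If the increasing-differences check fails, the cut can discard optimal actions, which is why the sufficient condition matters.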
Pages: 5298-5309
Number of pages: 12