Improvement of Reinforcement Learning With Supermodularity

Cited by: 0
Authors
Meng, Ying [1 ,2 ]
Shi, Fengyuan [3 ,4 ]
Tang, Lixin [5 ]
Sun, Defeng [6 ,7 ]
Affiliations
[1] Northeastern Univ, Natl Frontiers Sci Ctr Ind Intelligence & Syst Opt, Minist Educ, Shenyang, Peoples R China
[2] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Shenyang, Peoples R China
[3] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Minist Educ, Shenyang 110819, Peoples R China
[4] Northeastern Univ, Liaoning Engn Lab Data Analyt & Optimizat Smart In, Shenyang 110819, Peoples R China
[5] Northeastern Univ, Natl Frontiers Sci Ctr Ind Intelligence & Syst Opt, Shenyang 110819, Peoples R China
[6] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Minist Educ, Shenyang 110819, Peoples R China
[7] Northeastern Univ, Liaoning Key Lab Mfg Syst & Logist Optimizat, Shenyang 110819, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Optimization; Dynamic programming; Industries; Heuristic algorithms; Data analysis; Approximation algorithms; Sufficient conditions; Dynamic parameter; monotone comparative statics; optimization; reinforcement learning (RL); supermodularity; BIN PACKING; ALGORITHM;
DOI
10.1109/TNNLS.2023.3244024
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) is a promising approach to learning and decision-making problems in dynamic environments. Most studies on RL focus on improving state evaluation or action evaluation. In this article, we investigate how to reduce the action space by using supermodularity. We treat the decision tasks in a multistage decision process as a collection of parameterized optimization problems whose state parameters vary dynamically with time or stage. The optimal solutions of these parameterized optimization problems correspond to the optimal actions in RL. For a given Markov decision process (MDP) with supermodularity, the monotonicity of the optimal action set and of the optimal selection with respect to the state parameters can be obtained by using monotone comparative statics. Accordingly, we propose a monotonicity cut that removes unpromising actions from the action space. Taking the bin packing problem (BPP) as an example, we show how supermodularity and the monotonicity cut work in RL. Finally, we evaluate the monotonicity cut on benchmark datasets reported in the literature and compare the proposed RL approach with some popular baseline algorithms. The results show that the monotonicity cut can effectively improve the performance of RL.
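The idea of a monotonicity cut can be illustrated with a small sketch. This is not the authors' algorithm or their BPP formulation. It is a toy example under stated assumptions: states and actions are integers sorted in ascending order, the value function `q_toy` is a hypothetical stand-in chosen to have increasing differences (a standard sufficient condition for supermodularity in the state-action pair), and all function names are invented for illustration. Under that condition the smallest optimal action is nondecreasing in the state, so actions below the previous state's optimum can be skipped.

```python
from typing import Callable, Dict, List, Tuple

Value = Callable[[int, int], int]


def has_increasing_differences(q: Value, states: List[int], actions: List[int]) -> bool:
    """Check q(s2, a2) - q(s2, a1) >= q(s1, a2) - q(s1, a1) for all s1 < s2, a1 < a2.

    Assumes both lists are sorted in ascending order.
    """
    for i, s1 in enumerate(states):
        for s2 in states[i + 1:]:
            for j, a1 in enumerate(actions):
                for a2 in actions[j + 1:]:
                    if q(s2, a2) - q(s2, a1) < q(s1, a2) - q(s1, a1):
                        return False
    return True


def greedy_policy(
    q: Value, states: List[int], actions: List[int], use_cut: bool
) -> Tuple[Dict[int, int], int]:
    """Pick the smallest maximizing action per state; count q evaluations.

    With use_cut=True, actions below the previous state's optimum are
    skipped. This pruning is only sound if q has increasing differences.
    """
    policy: Dict[int, int] = {}
    evaluations = 0
    lower = 0  # index of the smallest action still considered
    for s in states:
        best_idx = lower
        best_val = q(s, actions[lower])
        evaluations += 1
        for k in range(lower + 1, len(actions)):
            val = q(s, actions[k])
            evaluations += 1
            if val > best_val:  # strict: ties keep the smaller action
                best_val, best_idx = val, k
        policy[s] = actions[best_idx]
        if use_cut:
            lower = best_idx  # monotonicity cut for the next (larger) state
    return policy, evaluations


def q_toy(s: int, a: int) -> int:
    """Hypothetical value function; the cross term 2*s*a gives increasing differences."""
    return 2 * s * a - a * a


states = list(range(6))
actions = list(range(6))

if __name__ == "__main__":
    print(has_increasing_differences(q_toy, states, actions))
    print(greedy_policy(q_toy, states, actions, use_cut=True))
    print(greedy_policy(q_toy, states, actions, use_cut=False))
```

In this toy setting the pruned search returns the same policy as the exhaustive one while evaluating fewer state-action pairs. The check in `has_increasing_differences` matters: if it fails, the cut may discard the true optimum.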
Pages: 5298-5309
Number of pages: 12