Improvement of Reinforcement Learning With Supermodularity

Citations: 0
|
Authors
Meng, Ying [1 ,2 ]
Shi, Fengyuan [3 ,4 ]
Tang, Lixin [5 ]
Sun, Defeng [6 ,7 ]
Affiliations
[1] Northeastern Univ, Natl Frontiers Sci Ctr Ind Intelligence & Syst Opt, Minist Educ, Shenyang, Peoples R China
[2] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Shenyang, Peoples R China
[3] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Minist Educ, Shenyang 110819, Peoples R China
[4] Northeastern Univ, Liaoning Engn Lab Data Analyt & Optimizat Smart Ind, Shenyang 110819, Peoples R China
[5] Northeastern Univ, Natl Frontiers Sci Ctr Ind Intelligence & Syst Opt, Shenyang 110819, Peoples R China
[6] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Minist Educ, Shenyang 110819, Peoples R China
[7] Northeastern Univ, Liaoning Key Lab Mfg Syst & Logist Optimizat, Shenyang 110819, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Optimization; Dynamic programming; Industries; Heuristic algorithms; Data analysis; Approximation algorithms; Sufficient conditions; Dynamic parameter; monotone comparative statics; optimization; reinforcement learning (RL); supermodularity; BIN PACKING; ALGORITHM;
DOI
10.1109/TNNLS.2023.3244024
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) is a promising approach to tackling learning and decision-making problems in a dynamic environment. Most studies on RL focus on improving state evaluation or action evaluation. In this article, we investigate how to reduce the action space by using supermodularity. We treat the decision tasks in a multistage decision process as a collection of parameterized optimization problems whose state parameters vary dynamically with the time or stage. The optimal solutions of these parameterized optimization problems correspond to the optimal actions in RL. For a given Markov decision process (MDP) with supermodularity, the monotonicity of the optimal action set and of the optimal selection with respect to the state parameters can be established by monotone comparative statics. Accordingly, we propose a monotonicity cut to remove unpromising actions from the action space. Taking the bin packing problem (BPP) as an example, we show how supermodularity and the monotonicity cut work in RL. Finally, we evaluate the monotonicity cut on benchmark datasets reported in the literature and compare the proposed RL with several popular baseline algorithms. The results show that the monotonicity cut can effectively improve the performance of RL.
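The monotonicity cut described in the abstract can be illustrated with a small sketch. The following minimal, self-contained Python example is not the authors' implementation: it builds a toy tabular Q(s, a) with increasing differences in a scalar state parameter s and a scalar action a (the discrete form of supermodularity), so that, by monotone comparative statics (Topkis), the smallest optimal action is nondecreasing in s; the cut then drops every action below the optimum found for a smaller s. The table construction, the function name greedy_action_with_cut, and all parameters are illustrative assumptions.

```python
# Minimal sketch of a monotonicity cut, assuming a tabular Q(s, a) that has
# increasing differences in a scalar state parameter s and a scalar action a.
# By monotone comparative statics, the optimal action is then nondecreasing
# in s, so actions below the optimum of any smaller s can be cut.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 20, 15

# Toy supermodular Q-table: the marginal value of raising the action,
# Q(s, a+1) - Q(s, a), is strictly increasing in s by construction.
marginal = rng.random((n_states, n_actions)).cumsum(axis=0)
action_cost = 0.25 * n_states  # linear action cost keeps the argmax interior
Q = marginal.cumsum(axis=1) - action_cost * np.arange(n_actions)

def greedy_action_with_cut(q_row, lower_bound):
    """Greedy action over the pruned set {lower_bound, ..., n_actions - 1}.

    Monotonicity cut: actions below the optimal action of a smaller state
    parameter cannot be optimal here, so they are skipped before the scan.
    """
    return lower_bound + int(np.argmax(q_row[lower_bound:]))

lower_bound, pruned = 0, 0
for s in range(n_states):                  # sweep states in increasing order
    a_star = greedy_action_with_cut(Q[s], lower_bound)
    assert a_star == int(np.argmax(Q[s]))  # the cut never loses the optimum
    pruned += lower_bound                  # actions skipped at this state
    lower_bound = a_star                   # tighten the cut for larger s
print(f"actions pruned over the sweep: {pruned} of {n_states * n_actions}")
```

Because the cut only raises the lower end of the scan range, it can discard a sizable fraction of the action space over a sweep while, under the increasing-differences assumption, never discarding an optimal action; this is the pruning effect the abstract reports improving RL performance.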
Pages: 5298-5309
Page count: 12
Related Papers
50 records in total
  • [21] An Improvement on Mapless Navigation with Deep Reinforcement Learning: A Reward Shaping Approach
    Alipanah, Arezoo
    Moosavian, S. Ali A.
    2022 10TH RSI INTERNATIONAL CONFERENCE ON ROBOTICS AND MECHATRONICS (ICROM), 2022, : 261 - 266
  • [22] Multi-objective Reinforcement Learning with Path Integral Policy Improvement
    Ariizumi, Ryo
    Sago, Hayato
    Asai, Toru
    Azuma, Shun-ichi
    2023 62ND ANNUAL CONFERENCE OF THE SOCIETY OF INSTRUMENT AND CONTROL ENGINEERS, SICE, 2023, : 1418 - 1423
  • [23] Coordinative energy efficiency improvement of buildings based on deep reinforcement learning
    Xu C.
    Li W.
    Rao Y.
    Qi B.
    Yang B.
    Wang Z.
    Cyber-Physical Systems, 2023, 9 (03) : 260 - 272
  • [24] The Value-Improvement Path: Towards Better Representations for Reinforcement Learning
    Dabney, Will
    Barreto, Andre
    Rowland, Mark
    Dadashi, Robert
    Quan, John
    Bellemare, Marc G.
    Silver, David
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7160 - 7168
  • [25] Infinite supermodularity and preferences
    Chateauneuf, Alain
    Vergopoulos, Vassili
    Zhang, Jianbo
    ECONOMIC THEORY, 2017, 63 (01) : 99 - 109
  • [26] Air conditioner control learning users' sensations based on reinforcement learning and its scalability improvement
    Shigei, Noritaka
    Yamaguchi, Yohei
    Miyajima, Hiromi
    IAENG International Journal of Computer Science, 2015, 42 (03) : 1 - 8
  • [27] Supermodularity in Various Partition Problems
    F. K. Hwang
    M. M. Liao
    Chiuyuan Chen
    Journal of Global Optimization, 2000, 18 : 275 - 282
  • [28] Supermodularity and complementarity.
    Puppe, C
    JOURNAL OF ECONOMICS-ZEITSCHRIFT FUR NATIONALOKONOMIE, 1999, 70 (02): : 212 - 214
  • [29] Space Infrastructure Supermodularity
    Cheung, Kenneth C.
    Wang, Xiao Yu
    2024 IEEE AEROSPACE CONFERENCE, 2024,