Reinforcement Learning for Constrained Markov Decision Processes

Cited by: 0
Authors
Gattami, Ather [1]
Bai, Qinbo [2]
Aggarwal, Vaneet [2]
Affiliations
[1] AI Sweden, Stockholm, Sweden
[2] Purdue Univ, W Lafayette, IN 47907 USA
Keywords
ALGORITHM;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we consider the problem of optimization and learning for constrained and multi-objective Markov decision processes, for both discounted rewards and expected average rewards. We formulate the problems as zero-sum games where one player (the agent) solves a Markov decision problem and its opponent solves a bandit optimization problem, which we here call Markov-Bandit games. We extend Q-learning to solve Markov-Bandit games and show that our new Q-learning algorithms converge to the optimal solutions of the zero-sum Markov-Bandit games, and hence converge to the optimal solutions of the constrained and multi-objective Markov decision problems. We provide numerical examples where we calculate the optimal policies and show by simulations that the algorithm converges to the calculated optimal policies. To the best of our knowledge, this is the first time Q-learning algorithms guarantee convergence to optimal stationary policies for the multi-objective Reinforcement Learning problem with discounted and expected average rewards, respectively.
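As a rough illustration of the game formulation described in the abstract (a sketch in assumed notation, not necessarily the paper's own): write V_j(π) for the expected discounted return of a policy π under the j-th reward function r_j, with discount factor γ and J objectives. These symbols are assumptions introduced here, and the maximum over policies is assumed to be attained, as in the finite state and action setting. Requiring every objective to be nonnegative is the same as requiring the worst one to be nonnegative, so the agent maximizes over policies while its opponent picks an objective index, much like choosing a bandit arm.

```latex
\begin{align*}
V_j(\pi) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_j(s_t, a_t)\right],
  \qquad j = 1, \dots, J, \\
V_j(\pi) \ge 0 \ \text{ for all } j
  &\iff \min_{1 \le j \le J} V_j(\pi) \ge 0, \\
\text{a feasible policy exists}
  &\iff \max_{\pi}\, \min_{1 \le j \le J} V_j(\pi) \ge 0 .
\end{align*}
```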
Pages: 11