Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes

Cited by: 0
Authors
Mohammed Shahid Abdulla
Shalabh Bhatnagar
Affiliation
[1] Indian Institute of Science, Department of Computer Science and Automation
Keywords
Actor-critic algorithms; Two timescale stochastic approximation; Markov decision processes; Policy iteration; Simultaneous perturbation stochastic approximation; Normalized Hadamard matrices; Reinforcement learning; TD-learning
DOI: Not available
Abstract
This article proposes several two-timescale simulation-based actor-critic algorithms for the solution of infinite-horizon Markov Decision Processes with finite state space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting, while the rest are for finite action spaces. On the slower timescale, all the algorithms perform a gradient search over the corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated, and an additional averaging is performed for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, we discuss a memory-efficient implementation that uses a feature-based representation of the state space and performs TD(0) learning along the faster timescale. The TD(0) algorithm does not follow an on-line sampling of states but is observed to do well in our setting. Numerical experiments on a problem of rate-based flow control are presented using the proposed algorithms. We consider here the model of a single bottleneck node in the continuous-time queueing framework. We show performance comparisons of our algorithms with the two-timescale actor-critic algorithms of Konda and Borkar (1999) and Bhatnagar and Kumar (2004). Our algorithms exhibit more than an order of magnitude better performance than those of Konda and Borkar (1999).
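To make the two-timescale idea concrete, the following is a minimal, hypothetical Python sketch, not the paper's exact algorithm: a softmax-parameterized randomized policy on a small synthetic MDP, a faster-timescale TD(0)-style update of the average-cost and differential-cost estimates, and a slower-timescale two-sided SPSA gradient step on the policy parameters. The MDP, step sizes, perturbation size, and softmax parameterization are all illustrative assumptions.

import numpy as np

# Hypothetical sketch of a two-timescale SPSA actor-critic for an
# average-cost MDP with a small finite state space (illustrative only).
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Fixed synthetic MDP: P[s, a] is the next-state distribution, C[s, a] the one-stage cost.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
C = rng.uniform(size=(n_states, n_actions))

def policy(theta):
    # Softmax (randomized stationary) policy parameterized by theta.
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def estimate_average_cost(theta, n_steps, step):
    # Faster timescale: simulate the chain under policy(theta) and track the
    # average cost rho and differential-cost estimates h with TD(0)-style updates.
    pi = policy(theta)
    h = np.zeros(n_states)
    rho, s = 0.0, 0
    for _ in range(n_steps):
        a = rng.choice(n_actions, p=pi[s])
        s_next = rng.choice(n_states, p=P[s, a])
        td_err = C[s, a] - rho + h[s_next] - h[s]
        h[s] += step * td_err
        rho += step * (C[s, a] - rho)
        s = s_next
    return rho

theta = np.zeros((n_states, n_actions))   # actor parameters (slower timescale)
delta = 0.1                               # SPSA perturbation magnitude

for k in range(1, 101):
    b_k = 1.0 / k                         # slower (actor) step size
    a_k = 1.0 / (k ** 0.6)                # faster (critic) step size; b_k / a_k -> 0
    # Two-sided SPSA gradient estimate from two perturbed policy evaluations.
    Delta = rng.choice([-1.0, 1.0], size=theta.shape)
    J_plus = estimate_average_cost(theta + delta * Delta, 500, a_k)
    J_minus = estimate_average_cost(theta - delta * Delta, 500, a_k)
    theta = theta - b_k * (J_plus - J_minus) / (2.0 * delta) * (1.0 / Delta)

print("estimated average cost after training:", estimate_average_cost(theta, 5000, 0.05))

The step-size choice b_k = 1/k and a_k = 1/k^0.6 is one standard way to separate the two timescales (b_k/a_k goes to 0), so the critic effectively tracks the cost of the current perturbed policy before the actor moves.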
Pages: 23 - 52 (29 pages)
Related articles (50 in total)
  • [21] A sensitivity view of Markov decision processes and reinforcement learning
    Cao, XR
    MODELING, CONTROL AND OPTIMIZATION OF COMPLEX SYSTEMS: IN HONOR OF PROFESSOR YU-CHI HO, 2003, 14 : 261 - 283
  • [22] On the convergence of projective-simulation-based reinforcement learning in Markov decision processes
    Boyajian, W. L.
    Clausen, J.
    Trenkwalder, L. M.
    Dunjko, V.
    Briegel, H. J.
    QUANTUM MACHINE INTELLIGENCE, 2020, 2 (02)
  • [24] Verification of Markov Decision Processes Using Learning Algorithms
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Chmelik, Martin
    Forejt, Vojtech
    Kretinsky, Jan
    Kwiatkowska, Marta
    Parker, David
    Ujma, Mateusz
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2014, 2014, 8837 : 98 - 114
  • [25] Combining Learning Algorithms: An Approach to Markov Decision Processes
    Ribeiro, Richardson
    Favarim, Fabio
    Barbosa, Marco A. C.
    Koerich, Alessandro L.
    Enembreck, Fabricio
    ENTERPRISE INFORMATION SYSTEMS, ICEIS 2012, 2013, 141 : 172 - 188
  • [26] Learning and Planning in Average-Reward Markov Decision Processes
    Wan, Yi
    Naik, Abhishek
    Sutton, Richard S.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7665 - 7676
  • [27] BATCH POLICY LEARNING IN AVERAGE REWARD MARKOV DECISION PROCESSES
    Liao, Peng
    Qi, Zhengling
    Wan, Runzhe
    Klasnja, Predrag
    Murphy, Susan A.
    ANNALS OF STATISTICS, 2022, 50 (06): : 3364 - 3387
  • [28] On Partially Observable Markov Decision Processes with an Average Cost Criterion
    Fernandez-Gaucherand, E.
    Arapostathis, A.
    Marcus, S. I.
    PROCEEDINGS OF THE 28TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-3, 1989, : 1267 - 1273
  • [29] Value Iteration for Average Cost Markov Decision Processes in Borel Spaces
    Zhu, Quanxin
    Guo, Xianping
    APPLIED MATHEMATICS RESEARCH EXPRESS, 2005, (02) : 61 - 76
  • [30] Average Cost Markov Decision Processes with Weakly Continuous Transition Probabilities
    Feinberg, Eugene A.
    Kasyanov, Pavlo O.
    Zadoianchuk, Nina V.
    MATHEMATICS OF OPERATIONS RESEARCH, 2012, 37 (04) : 591 - 607