Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes

Cited by: 0

Authors:

Mohammed Shahid Abdulla

Shalabh Bhatnagar

Affiliations:

[1] Indian Institute of Science, Department of Computer Science and Automation

Source:

Discrete Event Dynamic Systems | 2007 / Vol. 17

Keywords:

Actor-critic algorithms; Two timescale stochastic approximation; Markov decision processes; Policy iteration; Simultaneous perturbation stochastic approximation; Normalized Hadamard matrices; Reinforcement learning; TD-learning;

Abstract:

This article proposes several two-timescale, simulation-based actor-critic algorithms for solving infinite-horizon Markov decision processes with finite state space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting, while the rest are for finite action spaces. On the slower timescale, all the algorithms perform a gradient search over the corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated, and an additional averaging is performed for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, we discuss a memory-efficient implementation that uses a feature-based representation of the state space and performs TD(0) learning along the faster timescale. The TD(0) algorithm does not follow an on-line sampling of states but is observed to do well in our setting. Numerical experiments on a problem of rate-based flow control are presented using the proposed algorithms. We consider here the model of a single bottleneck node in the continuous-time queueing framework. We show performance comparisons of our algorithms with the two-timescale actor-critic algorithms of Konda and Borkar (1999) and Bhatnagar and Kumar (2004). Our algorithms exhibit more than an order of magnitude better performance than those of Konda and Borkar (1999).
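The SPSA gradient search mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a deterministic toy cost function in place of the simulated differential cost estimate, random +/-1 (Bernoulli) perturbations rather than the normalized Hadamard matrix constructions named in the keywords, and a constant step size. The function names `spsa_gradient` and `spsa_minimize` are hypothetical.

```python
import random


def spsa_gradient(cost, theta, delta=0.1, rng=random):
    """Two-sided SPSA estimate of the gradient of `cost` at `theta`.

    Assumption: Bernoulli +/-1 perturbations; the paper's keywords point to
    Hadamard-matrix-based perturbations, which are not reproduced here.
    """
    # Perturb all coordinates at once with independent random signs.
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    # Only two cost evaluations are needed, whatever the dimension of theta.
    diff = cost(plus) - cost(minus)
    return [diff / (2.0 * delta * s) for s in signs]


def spsa_minimize(cost, theta, steps=500, step_size=0.05, delta=0.1, seed=0):
    """Plain gradient descent driven by SPSA estimates.

    Assumption: a constant step size for simplicity; a two-timescale scheme
    would use diminishing step-size sequences instead.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(cost, theta, delta, rng)
        theta = [t - step_size * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    # Toy quadratic cost with a known minimizer (illustrative only).
    target = [1.0, -2.0, 0.5]

    def quadratic(th):
        return sum((t - c) ** 2 for t, c in zip(th, target))

    print(spsa_minimize(quadratic, [0.0, 0.0, 0.0]))
```

In an actor-critic setting, `cost` would be replaced by a noisy estimate produced on the faster timescale. The sketch only shows why two evaluations per iteration are enough to move every policy parameter.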

Pages: 23-52

Page count: 29
