Recursive learning automata approach to Markov decision processes

Cited by: 6
Authors
Chang, Hyeong Soo [1]
Fu, Michael C.
Hu, Jiaqiao
Marcus, Steven I.
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul 121742, South Korea
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[4] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[5] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
learning automata; Markov decision process (MDP); sampling
DOI
10.1109/TAC.2007.900859
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
In this note, we present a sampling algorithm, called the recursive automata sampling algorithm (RASA), for the control of finite-horizon Markov decision processes (MDPs). By recursively extending Sastry's learning automata pursuit algorithm, originally designed for nonsequential stochastic optimization problems, RASA returns an estimate of both the optimal action from a given state and the corresponding optimal value. Building on the finite-time analysis of the pursuit algorithm by Rajaraman and Sastry, we analyze the finite-time behavior of RASA. Specifically, for a given initial state, we derive the following probability bounds as a function of the number of samples: 1) a lower bound on the probability that RASA will sample the optimal action and 2) an upper bound on the probability that the deviation between the true optimal value and the RASA estimate exceeds a given error.
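The pursuit automaton that RASA builds on works as follows: maintain a probability vector over the candidate actions together with sample-mean reward estimates, sample an action from the vector, update that action's estimate, and then nudge the whole vector toward the unit vector of the currently best-estimated action. Below is a minimal Python sketch of this nonsequential building block, not the authors' RASA itself; the function name, step size mu, iteration count, and sample_reward callback are all illustrative assumptions standing in for whatever noisy objective is being optimized.

    import random

    def pursuit_automaton(sample_reward, num_actions, num_iters=10000, mu=0.01):
        # Sketch of a Sastry-style pursuit learning automaton for a
        # nonsequential stochastic optimization problem. All names and
        # parameter values here are illustrative, not from the paper.
        p = [1.0 / num_actions] * num_actions  # action-selection probabilities
        q = [0.0] * num_actions                # sample-mean reward estimates
        counts = [0] * num_actions

        for _ in range(num_iters):
            # Sample an action according to the current probability vector.
            a = random.choices(range(num_actions), weights=p)[0]
            r = sample_reward(a)

            # Incrementally update the sample-mean estimate for that action.
            counts[a] += 1
            q[a] += (r - q[a]) / counts[a]

            # "Pursue" the greedy action: move p toward its unit vector,
            # which keeps p a valid probability distribution.
            best = max(range(num_actions), key=lambda i: q[i])
            for i in range(num_actions):
                target = 1.0 if i == best else 0.0
                p[i] += mu * (target - p[i])

        best = max(range(num_actions), key=lambda i: q[i])
        return best, q[best]

Per the abstract, RASA applies this update recursively over the horizon: when an action is sampled at a state, its "reward" is the immediate reward plus a recursively estimated value of the sampled next state, so each state reached in the sampling tree effectively runs its own pursuit automaton.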
Pages: 1349-1355
Number of pages: 7
Related papers
50 records in total
  • [31] Online Learning of Safety function for Markov Decision Processes
    Mazumdar, Abhijit
    Wisniewski, Rafal
    Bujorianu, Manuela L.
    2023 EUROPEAN CONTROL CONFERENCE (ECC), 2023
  • [32] Learning Policies for Markov Decision Processes in Continuous Spaces
    Paternain, Santiago
    Bazerque, Juan Andres
    Small, Austin
    Ribeiro, Alejandro
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 4751 - 4758
  • [33] Active Learning of Markov Decision Processes for System Verification
    Chen, Yingke
    Nielsen, Thomas Dyhre
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 2, 2012, : 289 - 294
  • [34] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [35] Learning Policies for Markov Decision Processes From Data
    Hanawal, Manjesh Kumar
    Liu, Hao
    Zhu, Henghui
    Paschalidis, Ioannis Ch.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (06) : 2298 - 2309
  • [36] Concurrent Markov decision processes for robot team learning
    Girard, Justin
    Emami, M. Reza
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 39 : 223 - 234
  • [37] Learning Adversarial Markov Decision Processes with Delayed Feedback
    Lancewicki, Tal
    Rosenberg, Aviv
    Mansour, Yishay
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7281 - 7289
  • [38] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    2005 INTERNATIONAL CONFERENCE ON INTELLIGENT SENSING AND INFORMATION PROCESSING, PROCEEDINGS, 2005, : 199 - 204
  • [39] Verification of Markov Decision Processes Using Learning Algorithms
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Chmelik, Martin
    Forejt, Vojtech
    Kretinsky, Jan
    Kwiatkowska, Marta
    Parker, David
    Ujma, Mateusz
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2014, 2014, 8837 : 98 - 114
  • [40] Learning and Planning with Timing Information in Markov Decision Processes
    Bacon, Pierre-Luc
    Balle, Borja
    Precup, Doina
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2015, : 111 - 120