Recursive learning automata approach to Markov decision processes

Cited by: 6
Authors
Chang, Hyeong Soo [1]
Fu, Michael C.
Hu, Jiaqiao
Marcus, Steven I.
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul 121742, South Korea
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[4] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[5] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
learning automata; Markov decision process (MDP); sampling
DOI
10.1109/TAC.2007.900859
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
In this note, we present a sampling algorithm, called the recursive automata sampling algorithm (RASA), for the control of finite-horizon Markov decision processes (MDPs). By recursively extending Sastry's learning automata pursuit algorithm, originally designed for nonsequential stochastic optimization problems, RASA returns an estimate of both the optimal action from a given state and the corresponding optimal value. Building on the finite-time analysis of the pursuit algorithm by Rajaraman and Sastry, we analyze the finite-time behavior of RASA. Specifically, for a given initial state, we derive the following probability bounds as a function of the number of samples: 1) a lower bound on the probability that RASA will sample the optimal action and 2) an upper bound on the probability that the deviation between the true optimal value and the RASA estimate exceeds a given error.
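The pursuit automaton that RASA builds on works as follows: maintain a probability vector over the candidate actions together with sample-mean reward estimates, sample an action from the vector, update that action's estimate, and then nudge the whole vector toward the unit vector of the currently best-estimated action. Below is a minimal Python sketch of this nonsequential building block, not the authors' RASA itself; the function name, step size mu, iteration count, and sample_reward callback are all illustrative assumptions standing in for whatever noisy objective is being optimized.

    import random

    def pursuit_automaton(sample_reward, num_actions, num_iters=10000, mu=0.01):
        # Sketch of a Sastry-style pursuit learning automaton for a
        # nonsequential stochastic optimization problem. All names and
        # parameter values here are illustrative, not from the paper.
        p = [1.0 / num_actions] * num_actions  # action-selection probabilities
        q = [0.0] * num_actions                # sample-mean reward estimates
        counts = [0] * num_actions

        for _ in range(num_iters):
            # Sample an action according to the current probability vector.
            a = random.choices(range(num_actions), weights=p)[0]
            r = sample_reward(a)

            # Incrementally update the sample-mean estimate for that action.
            counts[a] += 1
            q[a] += (r - q[a]) / counts[a]

            # "Pursue" the greedy action: move p toward its unit vector,
            # which keeps p a valid probability distribution.
            best = max(range(num_actions), key=lambda i: q[i])
            for i in range(num_actions):
                target = 1.0 if i == best else 0.0
                p[i] += mu * (target - p[i])

        best = max(range(num_actions), key=lambda i: q[i])
        return best, q[best]

Per the abstract, RASA applies this update recursively over the horizon: when an action is sampled at a state, its "reward" is the immediate reward plus a recursively estimated value of the sampled next state, so each state reached in the sampling tree effectively runs its own pursuit automaton.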
Pages: 1349-1355
Number of pages: 7
Related papers
50 records in total
  • [31] Online Learning of Safety function for Markov Decision Processes
    Mazumdar, Abhijit
    Wisniewski, Rafal
    Bujorianu, Manuela L.
    2023 EUROPEAN CONTROL CONFERENCE (ECC), 2023
  • [32] Learning Policies for Markov Decision Processes in Continuous Spaces
    Paternain, Santiago
    Bazerque, Juan Andres
    Small, Austin
    Ribeiro, Alejandro
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 4751 - 4758
  • [33] Active Learning of Markov Decision Processes for System Verification
    Chen, Yingke
    Nielsen, Thomas Dyhre
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 2, 2012, : 289 - 294
  • [34] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [35] Learning Policies for Markov Decision Processes From Data
    Hanawal, Manjesh Kumar
    Liu, Hao
    Zhu, Henghui
    Paschalidis, Ioannis Ch.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (06) : 2298 - 2309
  • [36] Concurrent Markov decision processes for robot team learning
    Girard, Justin
    Emami, M. Reza
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 39 : 223 - 234
  • [37] Learning Adversarial Markov Decision Processes with Delayed Feedback
    Lancewicki, Tal
    Rosenberg, Aviv
    Mansour, Yishay
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7281 - 7289
  • [38] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    2005 INTERNATIONAL CONFERENCE ON INTELLIGENT SENSING AND INFORMATION PROCESSING, PROCEEDINGS, 2005, : 199 - 204
  • [39] Verification of Markov Decision Processes Using Learning Algorithms
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Chmelik, Martin
    Forejt, Vojtech
    Kretinsky, Jan
    Kwiatkowska, Marta
    Parker, David
    Ujma, Mateusz
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2014, 2014, 8837 : 98 - 114
  • [40] Learning and Planning with Timing Information in Markov Decision Processes
    Bacon, Pierre-Luc
    Balle, Borja
    Precup, Doina
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2015, : 111 - 120