Recursive learning automata approach to Markov decision processes

Cited by: 6
Authors
Chang, Hyeong Soo [1]
Fu, Michael C.
Hu, Jiaqiao
Marcus, Steven I.
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul 121742, South Korea
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[4] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[5] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
learning automata; Markov decision process (MDP); sampling
DOI
10.1109/TAC.2007.900859
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
In this note, we present a sampling algorithm, called the recursive automata sampling algorithm (RASA), for control of finite-horizon Markov decision processes (MDPs). By extending, in a recursive manner, Sastry's learning automata pursuit algorithm, which was designed for solving nonsequential stochastic optimization problems, RASA returns an estimate of both the optimal action from a given state and the corresponding optimal value. Based on the finite-time analysis of the pursuit algorithm by Rajaraman and Sastry, we provide an analysis of the finite-time behavior of RASA. Specifically, for a given initial state, we derive the following probability bounds as functions of the number of samples: 1) a lower bound on the probability that RASA will sample the optimal action and 2) an upper bound on the probability that the deviation between the true optimal value and the RASA estimate exceeds a given error.
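The recursive idea in the abstract can be illustrated with a short Python sketch. It is not the authors' RASA: the simulator interface `step(state, action, rng)`, the per-stage sample budgets `samples`, the learning rate `mu`, the one-sample-per-action initialization, and the use of the highest sample-mean Q-value as the value estimate are all assumptions made for illustration. At each stage, a pursuit automaton draws actions from its probability vector, scores each sampled action by its one-stage reward plus the recursively estimated optimal value of the sampled next state, and then shifts probability mass toward the action with the best current estimate.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

State = Any
# Hypothetical simulator interface (an assumption, not from the paper):
# step(state, action, rng) -> (one-stage reward, next state)
Simulator = Callable[[State, int, random.Random], Tuple[float, State]]


def _best(totals: List[float], counts: List[int]) -> int:
    """Index of the sampled action with the highest sample-mean Q estimate."""
    sampled = [j for j in range(len(counts)) if counts[j] > 0]
    return max(sampled, key=lambda j: totals[j] / counts[j])


def rasa_sketch(state: State, stage: int, horizon: int, actions: Sequence[int],
                step: Simulator, samples: Sequence[int], mu: float,
                rng: random.Random) -> Tuple[int, float]:
    """Recursive pursuit-automaton sampling in the spirit of RASA (a sketch).

    Returns (estimated optimal action, estimated optimal value) for `state`
    at `stage`. A terminal stage (stage == horizon) returns (-1, 0.0).
    """
    if stage == horizon:
        return -1, 0.0
    k = len(actions)
    n = samples[stage]
    if n < k:
        raise ValueError("need at least one sample per action")
    probs = [1.0 / k] * k   # action-probability vector of the automaton
    totals = [0.0] * k      # sum of sampled Q-values per action
    counts = [0] * k        # number of times each action has been sampled
    for t in range(n):
        # Assumed initialization: try each action once, then sample from probs.
        i = t if t < k else rng.choices(range(k), weights=probs)[0]
        reward, nxt = step(state, actions[i], rng)
        # Recursion: estimated optimal value-to-go from the sampled next state.
        _, v_next = rasa_sketch(nxt, stage + 1, horizon, actions, step,
                                samples, mu, rng)
        totals[i] += reward + v_next
        counts[i] += 1
        if t >= k - 1:
            # Pursuit update: move probability mass toward the empirical best.
            best = _best(totals, counts)
            probs = [(1.0 - mu) * p for p in probs]
            probs[best] += mu
    best = _best(totals, counts)
    return actions[best], totals[best] / counts[best]


def deterministic_step(state: State, action: int,
                       rng: random.Random) -> Tuple[float, State]:
    """Hypothetical toy simulator: reward equals the action index, state unchanged."""
    return float(action), state


if __name__ == "__main__":
    print(rasa_sketch("s", 0, 2, [0, 1], deterministic_step, [10, 10], 0.1,
                      random.Random(0)))  # → (1, 2.0)
```

Because the toy simulator is deterministic, every Q-estimate is exact, so the sketch returns action 1 with value 2.0 over a two-stage horizon. With a stochastic simulator, the returned estimates vary with the seed and the sample budgets. The number of simulator calls grows with the product of the per-stage budgets, which is the main cost of this kind of recursive sampling.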
Pages: 1349-1355
Number of pages: 7
Related papers
50 in total
  • [1] Recursive learning automata for control of partially observable Markov decision processes
    Chang, Hyeong Soo
    Fu, Michael C.
    Marcus, Steven I.
    2005 44TH IEEE CONFERENCE ON DECISION AND CONTROL & EUROPEAN CONTROL CONFERENCE, VOLS 1-8, 2005: 6091 - 6096
  • [2] Reachability in recursive Markov decision processes
    Brazdil, Tomas
    Brozek, Vaclav
    Forejt, Vojtech
    Kucera, Antonin
    CONCUR 2006 - CONCURRENCY THEORY, PROCEEDINGS, 2006, 4137 : 358 - 374
  • [3] Reachability in recursive Markov decision processes
    Brazdil, Tomas
    Brozek, Vaclav
    Forejt, Vojtech
    Kucera, Antonin
    INFORMATION AND COMPUTATION, 2008, 206 (05) : 520 - 537
  • [4] Recursive Markov Decision Processes and Recursive Stochastic Games
    Etessami, Kousha
    Yannakakis, Mihalis
    JOURNAL OF THE ACM, 2015, 62 (02)
  • [5] Recursive Markov decision processes and recursive stochastic games
    Etessami, K
    Yannakakis, M
    AUTOMATA, LANGUAGES AND PROGRAMMING, PROCEEDINGS, 2005, 3580 : 891 - 903
  • [6] Markov decision processes with recursive risk measures
    Baeuerle, Nicole
    Glauner, Alexander
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2022, 296 (03) : 953 - 966
  • [7] Markov Decision Processes and deterministic Buchi automata
    Beauquier, D
    FUNDAMENTA INFORMATICAE, 2002, 50 (01) : 1 - 13
  • [8] Combining Learning Algorithms: An Approach to Markov Decision Processes
    Ribeiro, Richardson
    Favarim, Fabio
    Barbosa, Marco A. C.
    Koerich, Alessandro L.
    Enembreck, Fabricio
    ENTERPRISE INFORMATION SYSTEMS, ICEIS 2012, 2013, 141 : 172 - 188
  • [9] Solving Multi-Agent Markov Decision Processes Using Learning Automata
    Abtahi, Farnaz
    Meybodi, Mohammad Reza
    2008 6TH INTERNATIONAL SYMPOSIUM ON INTELLIGENT SYSTEMS AND INFORMATICS, 2008: 54 - 59
  • [10] Learning Unknown Markov Decision Processes: A Thompson Sampling Approach
    Ouyang, Yi
    Gagrani, Mukul
    Nayyar, Ashutosh
    Jain, Rahul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30