Recursive learning automata approach to Markov decision processes

Cited by: 6
Authors
Chang, Hyeong Soo [1 ]
Fu, Michael C.
Hu, Jiaqiao
Marcus, Steven I.
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul 121742, South Korea
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[4] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[5] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
learning automata; Markov decision process (MDP); sampling;
DOI
10.1109/TAC.2007.900859
Chinese Library Classification
TP [Automation & Computer Technology]
Discipline Classification Code
0812
Abstract
In this note, we present a sampling algorithm, called the recursive automata sampling algorithm (RASA), for control of finite-horizon Markov decision processes (MDPs). By recursively extending Sastry's learning automata pursuit algorithm, originally designed for solving nonsequential stochastic optimization problems, RASA returns an estimate of both the optimal action from a given state and the corresponding optimal value. Building on Rajaraman and Sastry's finite-time analysis of the pursuit algorithm, we provide an analysis of the finite-time behavior of RASA. Specifically, for a given initial state, we derive the following probability bounds as a function of the number of samples: 1) a lower bound on the probability that RASA samples the optimal action, and 2) an upper bound on the probability that the deviation between the true optimal value and the RASA estimate exceeds a given error.
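The abstract builds on Sastry's pursuit learning automaton for nonsequential problems. As a rough illustration only (not the paper's RASA, which applies this idea recursively over MDP stages), below is a minimal sketch of a pursuit automaton for a one-shot stochastic optimization problem; the names `pursuit`, `reward_fn`, and the parameter choices (`mu`, `n_iters`) are assumptions made for this sketch.

```python
import random

def pursuit(reward_fn, n_actions, n_iters=5000, mu=0.01, seed=0):
    """Pursuit learning automaton sketch: estimate the action with the
    highest mean reward.  reward_fn(a) returns a noisy reward sample."""
    rng = random.Random(seed)
    p = [1.0 / n_actions] * n_actions   # action-selection probabilities
    q = [0.0] * n_actions               # running mean reward estimates
    counts = [0] * n_actions
    for _ in range(n_iters):
        # sample an action according to the current probability vector
        a = rng.choices(range(n_actions), weights=p)[0]
        r = reward_fn(a)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # incremental mean update
        # "pursue" the greedy action: push p toward its unit vector
        g = max(range(n_actions), key=lambda i: q[i])
        p = [(1 - mu) * pi + (mu if i == g else 0.0)
             for i, pi in enumerate(p)]
    g = max(range(n_actions), key=lambda i: q[i])
    return g, q[g]

# Usage: a 3-armed noisy bandit in which action 2 has the highest mean.
random.seed(1)
best, est = pursuit(lambda a: [0.2, 0.5, 0.8][a] + random.gauss(0, 0.1), 3)
```

The probability vector `p` concentrates on the empirically best action over time, which is what makes the finite-time sampling bounds of the pursuit analysis possible.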
Pages: 1349-1355 (7 pages)