Stochastic optimization of controlled partially observable Markov decision processes

Cited: 0
Authors
Bartlett, PL [1]
Baxter, J [1]
Affiliation
[1] Australian Natl Univ, Res Sch Info Sci & Eng, Canberra, ACT 0200, Australia
Source
PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5 | 2000
Keywords
DOI
Not available
CLC classification
TP [Automation Technology, Computer Technology];
Subject classification
0812;
Abstract
We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ [0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
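The following is a minimal sketch of the kind of single-sample-path, β-discounted eligibility-trace policy-gradient estimator the abstract describes: it only ever sees observations and rewards, never the underlying state, and β controls the bias-variance trade-off of the gradient estimate. The softmax policy parameterization, the env interface (reset/step returning an observation feature vector and a reward), and all names below are illustrative assumptions, not the paper's specification.

import numpy as np

def estimate_gradient(env, theta, beta=0.9, num_steps=10_000, rng=None):
    """Single-sample-path estimate of the average-reward gradient w.r.t.
    the policy parameters theta, using beta in [0, 1) to trade bias
    against variance via an eligibility trace."""
    rng = rng if rng is not None else np.random.default_rng(0)
    z = np.zeros_like(theta)     # eligibility trace of grad-log-probabilities
    grad = np.zeros_like(theta)  # running gradient estimate
    obs = env.reset()            # only the observation is used, never the state
    for t in range(1, num_steps + 1):
        # Softmax (Gibbs) policy over actions given observation features.
        scores = theta @ obs                     # theta: (num_actions, obs_dim)
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        action = rng.choice(len(probs), p=probs)
        # Gradient of log pi(action | obs) for a softmax policy.
        dlogp = np.outer(-probs, obs)
        dlogp[action] += obs
        obs, reward = env.step(action)           # assumed interface
        # Discount the trace by beta and average the reward-weighted trace.
        z = beta * z + dlogp
        grad += (reward * z - grad) / t
    return grad

Larger β reduces the bias of the estimate but increases its variance; the paper's convergence analysis relates the appropriate β to the mixing time of the Markov chain induced by the policy.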
Pages: 124-129
Page count: 6
Related papers
50 entries in total
  • [1] On the Relationship Between Stochastic Satisfiability and Partially Observable Markov Decision Processes
    Salmon, Ricardo
    Poupart, Pascal
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 1105 - 1115
  • [2] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    ANNUAL REVIEW OF CONTROL ROBOTICS AND AUTONOMOUS SYSTEMS, 2022, 5 : 253 - 277
  • [3] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2009, 53 (03) : 119 - 125
  • [4] Quantum partially observable Markov decision processes
    Barry, Jennifer
    Barry, Daniel T.
    Aaronson, Scott
    PHYSICAL REVIEW A, 2014, 90 (03):
  • [5] PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES WITH PARTIALLY OBSERVABLE RANDOM DISCOUNT FACTORS
    Martinez-Garcia, E. Everardo
    Minjarez-Sosa, J. Adolfo
    Vega-Amaya, Oscar
    KYBERNETIKA, 2022, 58 (06) : 960 - 983
  • [6] Experimental results on learning stochastic memoryless policies for Partially Observable Markov Decision Processes
    Williams, JK
    Singh, S
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 11, 1999, 11 : 1073 - 1079
  • [7] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [8] Structural Estimation of Partially Observable Markov Decision Processes
    Chang, Yanling
    Garcia, Alfredo
    Wang, Zhide
    Sun, Lu
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (08) : 5135 - 5141
  • [9] Entropy Maximization for Partially Observable Markov Decision Processes
    Savas, Yagiz
    Hibbard, Michael
    Wu, Bo
    Tanaka, Takashi
    Topcu, Ufuk
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (12) : 6948 - 6955
  • [10] Nonapproximability results for partially observable Markov decision processes
    Lusena, C
    Goldsmith, J
    Mundhenk, M
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 14 : 83 - 113