Stochastic optimization of controlled partially observable Markov decision processes

Times cited: 0
Authors
Bartlett, PL [1]
Baxter, J [1]
Affiliation
[1] Australian National University, Research School of Information Sciences and Engineering, Canberra, ACT 0200, Australia
Source
PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 2000
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP; it uses only one free parameter, β ∈ [0, 1), which has a natural interpretation in terms of a bias-variance trade-off; and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
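The abstract describes the method only in outline. As a rough illustration, the sketch below shows a single-sample-path policy-gradient update with an eligibility trace discounted by β, of the general kind the abstract describes; it is not the authors' published pseudocode. The two-state toy POMDP, the softmax parameter table `theta`, the constant step size `step`, and all function names are hypothetical choices made for illustration. Almost-sure convergence results of this kind typically assume a decreasing step-size schedule; a constant step is used here for simplicity.

```python
import math
import random

# Hypothetical toy POMDP (an assumption for illustration, not from the paper):
# the hidden state s in {0, 1} persists with probability STAY and flips otherwise;
# the learner sees an observation equal to s with probability OBS_ACC;
# the reward is 1 when the chosen action equals s, else 0.
STAY, OBS_ACC = 0.9, 0.8
N_OBS, N_ACT = 2, 2


def observe(state, rng):
    """Return a noisy observation of the hidden state."""
    return state if rng.random() < OBS_ACC else 1 - state


def policy_probs(theta, obs):
    """Softmax policy: one row of action preferences per observation."""
    prefs = theta[obs]
    top = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def online_policy_gradient(beta=0.8, step=0.01, steps=50_000, seed=0):
    """Single-sample-path policy-gradient ascent with a beta-discounted trace.

    Returns the learned parameter table and the average reward over the run.
    """
    rng = random.Random(seed)
    theta = [[0.0] * N_ACT for _ in range(N_OBS)]
    trace = [[0.0] * N_ACT for _ in range(N_OBS)]
    state = rng.randrange(2)
    total_reward = 0.0
    for _ in range(steps):
        obs = observe(state, rng)
        probs = policy_probs(theta, obs)
        action = rng.choices(range(N_ACT), weights=probs)[0]
        reward = 1.0 if action == state else 0.0  # computed by the environment
        total_reward += reward
        # trace <- beta * trace + grad log mu(action | theta, obs)
        for y in range(N_OBS):
            for a in range(N_ACT):
                trace[y][a] *= beta
        for a in range(N_ACT):
            trace[obs][a] += (1.0 if a == action else 0.0) - probs[a]
        # theta <- theta + step * reward * trace
        for y in range(N_OBS):
            for a in range(N_ACT):
                theta[y][a] += step * reward * trace[y][a]
        # The hidden state evolves; the update above used only obs, action, reward.
        if rng.random() >= STAY:
            state = 1 - state
    return theta, total_reward / steps


if __name__ == "__main__":
    learned, avg_reward = online_policy_gradient()
    print(learned, avg_reward)
```

With these settings the learned table should come to favour, for each observation, the action that matches it, and the run's average reward should approach (but stay below) 0.8, the best any policy that sees only the current observation can achieve in this toy model. A larger `beta` lengthens the trace, which in general reduces bias at the cost of higher variance; in this particular toy, actions do not influence future states, so even a small `beta` loses nothing.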
Pages: 124-129
Number of pages: 6
Related papers
50 in total
  • [41] A Fast Approximation Method for Partially Observable Markov Decision Processes
    Liu Bingbing
    Kang Yu
    Jiang Xiaofeng
    Qin Jiahu
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2018, 31 (06): 1423-1436
  • [42] Quasi-Deterministic Partially Observable Markov Decision Processes
    Besse, Camille
    Chaib-draa, Brahim
    NEURAL INFORMATION PROCESSING, PT 1, PROCEEDINGS, 2009, 5863: 237-246
  • [43] A Special Case of Partially Observable Markov Decision Processes Problem by Event-Based Optimization
    Zhang, Junyu
    PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2016: 1522-1526
  • [44] Position Validation Strategies using Partially Observable Markov Decision Processes
    Kochenderfer, Mykel J.
    Shih, Kevin J.
    Chryssanthacopoulos, James P.
    Rose, Charles E.
    Elder, Tomas R.
    2011 IEEE/AIAA 30TH DIGITAL AVIONICS SYSTEMS CONFERENCE (DASC), 2011
  • [45] Human-in-the-Loop Synthesis for Partially Observable Markov Decision Processes
    Carr, Steven
    Jansen, Nils
    Wimmer, Ralf
    Fu, Jie
    Topcu, Ufuk
    2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC), 2018: 762-769
  • [46] Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes
    Poupart, Pascal
    Malhotra, Aarti
    Pei, Pei
    Kim, Kee-Eung
    Goh, Bongseok
    Bowling, Michael
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015: 3342-3348
  • [48] Parallel rollout for online solution of partially observable Markov decision processes
    Chang, HS
    Givan, R
    Chong, EKP
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2004, 14 (03): 309-341
  • [49] Modeling speech using Partially Observable Markov Decision Processes (POMDP)
    Jonas, M
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001: 4016
  • [50] Ambiguous partially observable Markov decision processes: Structural results and applications
    Saghafian, Soroush
    JOURNAL OF ECONOMIC THEORY, 2018, 178: 1-35