Stochastic optimization of controlled partially observable Markov decision processes

Cited: 0
Authors
Bartlett, PL [1]
Baxter, J [1]
Affiliation
[1] Australian Natl Univ, Res Sch Info Sci & Eng, Canberra, ACT 0200, Australia
Source
PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5 | 2000
Keywords
DOI
Not available
CLC classification
TP [Automation Technology, Computer Technology];
Subject classification
0812;
Abstract
We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ [0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
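The following is a minimal sketch of the kind of single-sample-path, β-discounted eligibility-trace policy-gradient estimator the abstract describes: it only ever sees observations and rewards, never the underlying state, and β controls the bias-variance trade-off of the gradient estimate. The softmax policy parameterization, the env interface (reset/step returning an observation feature vector and a reward), and all names below are illustrative assumptions, not the paper's specification.

import numpy as np

def estimate_gradient(env, theta, beta=0.9, num_steps=10_000, rng=None):
    """Single-sample-path estimate of the average-reward gradient w.r.t.
    the policy parameters theta, using beta in [0, 1) to trade bias
    against variance via an eligibility trace."""
    rng = rng if rng is not None else np.random.default_rng(0)
    z = np.zeros_like(theta)     # eligibility trace of grad-log-probabilities
    grad = np.zeros_like(theta)  # running gradient estimate
    obs = env.reset()            # only the observation is used, never the state
    for t in range(1, num_steps + 1):
        # Softmax (Gibbs) policy over actions given observation features.
        scores = theta @ obs                     # theta: (num_actions, obs_dim)
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        action = rng.choice(len(probs), p=probs)
        # Gradient of log pi(action | obs) for a softmax policy.
        dlogp = np.outer(-probs, obs)
        dlogp[action] += obs
        obs, reward = env.step(action)           # assumed interface
        # Discount the trace by beta and average the reward-weighted trace.
        z = beta * z + dlogp
        grad += (reward * z - grad) / t
    return grad

Larger β reduces the bias of the estimate but increases its variance; the paper's convergence analysis relates the appropriate β to the mixing time of the Markov chain induced by the policy.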
Pages: 124-129
Page count: 6
Related papers
50 entries in total
  • [1] On the Relationship Between Stochastic Satisfiability and Partially Observable Markov Decision Processes
    Salmon, Ricardo
    Poupart, Pascal
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 1105 - 1115
  • [2] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    ANNUAL REVIEW OF CONTROL ROBOTICS AND AUTONOMOUS SYSTEMS, 2022, 5 : 253 - 277
  • [3] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2009, 53 (03) : 119 - 125
  • [4] Quantum partially observable Markov decision processes
    Barry, Jennifer
    Barry, Daniel T.
    Aaronson, Scott
    PHYSICAL REVIEW A, 2014, 90 (03):
  • [5] PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES WITH PARTIALLY OBSERVABLE RANDOM DISCOUNT FACTORS
    Martinez-Garcia, E. Everardo
    Minjarez-Sosa, J. Adolfo
    Vega-Amaya, Oscar
    KYBERNETIKA, 2022, 58 (06) : 960 - 983
  • [6] Experimental results on learning stochastic memoryless policies for Partially Observable Markov Decision Processes
    Williams, JK
    Singh, S
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 11, 1999, 11 : 1073 - 1079
  • [7] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [8] Structural Estimation of Partially Observable Markov Decision Processes
    Chang, Yanling
    Garcia, Alfredo
    Wang, Zhide
    Sun, Lu
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (08) : 5135 - 5141
  • [9] Entropy Maximization for Partially Observable Markov Decision Processes
    Savas, Yagiz
    Hibbard, Michael
    Wu, Bo
    Tanaka, Takashi
    Topcu, Ufuk
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (12) : 6948 - 6955
  • [10] Nonapproximability results for partially observable Markov decision processes
    Lusena, C
    Goldsmith, J
    Mundhenk, M
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 14 : 83 - 113