On structural properties of optimal average cost functions in Markov decision processes with Borel spaces and universally measurable policies

Cited by: 1

Authors:

Yu, Huizhen [1]

Affiliation:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada

Source:

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS | 2022, Vol. 509, No. 01

Keywords:

Markov decision processes; Universally measurable policies; Average cost; Submartingales; Reachability; Recurrent Markov chains; MINIMUM PAIR; EQUATION; STATE; CONVERGENCE; EXISTENCE; CHAINS;

DOI:

10.1016/j.jmaa.2021.125954

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

We consider Markov decision processes (MDPs) with Borel state and action spaces and universally measurable policies. For several long-run average cost criteria and two classes of MDPs, we prove sufficient conditions for the optimal average cost functions to be constant almost everywhere with respect to certain sigma-finite measures. Besides suitable boundedness conditions on the positive parts of the one-stage costs, the key condition here is that each subset of states with positive measure be reachable with probability one under some policy. Our proofs exploit an inequality for the optimal average cost functions and its connection with submartingales, and, in a special case that involves stationary policies, also use the theory of recurrent Markov chains. (c) 2021 Elsevier Inc. All rights reserved.

Pages: 23

50 entries in total

[41] Optimal Policies for Quantum Markov Decision Processes
Ming-Sheng Ying
Yuan Feng
Sheng-Gang Ying
International Journal of Automation and Computing, 2021, 18 : 410 - 421
[42] On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces
Saldi, Naci
Yuksel, Serdar
Linder, Tamas
MATHEMATICS OF OPERATIONS RESEARCH, 2017, 42 (04) : 945 - 978
[43] DETECTING OPTIMAL AND NONOPTIMAL ACTIONS IN AVERAGE-COST MARKOV DECISION-PROCESSES
LASSERRE, JB
JOURNAL OF APPLIED PROBABILITY, 1994, 31 (04) : 979 - 990
[44] Learning algorithms for Markov decision processes with average cost
Abounadi, J
Bertsekas, D
Borkar, VS
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2001, 40 (03) : 681 - 698
[45] Learning Policies for Markov Decision Processes in Continuous Spaces
Paternain, Santiago
Bazerque, Juan Andres
Small, Austin
Ribeiro, Alejandro
2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 4751 - 4758
[46] AVERAGE COST SEMI-MARKOV DECISION PROCESSES
ROSS, SM
JOURNAL OF APPLIED PROBABILITY, 1970, 7 (03) : 649 - &
[47] Optimal Approximation of Average Reward Markov Decision Processes
Y. F. Sapronov
N. E. Yudin
Computational Mathematics and Mathematical Physics, 2025, 65 (3) : 567 - 581
[48] NONEXISTENCE OF EPSILON-OPTIMAL RANDOMIZED STATIONARY POLICIES IN AVERAGE COST MARKOV DECISION MODELS
ROSS, SM
ANNALS OF MATHEMATICAL STATISTICS, 1971, 42 (05) : 1767 - &
[49] EXISTENCE OF OPTIMAL STATIONARY POLICIES IN AVERAGE REWARD MARKOV DECISION-PROCESSES WITH A RECURRENT STATE
CAVAZOSCADENA, R
APPLIED MATHEMATICS AND OPTIMIZATION, 1992, 26 (02) : 171 - 194
[50] CONDITIONS FOR EXISTENCE OF AVERAGE AND BLACKWELL OPTIMAL STATIONARY POLICIES IN DENUMERABLE MARKOV DECISION-PROCESSES
LASSERRE, JB
JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 1988, 136 (02) : 479 - 489
