Finite horizon partially observable semi-Markov decision processes under risk probability criteria

Cited by: 0

Authors:

Wen, Xin [1]

Guo, Xianping [2, 3]

Xia, Li [1, 3]

Affiliations:

[1] Sun Yat Sen Univ, Sch Business, Guangzhou, Peoples R China

[2] Sun Yat Sen Univ, Sch Math, Guangzhou, Peoples R China

[3] Sun Yat Sen Univ, Guangdong Prov Key Lab Computat Sci, Guangzhou, Peoples R China

Source:

OPERATIONS RESEARCH LETTERS | 2024 / Volume 57

Funding:

National Natural Science Foundation of China;

Keywords:

Partially observable semi-Markov decision processes; Risk probability criterion; Finite horizon; Optimal Markov policy; INCOMPLETE INFORMATION; SENSITIVE CONTROL;

DOI:

10.1016/j.orl.2024.107187

Chinese Library Classification (CLC) number:

C93 [Management Science]; O22 [Operations Research];

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

This paper deals with a risk probability minimization problem for finite horizon partially observable semi-Markov decision processes, which are among the most general models for stochastic dynamic systems. In contrast to the expected discounted and average criteria, the optimality criterion investigated in this paper is to minimize the probability that the accumulated rewards do not reach a prescribed profit level at the finite terminal stage. First, the state space is augmented as the joint conditional distribution of the current unobserved state and the remaining profit goal. We introduce a class of policies depending on observable histories and a class of Markov policies including observable process with the joint conditional distribution. Then, under mild assumptions, we prove that the value function is the unique solution to the optimality equation for the probability criterion by using iteration techniques. The existence of an (ε-)optimal Markov policy for this problem is established. Finally, we use a bandit problem with the probability criterion to demonstrate our main results, in which an effective algorithm and the corresponding numerical calculation are given for the semi-Markov model. Moreover, for the case of reduction to the discrete-time Markov model, we derive a concise solution.
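The idea of augmenting the state with the remaining profit goal can be illustrated on a much simpler model than the one treated in the paper. The sketch below is not the authors' algorithm: it assumes a fully observed, discrete-time, finite Markov decision process with integer rewards, and it counts a run as a failure when the accumulated reward is strictly below the goal at the terminal stage. The function name `min_risk_probability` and the toy transition and reward tables are hypothetical and chosen only for illustration.

```python
from functools import lru_cache


def min_risk_probability(P, r, horizon, goal, start):
    """Minimal probability that the total reward stays below `goal`.

    Assumptions (not taken from the paper): fully observed states,
    discrete time, P[a][s][t] is the transition probability from s to t
    under action a, and r[s][a] is an integer one-step reward.
    """
    n_actions = len(P)
    n_states = len(P[0])

    @lru_cache(maxsize=None)
    def risk(steps_left, state, remaining):
        # Terminal stage: failure exactly when part of the goal is still unmet.
        if steps_left == 0:
            return 1.0 if remaining > 0 else 0.0
        best = 1.0
        for a in range(n_actions):
            # The augmented state carries the goal left after this reward.
            next_remaining = remaining - r[state][a]
            value = sum(
                P[a][state][t] * risk(steps_left - 1, t, next_remaining)
                for t in range(n_states)
                if P[a][state][t] > 0
            )
            best = min(best, value)
        return best

    return risk(horizon, start, goal)


# Hypothetical toy model: action 0 is "safe", action 1 is "risky".
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # safe: stay in the current state
    [[0.5, 0.5], [0.0, 1.0]],  # risky: state 0 may move to state 1
]
r = [
    [1, 0],  # rewards in state 0 for actions (safe, risky)
    [1, 2],  # rewards in state 1 for actions (safe, risky)
]

print(min_risk_probability(P, r, horizon=3, goal=4, start=0))
```

The recursion runs over pairs (state, remaining goal) rather than over states alone, which is why a Markov policy in this sketch must look at the remaining goal as well as the current state. In the partially observable semi-Markov setting of the paper the first component is replaced by a conditional distribution over unobserved states, which this sketch does not attempt to model.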


Pages: 8
