Deriving Explicit Control Policies for Markov Decision Processes Using Symbolic Regression

Cited by: 2

Authors:

Hristov, A. [1]

Bosman, J. W. [1]

Bhulai, S. [2]

van der Mei, R. D. [1]

Affiliations:

[1] Ctr Math & Comp Sci, Stochast Grp, Amsterdam, Netherlands

[2] Vrije Univ Amsterdam, Dept Math, Amsterdam, Netherlands

Source:

PROCEEDINGS OF THE 13TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2020) | 2020

Keywords:

Markov Decision Processes; Genetic program; Symbolic regression; Threshold-type policy; Optimal control; Closed-form approximation;

DOI:

10.1145/3388831.3388840

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

In this paper, we introduce a novel approach to optimizing the control of systems that can be modeled as Markov decision processes (MDPs) with a threshold-based optimal policy. Our method is based on a specific type of genetic program known as symbolic regression (SR). We present how the performance of this program can be greatly improved by taking into account the corresponding MDP framework in which we apply it. The proposed method has two main advantages: (1) it results in near-optimal decision policies, and (2) in contrast to other algorithms, it generates closed-form approximations. Obtaining an explicit expression for the decision policy gives the opportunity to conduct sensitivity analysis, and allows instant calculation of a new threshold function for any change in the parameters. We emphasize that the introduced technique is highly general and applicable to MDPs that have a threshold-based policy. Extensive experimentation demonstrates the usefulness of the method.
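To make the notion of a threshold-based policy concrete, here is a minimal sketch. It is not the authors' symbolic-regression method. It runs plain value iteration on a textbook single-server admission-control queue and reads off the threshold. The model, the function name `admission_threshold`, and all parameter names and default values are assumptions chosen for illustration and do not come from the paper.

```python
def admission_threshold(arrival_rate, service_rate, holding_cost, rejection_cost,
                        capacity=50, discount=0.95, iterations=1000):
    """Return the smallest queue length at which rejecting an arrival is optimal.

    Hypothetical illustrative model (not from the paper): a uniformized
    single-server queue with a per-step holding cost proportional to the
    queue length and a fixed cost for each rejected arrival.
    """
    total_rate = arrival_rate + service_rate
    p_arrival = arrival_rate / total_rate      # probability the next event is an arrival
    p_departure = service_rate / total_rate    # probability the next event is a departure

    def action_costs(values, x):
        # Admitting moves the queue to x + 1; it is impossible when the buffer is full.
        admit = values[x + 1] if x < capacity else float("inf")
        # Rejecting keeps the queue at x and incurs the rejection cost.
        reject = values[x] + rejection_cost
        return admit, reject

    # Value iteration on the discounted-cost optimality equation.
    values = [0.0] * (capacity + 1)
    for _ in range(iterations):
        updated = []
        for x in range(capacity + 1):
            admit, reject = action_costs(values, x)
            after_departure = values[max(x - 1, 0)]  # a departure from an empty queue is a dummy event
            updated.append(holding_cost * x + discount * (
                p_arrival * min(admit, reject) + p_departure * after_departure))
        values = updated

    # Scan upward for the first state where rejection is at least as good as admission.
    # The loop always returns, because admission is impossible at full capacity.
    for x in range(capacity + 1):
        admit, reject = action_costs(values, x)
        if reject <= admit:
            return x


if __name__ == "__main__":
    # Recompute the threshold for several rejection costs.
    for cost in (0.0, 20.0, 100.0):
        print(cost, admission_threshold(0.8, 1.0, 1.0, cost))
```

In this sketch every change of a parameter requires rerunning the iteration. A closed-form threshold expression of the kind the abstract describes would replace that recomputation with a direct function evaluation.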

Pages: 41-47

Page count: 7

50 records in total

[41] Non-randomized policies for constrained Markov decision processes
Chen, Richard C.
Feinberg, Eugene A.
MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2007, 66 (01) : 165 - 179
[42] Control Logic Synthesis for Manufacturing Systems Using Markov Decision Processes
Lee, Changmin
Park, Jehyun
Choi, Jongeun
Ha, Jaebok
Lee, Sangyeong
IFAC PAPERSONLINE, 2021, 54 (20) : 495 - 502
[43] The complexity of decentralized control of Markov decision processes
Bernstein, DS
Givan, R
Immerman, N
Zilberstein, S
MATHEMATICS OF OPERATIONS RESEARCH, 2002, 27 (04) : 819 - 840
[44] Symbolic algorithms for qualitative analysis of Markov decision processes with Büchi objectives
Krishnendu Chatterjee
Monika Henzinger
Manas Joglekar
Nisarg Shah
Formal Methods in System Design, 2013, 42 : 301 - 327
[45] Finding good stochastic factored policies for factored Markov decision processes
Radoszycki, Julia
Peyrard, Nathalie
Sabbadin, Regis
21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 1083 - 1084
[46] DETERMINING NEAR-OPTIMAL POLICIES FOR MARKOV RENEWAL DECISION PROCESSES
BOYSE, JW
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1974, MC 4 (02) : 215 - 217
[47] Uniqueness and stability of optimal policies of finite state Markov decision processes
Leizarowitz, Arie
Zaslavski, Alexander J.
MATHEMATICS OF OPERATIONS RESEARCH, 2007, 32 (01) : 156 - 167
[48] COMPARING POLICIES IN MARKOV DECISION-PROCESSES - MANDL LEMMA REVISITED
SHWARTZ, A
MAKOWSKI, AM
MATHEMATICS OF OPERATIONS RESEARCH, 1990, 15 (01) : 155 - 174
[49] Markov Decision Processes with Threshold Based Piecewise Linear Optimal Policies
Erseghe, Tomaso
Zanella, Andrea
Codemo, Claudio G.
IEEE WIRELESS COMMUNICATIONS LETTERS, 2013, 2 (04) : 459 - 462
[50] Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
Roy, Arghyadip
Borkar, Vivek
Karandikar, Abhay
Chaporkar, Prasanna
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (07) : 3722 - 3729
