Deriving Explicit Control Policies for Markov Decision Processes Using Symbolic Regression

Cited by: 2
Authors
Hristov, A. [1]
Bosman, J. W. [1]
Bhulai, S. [2]
van der Mei, R. D. [1]
Affiliations
[1] Ctr Math & Comp Sci, Stochast Grp, Amsterdam, Netherlands
[2] Vrije Univ Amsterdam, Dept Math, Amsterdam, Netherlands
Source
PROCEEDINGS OF THE 13TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2020) | 2020
Keywords
Markov Decision Processes; Genetic program; Symbolic regression; Threshold-type policy; Optimal control; Closed-form approximation
DOI
10.1145/3388831.3388840
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
In this paper, we introduce a novel approach to optimizing the control of systems that can be modeled as Markov decision processes (MDPs) with a threshold-based optimal policy. Our method is based on a specific type of genetic program known as symbolic regression (SR). We present how the performance of this program can be greatly improved by taking into account the corresponding MDP framework in which we apply it. The proposed method has two main advantages: (1) it results in near-optimal decision policies, and (2) in contrast to other algorithms, it generates closed-form approximations. Obtaining an explicit expression for the decision policy gives the opportunity to conduct sensitivity analysis, and allows instant calculation of a new threshold function for any change in the parameters. We emphasize that the introduced technique is highly general and applicable to MDPs that have a threshold-based policy. Extensive experimentation demonstrates the usefulness of the method.
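To make the term "threshold-based optimal policy" concrete, the following is a minimal sketch and not the authors' method: it runs plain value iteration (not symbolic regression) on a hypothetical admission-control queue and reads off the state at which the policy switches from admitting to rejecting. The model, the function name `admission_threshold`, and all parameter values (`lam`, `mu`, `reward`, `hold`, `beta`, `n_max`, `iters`) are illustrative assumptions that do not come from the paper.

```python
def admission_threshold(lam=0.4, mu=0.6, reward=5.0, hold=1.0,
                        beta=0.95, n_max=30, iters=2000):
    """Value iteration for a toy admission-control queue.

    Assumed model (not from the paper): uniformized single-server queue
    with lam + mu = 1, a reward for each admitted arrival, a holding cost
    of hold * x per step in state x, and discount factor beta.
    """
    v = [0.0] * (n_max + 1)
    for _ in range(iters):
        new_v = []
        for x in range(n_max + 1):
            if x < n_max:
                # Arrival: admit (collect reward, move to x + 1) or reject.
                arrival = max(reward + v[x + 1], v[x])
            else:
                # Buffer full: the arrival is always rejected.
                arrival = v[x]
            departure = v[max(x - 1, 0)]
            new_v.append(-hold * x + beta * (lam * arrival + mu * departure))
        v = new_v
    # Greedy policy with respect to the converged value function.
    admit = [x < n_max and reward + v[x + 1] >= v[x]
             for x in range(n_max + 1)]
    # Threshold: the first state in which the arrival is rejected.
    threshold = next(x for x in range(n_max + 1) if not admit[x])
    return threshold, admit


threshold, admit = admission_threshold()
print("admit arrivals while queue length <", threshold)
```

In this toy model the whole policy is summarized by a single number per parameter setting. A closed-form approximation of the kind the abstract describes would express that number directly as a function of parameters such as `lam`, `mu`, and `reward`, so the threshold could be recomputed without rerunning value iteration.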
Pages: 41-47
Number of pages: 7
Related papers
50 items in total
  • [41] Non-randomized policies for constrained Markov decision processes
    Chen, Richard C.
    Feinberg, Eugene A.
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2007, 66 (01) : 165 - 179
  • [42] Control Logic Synthesis for Manufacturing Systems Using Markov Decision Processes
    Lee, Changmin
    Park, Jehyun
    Choi, Jongeun
    Ha, Jaebok
    Lee, Sangyeong
    IFAC PAPERSONLINE, 2021, 54 (20) : 495 - 502
  • [43] The complexity of decentralized control of Markov decision processes
    Bernstein, DS
    Givan, R
    Immerman, N
    Zilberstein, S
    MATHEMATICS OF OPERATIONS RESEARCH, 2002, 27 (04) : 819 - 840
  • [44] Symbolic algorithms for qualitative analysis of Markov decision processes with Büchi objectives
    Krishnendu Chatterjee
    Monika Henzinger
    Manas Joglekar
    Nisarg Shah
    Formal Methods in System Design, 2013, 42 : 301 - 327
  • [45] Finding good stochastic factored policies for factored Markov decision processes
    Radoszycki, Julia
    Peyrard, Nathalie
    Sabbadin, Regis
    21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 1083 - 1084
  • [46] DETERMINING NEAR-OPTIMAL POLICIES FOR MARKOV RENEWAL DECISION PROCESSES
    BOYSE, JW
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1974, MC 4 (02) : 215 - 217
  • [47] Uniqueness and stability of optimal policies of finite state Markov decision processes
    Leizarowitz, Arie
    Zaslavski, Alexander J.
    MATHEMATICS OF OPERATIONS RESEARCH, 2007, 32 (01) : 156 - 167
  • [48] COMPARING POLICIES IN MARKOV DECISION-PROCESSES - MANDL LEMMA REVISITED
    SHWARTZ, A
    MAKOWSKI, AM
    MATHEMATICS OF OPERATIONS RESEARCH, 1990, 15 (01) : 155 - 174
  • [49] Markov Decision Processes with Threshold Based Piecewise Linear Optimal Policies
    Erseghe, Tomaso
    Zanella, Andrea
    Codemo, Claudio G.
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2013, 2 (04) : 459 - 462
  • [50] Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (07) : 3722 - 3729