Robust Markov Decision Processes

Cited by: 218
Authors
Wiesemann, Wolfram [1]
Kuhn, Daniel [1]
Rustem, Berc [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2AZ, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
robust optimization; Markov decision processes; semidefinite programming;
DOI
10.1287/moor.1120.0566
Chinese Library Classification (CLC) number
C93 [Management science]; O22 [Operations research];
Discipline classification codes
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use because of their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees in view of the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a prespecified probability 1 - β. Afterward, we determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least 1 - β. Our method involves the solution of tractable conic programs of moderate size.
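The idea of maximizing worst-case performance over a set of plausible transition models can be illustrated with a minimal sketch. This is not the authors' conic-programming method. It swaps in a simpler, well-known stand-in: robust value iteration for a discounted MDP where each state-action pair has its own L1 ball around an estimated transition distribution. The names `worst_case_l1` and `robust_value_iteration`, the radius `kappa`, the discount `gamma`, and the toy numbers are all assumptions made for illustration; in particular, `kappa` is taken as given rather than derived from an observation history and a confidence level.

```python
def worst_case_l1(p_hat, v, kappa):
    """Return the distribution within L1 distance kappa of p_hat
    (and on the probability simplex) that minimizes sum(p[i] * v[i])."""
    p = list(p_hat)
    lo = min(range(len(v)), key=lambda i: v[i])   # lowest-value next state
    shift = min(kappa / 2.0, 1.0 - p[lo])         # mass we may move onto it
    p[lo] += shift
    # Take the same mass away from the highest-value states first.
    for i in sorted(range(len(v)), key=lambda i: -v[i]):
        if i == lo:
            continue
        take = min(shift, p[i])
        p[i] -= take
        shift -= take
        if shift <= 0.0:
            break
    return p


def robust_value_iteration(p_hat, reward, kappa, gamma=0.9, tol=1e-8):
    """p_hat[s][a]: estimated next-state distribution (list of floats).
    reward[s][a]: immediate reward. Returns (values, greedy policy)."""
    n = len(p_hat)
    v = [0.0] * n
    while True:
        q = []
        for s in range(n):
            row = []
            for a in range(len(p_hat[s])):
                p = worst_case_l1(p_hat[s][a], v, kappa)  # adversarial model
                row.append(reward[s][a]
                           + gamma * sum(pi * vi for pi, vi in zip(p, v)))
            q.append(row)
        v_new = [max(row) for row in q]
        if max(abs(x - y) for x, y in zip(v_new, v)) < tol:
            policy = [max(range(len(row)), key=lambda a: row[a]) for row in q]
            return v_new, policy
        v = v_new


# Hypothetical two-state, two-action example.
p_hat = [[[0.9, 0.1], [0.5, 0.5]],
         [[0.2, 0.8], [0.0, 1.0]]]
reward = [[1.0, 2.0], [0.0, 0.5]]
v_nominal, _ = robust_value_iteration(p_hat, reward, kappa=0.0)
v_robust, policy = robust_value_iteration(p_hat, reward, kappa=0.2)
```

With `kappa=0.0` the ball collapses to the estimate and the routine reduces to ordinary value iteration, so comparing `v_nominal` with `v_robust` shows how much value is given up in exchange for protection against estimation error.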
Pages: 153-183
Number of pages: 31
Related papers
50 in total
  • [21] Robust analysis of discounted Markov decision processes with uncertain transition probabilities
    LOU Zhen-kai
    HOU Fu-jun
    LOU Xu-ming
    Applied Mathematics: A Journal of Chinese Universities, 2020, 35 (04) : 417 - 436
  • [22] Robust motion planning using Markov Decision Processes and quadtree decomposition
    Burlet, J
    Aycard, O
    Fraichard, T
    2004 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-5, PROCEEDINGS, 2004, : 2820 - 2825
  • [23] Robust Adaptive Markov Decision Processes in Multi-vehicle Applications
    Bertuccelli, Luca F.
    Bethke, Brett
    How, Jonathan P.
    2009 AMERICAN CONTROL CONFERENCE, VOLS 1-9, 2009, : 1304 - 1309
  • [24] Robust Control of Uncertain Markov Decision Processes with Temporal Logic Specifications
    Wolff, Eric M.
    Topcu, Ufuk
    Murray, Richard M.
    2012 IEEE 51ST ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2012, : 3372 - 3379
  • [25] Robust decomposable Markov decision processes motivated by allocating school budgets
    Dimitrov, Nedialko B.
    Dimitrov, Stanko
    Chukova, Stefanka
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2014, 239 (01) : 199 - 213
  • [26] Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes
    Rigter, Marc
    Lacerda, Bruno
    Hawes, Nick
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 11930 - 11938
  • [27] Robust analysis of discounted Markov decision processes with uncertain transition probabilities
    Lou Zhen-kai
    Hou Fu-jun
    Lou Xu-ming
    APPLIED MATHEMATICS-A JOURNAL OF CHINESE UNIVERSITIES SERIES B, 2020, 35 (04) : 417 - 436
  • [28] Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
    Killian, Taylor
    Daulton, Samuel
    Konidaris, George
    Doshi-Velez, Finale
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [29] Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
    Killian, Taylor W.
    Konidaris, George
    Doshi-Velez, Finale
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4949 - 4950
  • [30] Markov decision processes
    White, D.J.
    Journal of the Operational Research Society, 1995, 46 (06):