Online Learning in Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

Cited by: 19
Authors
Yu, Jia Yuan [1]
Mannor, Shie [1]
Affiliation
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2T5, Canada
DOI
10.1109/GAMENETS.2009.5137416
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., non-stationary) fashion. We present algorithms that combine online learning and robust control, and establish guarantees on their performance evaluated in retrospect against alternative policies, i.e., their regret. These guarantees depend critically on the range of uncertainty in the transition probabilities, but hold regardless of the changes in rewards and transition probabilities over time. We present a version of the main algorithm in the setting where the decision-maker's observations are limited to its trajectory, and another version that allows a trade-off between performance and computational complexity.
Pages: 314-322
Number of pages: 9
Related papers
50 in total
  • [1] Online Learning in Markov Decision Processes with Changing Cost Sequences
    Dick, Travis
    Gyorgy, Andras
    Szepesvari, Csaba
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [2] Large Scale Markov Decision Processes with Changing Rewards
    Cardoso, Adrian Rivera
    Wang, He
    Xu, Huan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Blackwell Online Learning for Markov Decision Processes
    Li, Tao
    Peng, Guanze
    Zhu, Quanyan
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021
  • [4] Online Learning in Kernelized Markov Decision Processes
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [5] Arbitrarily Modulated Markov Decision Processes
    Yu, Jia Yuan
    Mannor, Shie
    PROCEEDINGS OF THE 48TH IEEE CONFERENCE ON DECISION AND CONTROL, 2009 HELD JOINTLY WITH THE 2009 28TH CHINESE CONTROL CONFERENCE (CDC/CCC 2009), 2009, : 2946 - 2953
  • [6] Markov Decision Processes with Functional Rewards
    Spanjaard, Olivier
    Weng, Paul
    MULTI-DISCIPLINARY TRENDS IN ARTIFICIAL INTELLIGENCE, 2013, 8271 : 269 - 280
  • [7] Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
    Ortner, Ronald
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254 : 123 - 137
  • [8] Online regret bounds for Markov decision processes with deterministic transitions
    Ortner, Ronald
    THEORETICAL COMPUTER SCIENCE, 2010, 411 (29-30) : 2684 - 2695
  • [9] Online Learning of Safety function for Markov Decision Processes
    Mazumdar, Abhijit
    Wisniewski, Rafal
    Bujorianu, Manuela L.
    2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023
  • [10] Online Learning in Markov Decision Processes with Continuous Actions
    Hong, Yi-Te
    Lu, Chi-Jen
    ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 302 - 316