Markov Decision Processes with Arbitrary Reward Processes

Cited by: 0
Authors
Yu, Jia Yuan [1]
Mannor, Shie [1]
Shimkin, Nahum [2]
Affiliations
[1] McGill Univ, Montreal, PQ H3A 2T5, Canada
[2] Technion, Haifa, Israel
Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily over time. We extend the notion of Hannan consistency to this setting, showing that, in hindsight, the agent can perform almost as well as every deterministic policy. We present efficient online algorithms in the spirit of reinforcement learning that ensure that the agent's performance loss, or regret, vanishes over time, provided that the environment is oblivious to the agent's actions. However, counterexamples indicate that the regret does not vanish if the environment is not oblivious.
Pages: 268 / +
Number of pages: 3
Related papers
50 in total
  • [1] Markov Decision Processes with Arbitrary Reward Processes
    Yu, Jia Yuan
    Mannor, Shie
    Shimkin, Nahum
    MATHEMATICS OF OPERATIONS RESEARCH, 2009, 34 (03) : 737 - 757
  • [2] Robust Average-Reward Markov Decision Processes
    Wang, Yue
    Velasquez, Alvaro
    Atia, George
    Prater-Bennette, Ashley
    Zou, Shaofeng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 15215 - 15223
  • [3] CONVERGING MARKOV DECISION PROCESSES WITH MULTIPLICATIVE REWARD SYSTEM
    Fujita T.
    Bulletin of the Kyushu Institute of Technology - Pure and Applied Mathematics, 2023, 2023 (70) : 33 - 41
  • [4] Functional Reward Markov Decision Processes: Theory and Applications
    Weng, Paul
    Spanjaard, Olivier
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2017, 26 (03)
  • [5] Average-Reward Decentralized Markov Decision Processes
    Petrik, Marek
    Zilberstein, Shlomo
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1997 - 2002
  • [6] Differentially Private Reward Functions for Markov Decision Processes
    Benvenuti, Alexander
    Hawkins, Calvin
    Falling, Brandon
    Chen, Bo
    Bialy, Brendan
    Dennis, Miriam
    Hale, Matthew
    2024 IEEE CONFERENCE ON CONTROL TECHNOLOGY AND APPLICATIONS, CCTA 2024, 2024, : 631 - 636
  • [7] Partially observable Markov decision processes with reward information
    Cao, XR
    Guo, XP
    2004 43RD IEEE CONFERENCE ON DECISION AND CONTROL (CDC), VOLS 1-5, 2004, : 4393 - 4398
  • [8] Bounding reward measures of Markov models using the Markov decision processes
    Buchholz, Peter
    NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 2011, 18 (06) : 919 - 930
  • [9] MARKOV DECISION-PROCESSES - DISCOUNTED EXPECTED REWARD OR AVERAGE EXPECTED REWARD
    WHITE, DJ
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 1993, 172 (02) : 375 - 384
  • [10] Perceptive evaluation for the optimal discounted reward in Markov decision processes
    Kurano, M
    Yasuda, M
    Nakagami, J
    Yoshida, Y
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, 3558 : 283 - 293