Policy Teaching Through Reward Function Learning

Authors
Zhang, Haoqi [1]
Parkes, David C. [1]
Chen, Yiling [1]
Affiliations
[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835
Abstract
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
Pages: 295-304
Number of pages: 10