Policy Teaching Through Reward Function Learning

Cited by: 0
Authors
Zhang, Haoqi [1]
Parkes, David C. [1]
Chen, Yiling [1]
Affiliations
[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
Pages: 295-304
Number of pages: 10