Policy Teaching Through Reward Function Learning

Authors
Zhang, Haoqi [1]
Parkes, David C. [1]
Chen, Yiling [1]
Affiliation
[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
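In the known-reward case, the interested party needs incentives under which the desired policy becomes the agent's best response. The sketch below illustrates the kind of constraint such a linear program has to encode; it is not the paper's formulation. It checks whether a candidate incentive makes a target policy the unique optimal policy of a small finite MDP. It assumes state-action rewards, a discount factor `gamma` below 1, and incentives indexed by state-action pairs. The functions `evaluate_policy` and `induces_target` and the nested-list MDP encoding are hypothetical names chosen for this example. The sketch only verifies a given incentive and does not search for the cheapest one.

```python
from typing import List

# Hypothetical nested-list encoding of a finite MDP, used only for this sketch:
#   P[s][a][t]  probability of moving from state s to state t under action a
#   R[s][a]     the agent's own reward for taking action a in state s
#   delta[s][a] incentive the interested party adds for action a in state s
Matrix = List[List[float]]


def evaluate_policy(P: List[Matrix], R: Matrix, policy: List[int],
                    gamma: float, tol: float = 1e-10) -> List[float]:
    """Iterative policy evaluation of a deterministic policy (requires gamma < 1)."""
    n = len(P)
    V = [0.0] * n
    while True:
        new_V = [
            R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V


def induces_target(P: List[Matrix], R: Matrix, delta: Matrix,
                   target: List[int], gamma: float,
                   margin: float = 1e-6) -> bool:
    """True if `target` is the unique optimal policy for reward R + delta.

    Uses the policy-improvement test: with values computed for the target
    itself, the target action must beat every other action by `margin`.
    """
    n = len(P)
    R_mod = [[R[s][a] + delta[s][a] for a in range(len(R[s]))] for s in range(n)]
    V = evaluate_policy(P, R_mod, target, gamma)
    for s in range(n):
        for a in range(len(R_mod[s])):
            if a == target[s]:
                continue
            q = R_mod[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
            if q > V[s] - margin:  # a deviation is (nearly) as good: not induced
                return False
    return True
```

For instance, in a one-state MDP where the agent earns 1 for action 0 and 0 for action 1, with `gamma = 0.9`, an incentive of 2 on action 1 makes action 1 strictly preferred, while an incentive of exactly 1 leaves a tie and the check returns `False`. In one natural LP formulation, the value terms and the incentives both become decision variables, and these inequalities become linear constraints over them.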
Pages: 295-304
Page count: 10