Policy Teaching Through Reward Function Learning

Cited by: 0

Authors:

Zhang, Haoqi [1]

Parkes, David C. [1]

Chen, Yiling [1]

Affiliations:

[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA

Source:

10TH ACM CONFERENCE ON ELECTRONIC COMMERCE - EC 2009 | 2009

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.


Pages: 295-304

Page count: 10

