Policy Teaching Through Reward Function Learning

Times cited: 0

Authors:

Zhang, Haoqi [1]

Parkes, David C. [1]

Chen, Yiling [1]

Affiliation:

[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA

Source:

10TH ACM CONFERENCE ON ELECTRONIC COMMERCE - EC 2009 | 2009

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Policy teaching considers a Markov decision process (MDP) setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. Finally, we extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
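The linear program for the known-reward case is not reproduced in this record. As a hedged illustration of the constraint it works with, the Python sketch below assumes a small tabular MDP whose reward is known to the interested party and restricts the incentive to one uniform bonus b added to the target action in every state. This simplification is our own assumption, not the paper's formulation. It relies on the standard policy-improvement fact that a policy is optimal when no single-action deviation raises its Q-value; adding b to every target action widens each such gap by exactly b, so the smallest sufficient bonus has a closed form. The MDP encoding, all function names (`evaluate_policy`, `min_uniform_bonus`, and so on), and the toy example are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical tabular MDP encoding (an assumption for this sketch):
# P[s][a] lists (next_state, probability) pairs; R[s][a] is the agent's reward.
Transitions = List[List[List[Tuple[int, float]]]]
Rewards = List[List[float]]


def evaluate_policy(P: Transitions, R: Rewards, policy: Sequence[int],
                    gamma: float, tol: float = 1e-12) -> List[float]:
    """In-place iterative evaluation of V(s) = R(s,pi(s)) + gamma * sum_s' P(s'|s,pi(s)) V(s')."""
    V = [0.0] * len(P)
    while True:
        change = 0.0
        for s, a in enumerate(policy):
            v = R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
            change = max(change, abs(v - V[s]))
            V[s] = v
        if change < tol:
            return V


def q_values(P: Transitions, R: Rewards, V: List[float], gamma: float) -> List[List[float]]:
    """One-step lookahead Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')."""
    return [[R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
             for a in range(len(P[s]))] for s in range(len(P))]


def target_gaps(P, R, target, gamma):
    """Q(s, target(s)) - Q(s, a) for every non-target action, using the target policy's values."""
    V = evaluate_policy(P, R, target, gamma)
    Q = q_values(P, R, V, gamma)
    return [Q[s][target[s]] - Q[s][a]
            for s in range(len(P)) for a in range(len(P[s])) if a != target[s]]


def induces(P, R, target, gamma, eps=0.0, slack=1e-9):
    """True if, in every state, the target action's Q-value beats every other action's by at least eps."""
    return all(g >= eps - slack for g in target_gaps(P, R, target, gamma))


def min_uniform_bonus(P, R, target, gamma, eps=1e-3):
    """Smallest bonus b >= 0 on every target action so that each gap becomes >= eps.

    Adding b to R(s, target(s)) in every state raises V^target by b/(1-gamma) and
    each non-target Q-value by gamma*b/(1-gamma), so every gap grows by exactly b.
    """
    return max(0.0, eps - min(target_gaps(P, R, target, gamma), default=float("inf")))


def add_bonus(R, target, bonus):
    """Return a copy of R with `bonus` added to the target action in each state."""
    return [[r + (bonus if a == target[s] else 0.0) for a, r in enumerate(row)]
            for s, row in enumerate(R)]


# Toy example (invented for illustration): action 0 = "stay", action 1 = "switch state".
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[1.0, 0.0],
     [2.0, 0.0]]
target = [0, 0]  # the interested party wants the agent to stay put in both states
gamma = 0.9

b = min_uniform_bonus(P, R, target, gamma)
print(induces(P, R, target, gamma), round(b, 3), induces(P, add_bonus(R, target, b), target, gamma))
# → False 8.001 True
```

Without incentives the agent prefers to leave state 0 (a gap of -8), so the target policy is not induced; a bonus of 8.001 per target action closes that gap with a margin of 0.001. A linear program would instead choose a separate incentive for each state and action, subject to the limits on incentives the abstract mentions; the closed form above holds only because the bonus is the same in every state.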


Pages: 295-304

Number of pages: 10

Similar records (50 in total):

[1] Admissible Policy Teaching through Reward Design
Banihashem, Kiarash
Singla, Adish
Gan, Jiarui
Radanovic, Goran
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6037 - 6045
[2] Reinforcement Learning With Constrained Uncertain Reward Function Through Particle Filtering
Dogru, Oguzhan
Chiplunkar, Ranjith
Huang, Biao
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2022, 69 (07) : 7491 - 7499
[3] Teaching Public Policy Advocacy Skills Through Experiential Learning
Hawkins, Janice
Tremblay, Beth
NURSE EDUCATOR, 2021, 46 (04) : E54 - E54
[4] PEDAGOGIC WORKSHOP IN THE FUNCTION OF ACTIVE TEACHING AND LEARNING THROUGH SUCCESS
Buljubasic-Kuzmanovic, Vesna
METODICKI OGLEDI-METHODICAL REVIEW, 2006, 13 (01) : 123 - 136
[5] Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
Icarte, R. T.
Klassen, T. Q.
Valenzano, R.
McIlraith, S. A.
Journal of Artificial Intelligence Research, 2022, 73 : 173 - 208
[6] Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
Icarte, Rodrigo Toro
Klassen, Toryn Q.
Valenzano, Richard
McIlraith, Sheila A.
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2022, 73 : 173 - 208
[7] Reward Certification for Policy Smoothed Reinforcement Learning
Mu, Ronghui
Marcolino, Leandro Soriano
Zhang, Yanghao
Zhang, Tianle
Huang, Xiaowei
Ruan, Wenjie
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 19, 2024, : 21429 - 21437
[8] Reward Function Learning for Dialogue Management
El Asri, Layla
Laroche, Romain
Pietquin, Olivier
PROCEEDINGS OF THE SIXTH STARTING AI RESEARCHERS' SYMPOSIUM (STAIRS 2012), 2012, 241 : 95 - +
[9] Pitfalls of Learning a Reward Function Online
Armstrong, Stuart
Leike, Jan
Orseau, Laurent
Legg, Shane
PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 1592 - 1600
[10] LEARNING IN HONEYBEES AS A FUNCTION OF REWARD PROBABILITY
COUVILLON, PA
FISCHER, ME
BITTERMAN, ME
BULLETIN OF THE PSYCHONOMIC SOCIETY, 1992, 30 (06) : 445 - 445
