Distributional Reward Decomposition for Reinforcement Learning

Cited by: 0
Authors:
Lin, Zichuan [1,2]
Zhao, Li [2]
Yang, Derek [3]
Qin, Tao [2]
Yang, Guangwen [1]
Liu, Tie-Yan [2]
Affiliations:
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res, Redmond, WA, USA
[3] Univ Calif San Diego, La Jolla, CA, USA
Keywords: none listed
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and one general class of such properties is the presence of multiple reward channels. In those environments, the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes the reward without prior knowledge but with degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple-reward-channel structure under a distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decompositions without requiring any prior knowledge. Consequently, our agent achieves better performance than existing methods on environments with multiple reward channels.
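The combination step implied by the abstract, recovering the distribution of the full return from per-channel sub-return distributions, can be sketched concretely. Below is a minimal sketch in Python, assuming C51-style categorical sub-distributions defined on a shared, evenly spaced atom grid and combined by discrete convolution (the distribution of a sum of independent discrete random variables); the function and variable names are illustrative and not taken from the paper.

    import numpy as np

    def combine_channels(channel_probs):
        """Convolve per-channel categorical sub-return distributions into a
        single categorical distribution over the full (summed) return.
        Each entry of channel_probs is a 1-D probability vector defined on
        the same evenly spaced atom grid starting at 0."""
        full = channel_probs[0]
        for p in channel_probs[1:]:
            # np.convolve yields the distribution of the sum of two
            # independent discrete random variables on that grid.
            full = np.convolve(full, p)
        return full

    # Illustrative usage: two reward channels, five atoms each.
    rng = np.random.default_rng(0)
    p1 = rng.dirichlet(np.ones(5))  # sub-distribution for channel 1
    p2 = rng.dirichlet(np.ones(5))  # sub-distribution for channel 2
    full = combine_channels([p1, p2])
    print(full.shape)            # (9,): support spans the summed atom range
    print(round(full.sum(), 6))  # 1.0: still a valid probability distribution

A convenient property of this construction is that summing independent categorical variables corresponds exactly to convolving their probability vectors, so the full-return distribution stays in the same categorical family as the per-channel pieces.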
Pages: 10
Related Papers (50 records in total; entries [31]-[40] shown)
  • [31] Reinforcement Learning with a Corrupted Reward Channel
    Everitt, Tom
    Krakovna, Victoria
    Orseau, Laurent
    Legg, Shane
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4705 - 4713
  • [32] Multigrid Reinforcement Learning with Reward Shaping
    Grzes, Marek
    Kudenko, Daniel
    ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I, 2008, 5163 : 357 - 366
  • [33] Reward Shaping in Episodic Reinforcement Learning
    Grzes, Marek
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 565 - 573
  • [34] Hybrid Reward Architecture for Reinforcement Learning
    van Seijen, Harm
    Fatemi, Mehdi
    Romoff, Joshua
    Laroche, Romain
    Barnes, Tavian
    Tsang, Jeffrey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [35] Hierarchical average reward reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 2629 - 2669
  • [36] Reinforcement Learning with Stochastic Reward Machines
    Corazza, Jan
    Gavran, Ivan
    Neider, Daniel
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6429 - 6436
  • [37] Reward Identification in Inverse Reinforcement Learning
    Kim, Kuno
    Garg, Shivam
    Shiragur, Kirankumar
    Ermon, Stefano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [38] Compatible Reward Inverse Reinforcement Learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [40] Reward learning: Reinforcement, incentives, and expectations
    Berridge, KC
    PSYCHOLOGY OF LEARNING AND MOTIVATION: ADVANCES IN RESEARCH AND THEORY, VOL 40, 2001, 40 : 223 - 278