Distributional Reward Decomposition for Reinforcement Learning

Cited by: 0

Authors:

Lin, Zichuan [1,2]

Zhao, Li [2]

Yang, Derek [3]

Qin, Tao [2]

Yang, Guangwen [1]

Liu, Tie-Yan [2]

Affiliations:

[1] Tsinghua University, Beijing, People's Republic of China

[2] Microsoft Research, Redmond, WA, USA

[3] University of California San Diego, La Jolla, CA, USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward channel. In those environments the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes reward without prior knowledge but with degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple reward channel structure under distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decomposition, without any requirements on prior knowledge. Consequently, our agent achieves better performance than existing methods on environments with multiple reward channels.


Pages: 10

