Distributional Reward Decomposition for Reinforcement Learning

Cited by: 0
Authors:
Lin, Zichuan [1,2]
Zhao, Li [2]
Yang, Derek [3]
Qin, Tao [2]
Yang, Guangwen [1]
Liu, Tie-Yan [2]
Affiliations:
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res, Redmond, WA, USA
[3] Univ Calif San Diego, La Jolla, CA, USA
Keywords: none listed
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and one general class of such properties is the presence of multiple reward channels. In those environments, the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes the reward without prior knowledge but with degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple-reward-channel structure under a distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decompositions without requiring any prior knowledge. Consequently, our agent achieves better performance than existing methods on environments with multiple reward channels.
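The combination step implied by the abstract, recovering the distribution of the full return from per-channel sub-return distributions, can be sketched concretely. Below is a minimal sketch in Python, assuming C51-style categorical sub-distributions defined on a shared, evenly spaced atom grid and combined by discrete convolution (the distribution of a sum of independent discrete random variables); the function and variable names are illustrative and not taken from the paper.

    import numpy as np

    def combine_channels(channel_probs):
        """Convolve per-channel categorical sub-return distributions into a
        single categorical distribution over the full (summed) return.
        Each entry of channel_probs is a 1-D probability vector defined on
        the same evenly spaced atom grid starting at 0."""
        full = channel_probs[0]
        for p in channel_probs[1:]:
            # np.convolve yields the distribution of the sum of two
            # independent discrete random variables on that grid.
            full = np.convolve(full, p)
        return full

    # Illustrative usage: two reward channels, five atoms each.
    rng = np.random.default_rng(0)
    p1 = rng.dirichlet(np.ones(5))  # sub-distribution for channel 1
    p2 = rng.dirichlet(np.ones(5))  # sub-distribution for channel 2
    full = combine_channels([p1, p2])
    print(full.shape)            # (9,): support spans the summed atom range
    print(round(full.sum(), 6))  # 1.0: still a valid probability distribution

A convenient property of this construction is that summing independent categorical variables corresponds exactly to convolving their probability vectors, so the full-return distribution stays in the same categorical family as the per-channel pieces.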
Pages: 10
Related Papers (50 records in total; entries [31]-[40] shown)
  • [31] Reinforcement Learning with a Corrupted Reward Channel
    Everitt, Tom
    Krakovna, Victoria
    Orseau, Laurent
    Legg, Shane
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4705 - 4713
  • [32] Multigrid Reinforcement Learning with Reward Shaping
    Grzes, Marek
    Kudenko, Daniel
    ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I, 2008, 5163 : 357 - 366
  • [33] Reward Shaping in Episodic Reinforcement Learning
    Grzes, Marek
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 565 - 573
  • [34] Hybrid Reward Architecture for Reinforcement Learning
    van Seijen, Harm
    Fatemi, Mehdi
    Romoff, Joshua
    Laroche, Romain
    Barnes, Tavian
    Tsang, Jeffrey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [35] Hierarchical average reward reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 2629 - 2669
  • [36] Reinforcement Learning with Stochastic Reward Machines
    Corazza, Jan
    Gavran, Ivan
    Neider, Daniel
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6429 - 6436
  • [37] Reward Identification in Inverse Reinforcement Learning
    Kim, Kuno
    Garg, Shivam
    Shiragur, Kirankumar
    Ermon, Stefano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [38] Compatible Reward Inverse Reinforcement Learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [40] Reward learning: Reinforcement, incentives, and expectations
    Berridge, KC
    PSYCHOLOGY OF LEARNING AND MOTIVATION: ADVANCES IN RESEARCH AND THEORY, VOL 40, 2001, 40 : 223 - 278