Generalized Maximum Entropy Reinforcement Learning via Reward Shaping

Cited by: 2
|
Authors
Tao F. [1 ]
Wu M. [2 ]
Cao Y. [2 ]
Affiliations
[1] Volvo Car Technology USA LLC, Sunnyvale, CA 94085
[2] University of Texas at San Antonio, Department of Electrical Engineering, San Antonio, TX 78249
Keywords
Entropy; reinforcement learning (RL); reward shaping
DOI
10.1109/TAI.2023.3297988
Abstract
Entropy regularization is a commonly used technique in reinforcement learning to improve exploration and cultivate a better pretrained policy for later adaptation. Recent studies further show that entropy regularization can smooth the optimization landscape and simplify the policy optimization process, underscoring the value of integrating entropy into reinforcement learning. However, existing studies consider only the policy's entropy at the current state, as an extra regularization term in the policy gradient or in the objective function, without formally integrating the entropy into the reward function. In this article, we propose a shaped reward that incorporates the agent's policy entropy into the reward function. In particular, the agent's expected entropy over the distribution of the next state is added to the immediate reward associated with the current state. This addition is shown to yield a new soft Q-function and state value function that are concise and modular. Moreover, the new reinforcement learning framework can be readily applied to existing standard reinforcement learning algorithms, such as deep Q-network (DQN) and proximal policy optimization (PPO), while inheriting the benefits of entropy regularization. We further present a soft stochastic policy gradient theorem based on the shaped reward and propose a new practical reinforcement learning algorithm. Finally, experimental studies in MuJoCo environments demonstrate that our method outperforms soft actor-critic (SAC), a state-of-the-art off-policy maximum entropy reinforcement learning approach, by 5%-150% in terms of average return. © 2020 IEEE.
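The abstract's core construction — adding the expected policy entropy over the next-state distribution to the immediate reward — can be sketched as follows. This is a minimal tabular illustration based only on the abstract's description, not the paper's implementation; the function names, the temperature parameter `alpha`, and the shaped-reward form r'(s,a) = r(s,a) + α · E_{s'~P(·|s,a)}[H(π(·|s'))] are assumptions for illustration.

```python
import numpy as np

def policy_entropy(pi_s):
    """Shannon entropy H(pi(.|s)) of the policy's action distribution at one state."""
    p = np.clip(pi_s, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p))

def shaped_reward(r, next_state_probs, pi, alpha=0.2):
    """Hypothetical entropy-shaped reward:
        r'(s, a) = r(s, a) + alpha * E_{s' ~ P(.|s,a)}[ H(pi(.|s')) ],
    where `next_state_probs[s']` is the transition probability P(s'|s,a)
    and `pi[s']` is the policy's action distribution at next state s'."""
    expected_entropy = sum(
        p_sp * policy_entropy(pi[sp]) for sp, p_sp in enumerate(next_state_probs)
    )
    return r + alpha * expected_entropy

# Toy example: 2 states, 2 actions.
pi = np.array([[0.5, 0.5],    # uniform policy at state 0 -> entropy ln 2
               [1.0, 0.0]])   # deterministic policy at state 1 -> entropy 0
P = np.array([0.5, 0.5])      # next-state distribution for the current (s, a)
r_shaped = shaped_reward(1.0, P, pi, alpha=0.2)  # 1.0 + 0.2 * 0.5 * ln 2
```

Because the entropy bonus is attached to the reward itself rather than appended to the policy gradient or objective, any standard algorithm (e.g., DQN or PPO) can consume the shaped reward unchanged, which is the modularity the abstract emphasizes.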
Pages: 1563-1572
Page count: 9