Generalized Maximum Entropy Reinforcement Learning via Reward Shaping

Cited by: 2
Authors
Tao F. [1]
Wu M. [2]
Cao Y. [2]
Affiliations
[1] Volvo Car Technology USA LLC, Sunnyvale, CA 94085
[2] University of Texas, Department of Electrical Engineering, San Antonio, TX 78249
Keywords
Entropy; reinforcement learning (RL); reward shaping
DOI
10.1109/TAI.2023.3297988
Abstract
Entropy regularization is a commonly used technique in reinforcement learning to improve exploration and cultivate a better pre-trained policy for later adaptation. Recent studies further show that entropy regularization can smooth the optimization landscape and simplify the policy optimization process, indicating the value of integrating entropy into reinforcement learning. However, existing studies consider the policy's entropy at the current state only as an extra regularization term in the policy gradient or in the objective function, without formally integrating the entropy into the reward function. In this article, we propose a shaped reward that incorporates the agent's policy entropy into the reward function. In particular, the agent's expected entropy over the distribution of the next state is added to the immediate reward associated with the current state. This addition is shown to yield a new soft Q-function and state function that are concise and modular. Moreover, the new reinforcement learning framework can be easily applied to existing standard reinforcement learning algorithms, such as deep Q-network (DQN) and proximal policy optimization (PPO), while inheriting the benefits of entropy regularization. We further present a soft stochastic policy gradient theorem based on the shaped reward and propose a new practical reinforcement learning algorithm. Finally, experimental studies conducted in MuJoCo environments demonstrate that our method can outperform soft actor-critic, an existing state-of-the-art off-policy maximum entropy reinforcement learning approach, by 5%-150% in terms of average return. © 2020 IEEE.
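The shaped reward described in the abstract can be illustrated with a minimal sketch for a small discrete problem. This is not the authors' implementation: the function names, the dictionary-based representation of the policy and transition probabilities, and the entropy weight `alpha` are all assumptions introduced here for illustration. The sketch only shows the idea of adding the expected policy entropy at the next state to the immediate reward.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def shaped_reward(reward, next_state_probs, policy, alpha=1.0):
    """Immediate reward plus alpha times the expected next-state policy entropy.

    reward: scalar r(s, a)
    next_state_probs: dict mapping next state -> P(s' | s, a)
    policy: dict mapping state -> list of action probabilities pi(. | s)
    alpha: hypothetical entropy weight (an assumption, not from the abstract)
    """
    expected_entropy = sum(
        p_next * policy_entropy(policy[s_next])
        for s_next, p_next in next_state_probs.items()
    )
    return reward + alpha * expected_entropy


# Toy example: the policy is uniform in s1 and deterministic in s2.
policy = {"s1": [0.5, 0.5], "s2": [1.0, 0.0]}
next_state_probs = {"s1": 0.5, "s2": 0.5}
r_tilde = shaped_reward(1.0, next_state_probs, policy, alpha=0.2)
```

In this toy case only `s1` contributes entropy (ln 2), so the shaped reward exceeds the raw reward by `0.2 * 0.5 * ln 2`. Because the bonus is folded into the reward itself, a standard algorithm could in principle consume `r_tilde` in place of the raw reward, which is the modularity the abstract points to.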
Pages: 1563-1572
Number of pages: 9
Related papers
50 in total
  • [21] Maximum Entropy Reinforcement Learning with Evolution Strategies
    Shi, Longxiang
    Li, Shijian
    Zheng, Qian
    Cao, Longbing
    Yang, Long
    Pan, Gang
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [22] Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping
    Miranda, Victor R. F.
    Neto, Armando A.
    Freitas, Gustavo M.
    Mozelli, Leonardo A.
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2024, 71 (06) : 6013 - 6020
  • [23] Reward Shaping from Hybrid Systems Models in Reinforcement Learning
    Qian, Marian
    Mitsch, Stefan
    NASA FORMAL METHODS, NFM 2023, 2023, 13903 : 122 - 139
  • [24] Obstacle Avoidance and Navigation Utilizing Reinforcement Learning with Reward Shaping
    Zhang, Daniel
    Bailey, Colleen P.
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS II, 2020, 11413
  • [25] Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping
    Brys, Tim
    Harutyunyan, Anna
    Vrancx, Peter
    Taylor, Matthew E.
    Kudenko, Daniel
    Nowe, Ann
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 2315 - 2322
  • [26] Landmark Based Reward Shaping in Reinforcement Learning with Hidden States
    Demir, Alper
    Cilden, Erkin
    Polat, Faruk
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1922 - 1924
  • [27] Reward Shaping for Model-Based Bayesian Reinforcement Learning
    Kim, Hyeoneun
    Lim, Woosang
    Lee, Kanghoon
    Noh, Yung-Kyun
    Kim, Kee-Eung
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 3548 - 3555
  • [28] Graph convolutional recurrent networks for reward shaping in reinforcement learning
    Sami, Hani
    Bentahar, Jamal
    Mourad, Azzam
    Otrok, Hadi
    Damiani, Ernesto
    INFORMATION SCIENCES, 2022, 608 : 63 - 80
  • [29] Optimizing Reinforcement Learning Agents in Games Using Curriculum Learning and Reward Shaping
    Khan, Adil
    Muhammad, Muhammad
    Naeem, Muhammad
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2025, 36 (01)
  • [30] Releasing source locating based on Multi-Agent Reinforcement Learning with reward function designed by maximum entropy
    Wang, Zhi-Pu
    Zeng, Guang-Rong
    Deng, Lie-Wei
    Cao, Wang
    Guo, Yao
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 4688 - 4693