Proximal Policy Optimization with Entropy Regularization

Cited by: 0
Authors
Shen, Yuqing [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
reinforcement learning; policy gradient; entropy regularization
DOI
10.1109/ICCCR61138.2024.10585473
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This study presents a revision of the Proximal Policy Optimization (PPO) algorithm, aimed primarily at improving training stability while maintaining a balance between exploration and exploitation. Recognizing the inherent difficulty of achieving this balance in complex environments, the proposed method adopts an entropy regularization technique similar to the one used in the Asynchronous Advantage Actor-Critic (A3C) algorithm. The main purpose of this design is to encourage exploration in the early stages of training and prevent the agent from prematurely converging to a sub-optimal policy. A detailed theoretical explanation of how the entropy term improves the robustness of the learning trajectory is provided. Experimental results demonstrate that the revised PPO not only retains the original strengths of the PPO algorithm but also shows a significant improvement in training stability. This work contributes to ongoing research in reinforcement learning and offers a promising direction for future work on applying PPO in environments with complicated dynamics.
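The abstract describes augmenting the PPO objective with an A3C-style entropy bonus to encourage early exploration. As a rough illustration only (not the paper's implementation), the sketch below shows a clipped PPO policy loss with an entropy regularization term for a discrete-action policy in PyTorch; the function name, the clipping range of 0.2, and the entropy coefficient of 0.01 are illustrative assumptions.

```python
# Minimal sketch of a PPO clipped surrogate loss with an entropy bonus.
# Hyperparameter values and names are assumptions, not the paper's settings.
import torch
from torch.distributions import Categorical


def ppo_entropy_loss(logits, old_log_probs, actions, advantages,
                     clip_eps=0.2, entropy_coef=0.01):
    """Clipped PPO policy loss minus an entropy bonus (value to be minimized)."""
    dist = Categorical(logits=logits)              # current policy pi_theta
    log_probs = dist.log_prob(actions)             # log pi_theta(a_t | s_t)
    ratio = torch.exp(log_probs - old_log_probs)   # importance ratio r_t(theta)

    # Standard PPO clipped surrogate objective.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Entropy regularization: subtracting the mean policy entropy rewards
    # stochastic policies, discouraging premature convergence to a
    # near-deterministic, possibly sub-optimal policy.
    entropy_bonus = dist.entropy().mean()
    return policy_loss - entropy_coef * entropy_bonus


if __name__ == "__main__":
    # Toy usage with random tensors: batch of 4 states, 3 discrete actions.
    logits = torch.randn(4, 3, requires_grad=True)
    old_log_probs = torch.randn(4).clamp(max=0.0)
    actions = torch.randint(0, 3, (4,))
    advantages = torch.randn(4)
    loss = ppo_entropy_loss(logits, old_log_probs, actions, advantages)
    loss.backward()
    print(loss.item())
```

In practice the entropy coefficient is often annealed over training so that exploration dominates early and the clipped objective dominates later; the fixed value above is only for illustration.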
Pages: 380-383
Page count: 4
Related papers
50 records in total
  • [21] Stable Policy Optimization via Off-Policy Divergence Regularization. Touati, Ahmed; Zhang, Amy; Pineau, Joelle; Vincent, Pascal. CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020), 2020, 124: 1328-1337.
  • [22] Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization. Sun, Youbang; Liu, Tao; Kumar, P. R.; Shahrampour, Shahin. IEEE CONTROL SYSTEMS LETTERS, 2024, 8: 1217-1222.
  • [23] Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization. Cen, Shicong; Cheng, Chen; Chen, Yuxin; Wei, Yuting; Chi, Yuejie. OPERATIONS RESEARCH, 2021, 70 (04): 2563-2578.
  • [24] An AGC Dynamic Optimization Method Based on Proximal Policy Optimization. Liu, Zhao; Li, Jiateng; Zhang, Pei; Ding, Zhenhuan; Zhao, Yanshun. FRONTIERS IN ENERGY RESEARCH, 2022, 10.
  • [25] Proximal Policy Optimization with Relative Pearson Divergence. Kobayashi, Taisuke. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021: 8416-8421.
  • [26] Anti-Martingale Proximal Policy Optimization. Gu, Yang; Cheng, Yuhu; Yu, Kun; Wang, Xuesong. IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (10): 6421-6432.
  • [27] Proximal Policy Optimization with Mixed Distributed Training. Zhang, Zhenyu; Luo, Xiangfeng; Liu, Tong; Xie, Shaorong; Wang, Jianshu; Wang, Wei; Li, Yang; Peng, Yan. 2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019: 1452-1456.
  • [28] Improving proximal policy optimization with alpha divergence. Xu, Haotian; Yan, Zheng; Xuan, Junyu; Zhang, Guangquan; Lu, Jie. NEUROCOMPUTING, 2023, 534: 94-105.
  • [29] Partial Advantage Estimator for Proximal Policy Optimization. Jin, Yizhao; Song, Xiulei; Slabaugh, Gregory; Lucas, Simon. IEEE TRANSACTIONS ON GAMES, 2025, 17 (01): 158-166.
  • [30] Image captioning via proximal policy optimization. Zhang, Le; Zhang, Yanshuo; Zhao, Xin; Zou, Zexiao. IMAGE AND VISION COMPUTING, 2021, 108.