Proximal Policy Optimization with Entropy Regularization

Cited by: 0
Authors
Shen, Yuqing [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
reinforcement learning; policy gradient; entropy regularization
DOI
10.1109/ICCCR61138.2024.10585473
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This study presents a revision of the Proximal Policy Optimization (PPO) algorithm, aimed primarily at improving training stability while maintaining a balance between exploration and exploitation. Recognizing the inherent difficulty of achieving this balance in complex environments, the proposed method adopts an entropy regularization technique similar to the one used in the Asynchronous Advantage Actor-Critic (A3C) algorithm. The main purpose of this design is to encourage exploration in the early stages of training and prevent the agent from prematurely converging to a sub-optimal policy. A detailed theoretical explanation of how the entropy term improves the robustness of the learning trajectory is provided. Experimental results demonstrate that the revised PPO not only retains the original strengths of the PPO algorithm but also shows a significant improvement in training stability. This work contributes to ongoing research in reinforcement learning and offers a promising direction for future work on applying PPO in environments with complicated dynamics.
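The abstract describes augmenting the PPO objective with an A3C-style entropy bonus to encourage early exploration. As a rough illustration only (not the paper's implementation), the sketch below shows a clipped PPO policy loss with an entropy regularization term for a discrete-action policy in PyTorch; the function name, the clipping range of 0.2, and the entropy coefficient of 0.01 are illustrative assumptions.

```python
# Minimal sketch of a PPO clipped surrogate loss with an entropy bonus.
# Hyperparameter values and names are assumptions, not the paper's settings.
import torch
from torch.distributions import Categorical


def ppo_entropy_loss(logits, old_log_probs, actions, advantages,
                     clip_eps=0.2, entropy_coef=0.01):
    """Clipped PPO policy loss minus an entropy bonus (value to be minimized)."""
    dist = Categorical(logits=logits)              # current policy pi_theta
    log_probs = dist.log_prob(actions)             # log pi_theta(a_t | s_t)
    ratio = torch.exp(log_probs - old_log_probs)   # importance ratio r_t(theta)

    # Standard PPO clipped surrogate objective.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Entropy regularization: subtracting the mean policy entropy rewards
    # stochastic policies, discouraging premature convergence to a
    # near-deterministic, possibly sub-optimal policy.
    entropy_bonus = dist.entropy().mean()
    return policy_loss - entropy_coef * entropy_bonus


if __name__ == "__main__":
    # Toy usage with random tensors: batch of 4 states, 3 discrete actions.
    logits = torch.randn(4, 3, requires_grad=True)
    old_log_probs = torch.randn(4).clamp(max=0.0)
    actions = torch.randint(0, 3, (4,))
    advantages = torch.randn(4)
    loss = ppo_entropy_loss(logits, old_log_probs, actions, advantages)
    loss.backward()
    print(loss.item())
```

In practice the entropy coefficient is often annealed over training so that exploration dominates early and the clipped objective dominates later; the fixed value above is only for illustration.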
Pages: 380-383
Page count: 4
Related papers
50 records in total
  • [21] Stable Policy Optimization via Off-Policy Divergence Regularization. Touati, Ahmed; Zhang, Amy; Pineau, Joelle; Vincent, Pascal. CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020), 2020, 124: 1328-1337.
  • [22] Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization. Sun, Youbang; Liu, Tao; Kumar, P. R.; Shahrampour, Shahin. IEEE CONTROL SYSTEMS LETTERS, 2024, 8: 1217-1222.
  • [23] Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization. Cen, Shicong; Cheng, Chen; Chen, Yuxin; Wei, Yuting; Chi, Yuejie. OPERATIONS RESEARCH, 2021, 70 (04): 2563-2578.
  • [24] An AGC Dynamic Optimization Method Based on Proximal Policy Optimization. Liu, Zhao; Li, Jiateng; Zhang, Pei; Ding, Zhenhuan; Zhao, Yanshun. FRONTIERS IN ENERGY RESEARCH, 2022, 10.
  • [25] Proximal Policy Optimization with Relative Pearson Divergence. Kobayashi, Taisuke. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021: 8416-8421.
  • [26] Anti-Martingale Proximal Policy Optimization. Gu, Yang; Cheng, Yuhu; Yu, Kun; Wang, Xuesong. IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (10): 6421-6432.
  • [27] Proximal Policy Optimization with Mixed Distributed Training. Zhang, Zhenyu; Luo, Xiangfeng; Liu, Tong; Xie, Shaorong; Wang, Jianshu; Wang, Wei; Li, Yang; Peng, Yan. 2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019: 1452-1456.
  • [28] Improving proximal policy optimization with alpha divergence. Xu, Haotian; Yan, Zheng; Xuan, Junyu; Zhang, Guangquan; Lu, Jie. NEUROCOMPUTING, 2023, 534: 94-105.
  • [29] Partial Advantage Estimator for Proximal Policy Optimization. Jin, Yizhao; Song, Xiulei; Slabaugh, Gregory; Lucas, Simon. IEEE TRANSACTIONS ON GAMES, 2025, 17 (01): 158-166.
  • [30] Image captioning via proximal policy optimization. Zhang, Le; Zhang, Yanshuo; Zhao, Xin; Zou, Zexiao. IMAGE AND VISION COMPUTING, 2021, 108.