Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1 ]
Yan, Zheng [1 ]
Xuan, Junyu [1 ]
Zhang, Guangquan [1 ]
Lu, Jie [1 ]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, Fac Engn & Informat Technol, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning, formulated as an unconstrained optimization problem with two terms: the accumulative discounted return and a Kullback-Leibler (KL) divergence. Currently, there are three PPO versions: primary, adaptive, and clipping. The most widely used is the clipping version, in which the KL divergence is replaced by a clipping function that measures the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two aspects. One is to reformulate it as a linearly combined form to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
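As a rough illustration of the two ideas in the abstract, the sketch below implements a linearly combined PPO objective with Amari's alpha divergence as the policy-difference penalty, assuming a discrete action space. This is a minimal PyTorch sketch, not the paper's implementation: the names alpha_divergence and alpha_ppo_loss, the fixed trade-off coefficient beta, and the divergence direction D_alpha(pi_old || pi_new) are illustrative assumptions.

import torch

def alpha_divergence(p, q, alpha=0.5, eps=1e-8):
    # Amari's alpha divergence between discrete distributions p and q:
    #   D_alpha(p || q) = (1 - sum_a p^alpha * q^(1-alpha)) / (alpha * (1 - alpha))
    # It tends to KL(p || q) as alpha -> 1 and to KL(q || p) as alpha -> 0.
    # p, q: (batch, num_actions) tensors whose rows sum to 1; alpha in (0, 1).
    p = p.clamp_min(eps)
    q = q.clamp_min(eps)
    inner = (p.pow(alpha) * q.pow(1.0 - alpha)).sum(dim=-1)
    return (1.0 - inner) / (alpha * (1.0 - alpha))

def alpha_ppo_loss(new_probs, old_probs, actions, advantages, alpha=0.5, beta=0.01):
    # Linearly combined form: importance-weighted return term minus a
    # beta-weighted alpha-divergence penalty, negated for gradient descent.
    idx = actions.unsqueeze(1)                                   # (batch, 1) action indices
    ratio = new_probs.gather(1, idx) / old_probs.gather(1, idx)  # pi_new(a|s) / pi_old(a|s)
    surrogate = ratio.squeeze(1) * advantages
    penalty = alpha_divergence(old_probs.detach(), new_probs, alpha)
    return -(surrogate - beta * penalty).mean()

Because D_alpha(pi_old || pi_new) approaches KL(pi_old || pi_new) as alpha -> 1, the KL-penalized (combined) objective is a limiting case of this loss; varying alpha adjusts how strongly the penalty trades off mode-seeking against mass-covering policy updates.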
Pages: 94-105
Number of pages: 12
Related papers
50 records in total
  • [1] Proximal Policy Optimization with Relative Pearson Divergence
    Kobayashi, Taisuke
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021: 8416-8421
  • [2] Improving traffic signal control operations using proximal policy optimization
    Huang, Liben; Qu, Xiaohui
    IET INTELLIGENT TRANSPORT SYSTEMS, 2023, 17 (03): 588-601
  • [3] Proximal Policy Optimization With Policy Feedback
    Gu, Yang; Cheng, Yuhu; Chen, C. L. Philip; Wang, Xuesong
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (07): 4600-4610
  • [4] Improving Proximal Policy Optimization Algorithm in Interactive Multi-agent Systems
    Shang, Yi; Chen, Yifei; Cruz, Francisco
    2024 IEEE INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING (ICDL 2024), 2024
  • [5] Coordinated Proximal Policy Optimization
    Wu, Zifan; Yu, Chao; Ye, Deheng; Zhang, Junge; Piao, Haiyin; Zhuo, Hankz Hankui
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [6] Truly Proximal Policy Optimization
    Wang, Yuhui; He, Hao; Tan, Xiaoyang
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115: 113-122
  • [7] Fast Proximal Policy Optimization
    Zhao, Weiqi; Jiang, Haobo; Xie, Jin
    PATTERN RECOGNITION, ACPR 2021, PT II, 2022, 13189: 73-86
  • [8] Off-Policy Proximal Policy Optimization
    Meng, Wenjia; Zheng, Qian; Pan, Gang; Yin, Yilong
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023: 9162-9170
  • [9] Divergence-Augmented Policy Optimization
    Wang, Qing; Li, Yingru; Xiong, Jiechao; Zhang, Tong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [10] Proximal evolutionary strategy: improving deep reinforcement learning through evolutionary policy optimization
    Peng, Yiming; Chen, Gang; Zhang, Mengjie; Xue, Bing
    MEMETIC COMPUTING, 2024, 16 (03): 445-466