Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1 ]
Yan, Zheng [1 ]
Xuan, Junyu [1 ]
Zhang, Guangquan [1 ]
Lu, Jie [1 ]
Affiliations
[1] Univ Technol Sydney, Australia Artificial Intelligence Inst, Fac Engn & Informat Technol, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning, formulated as an unconstrained optimization problem with two terms: the accumulated discounted return and a Kullback-Leibler (KL) divergence. Currently, there are three PPO versions: primary, adaptive, and clipping. The most widely used is the clipping version, in which the KL divergence is replaced by a clipping function that measures the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two aspects. One is to reformulate it as a linearly combined form to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
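The abstract does not give the exact objective, but the idea can be illustrated with a minimal sketch: a linearly combined surrogate in which a beta-weighted alpha-divergence penalty takes the place of PPO's KL penalty. The names alpha_divergence, alpha_ppo_loss, the weight beta, and the Amari-style parameterization below are illustrative assumptions, not the authors' formulation; this parameterization recovers KL(p||q) as alpha → 1 and KL(q||p) as alpha → 0.

```python
# Hypothetical sketch of a linearly combined PPO objective with an
# alpha-divergence penalty (not the paper's exact formulation).
import torch


def alpha_divergence(p, q, alpha=0.5, eps=1e-8):
    """Amari-style alpha divergence between discrete distributions p and q.

    D_alpha(p || q) = (1 - sum_x p^alpha * q^(1 - alpha)) / (alpha * (1 - alpha));
    the limits alpha -> 1 and alpha -> 0 recover KL(p || q) and KL(q || p).
    """
    p = p.clamp_min(eps)
    q = q.clamp_min(eps)
    integral = (p.pow(alpha) * q.pow(1.0 - alpha)).sum(dim=-1)
    return (1.0 - integral) / (alpha * (1.0 - alpha))


def alpha_ppo_loss(new_logits, old_logits, actions, advantages, beta=1.0, alpha=0.5):
    """Linearly combined surrogate: importance-weighted return term minus a
    beta-weighted alpha-divergence penalty between old and new policies."""
    new_probs = torch.softmax(new_logits, dim=-1)
    old_probs = torch.softmax(old_logits, dim=-1).detach()
    # Importance ratio pi_new(a|s) / pi_old(a|s) for the actions actually taken.
    new_pa = new_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    old_pa = old_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    surrogate = (new_pa / old_pa * advantages).mean()
    penalty = alpha_divergence(old_probs, new_probs, alpha=alpha).mean()
    # Maximize (surrogate - beta * penalty) by minimizing its negative.
    return -(surrogate - beta * penalty)
```

In this sketch beta would be tuned (or adapted) to control the trade-off between the return term and the divergence penalty, mirroring the trade-off described in the abstract.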
Pages: 94 - 105
Page count: 12
Related Papers
50 records in total
  • [41] Trust Region-Guided Proximal Policy Optimization
    Wang, Yuhui
    He, Hao
    Tan, Xiaoyang
    Gan, Yaozhong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [42] Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
    Liu, Boyi
    Cai, Qi
    Yang, Zhuoran
    Wang, Zhaoran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [43] Reactive Power Optimization Based on Proximal Policy Optimization of Deep Reinforcement Learning
Zhang P.
    Zhu Z.
    Xie H.
Dianwang Jishu/Power System Technology, 2023, 47 (02): 562 - 570
  • [44] Proximal policy optimization-based join order optimization with spark SQL
    Lee K.-M.
    Kim I.
    Lee K.-C.
Lee, Kyu-Chul (kclee@cnu.ac.kr), 1600, Institute of Electronics Engineers of Korea (10): 227 - 232
  • [45] Proximal policy optimization-based controller for chaotic systems
    Yau, Her-Terng
    Kuo, Ping-Huan
    Luan, Po-Chien
    Tseng, Yung-Ruen
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (01) : 586 - 601
  • [46] Robust solar sail trajectories using proximal policy optimization
    Bianchi, Christian
    Niccolai, Lorenzo
    Mengali, Giovanni
    ACTA ASTRONAUTICA, 2025, 226 : 702 - 715
  • [47] Mixed-Autonomy Traffic Control with Proximal Policy Optimization
    Wei, Haoran
    Liu, Xuanzhang
    Mashayekhy, Lena
    Decker, Keith
    2019 IEEE VEHICULAR NETWORKING CONFERENCE (VNC), 2019,
  • [48] A Proximal Policy Optimization method in UAV swarm formation control
    Yu, Ning
    Juan, Feng
    Zhao, Hongwei
    ALEXANDRIA ENGINEERING JOURNAL, 2024, 100 : 268 - 276
  • [49] On Proximal Policy Optimization's Heavy-tailed Gradients
    Garg, Saurabh
    Zhanson, Joshua
    Parisotto, Emilio
    Prasad, Adarsh
    Kolter, J. Zico
    Lipton, Zachary C.
    Balakrishnan, Sivaraman
    Salakhutdinov, Ruslan
    Ravikumar, Pradeep
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [50] Automatic Management of Cloud Applications with Use of Proximal Policy Optimization
    Funika, Wlodzimierz
    Koperek, Pawel
    Kitowski, Jacek
    COMPUTATIONAL SCIENCE - ICCS 2020, PT I, 2020, 12137 : 73 - 87