Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1 ]
Yan, Zheng [1 ]
Xuan, Junyu [1 ]
Zhang, Guangquan [1 ]
Lu, Jie [1 ]
Affiliations
[1] Univ Technol Sydney, Australia Artificial Intelligence Inst, Fac Engn & Informat Technol, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning, formulated as an unconstrained optimization problem with two terms: the accumulated discounted return and a Kullback-Leibler (KL) divergence. Currently, there are three PPO versions: primary, adaptive, and clipping. The most widely used PPO algorithm is the clipping version, in which the KL divergence is replaced by a clipping function that measures the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two aspects. One is to reformulate it as a linearly combined objective to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
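As a reading aid, the linearly combined objective described in the abstract can be sketched in code. This is a minimal illustrative sketch, not the authors' released implementation: the function and parameter names (alpha_divergence, alpha_ppo_loss, beta for the trade-off weight, alpha for the divergence order) are assumptions, and only the general structure, a surrogate return term minus an alpha-divergence penalty, follows the abstract.

import torch

def alpha_divergence(logp_old, logp_new, alpha):
    # Sample-based estimate of D_alpha(pi_old || pi_new) from actions drawn under pi_old,
    # using the identity
    #   sum_a pi_old(a)^alpha * pi_new(a)^(1-alpha) = E_{a ~ pi_old}[(pi_new(a)/pi_old(a))^(1-alpha)],
    # so D_alpha ~= (mean(ratio^(1-alpha)) - 1) / (alpha * (alpha - 1)) for alpha not in {0, 1};
    # the limits alpha -> 1 and alpha -> 0 recover the forward and reverse KL divergences.
    ratio = torch.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    return (ratio.pow(1.0 - alpha).mean() - 1.0) / (alpha * (alpha - 1.0))

def alpha_ppo_loss(logp_old, logp_new, advantages, alpha=0.5, beta=1.0):
    # Linearly combined objective: importance-weighted advantage minus a beta-weighted
    # alpha-divergence penalty, negated so an optimizer can minimize it.
    ratio = torch.exp(logp_new - logp_old)
    surrogate = (ratio * advantages).mean()         # policy-improvement (return) term
    penalty = alpha_divergence(logp_old, logp_new, alpha)
    return -(surrogate - beta * penalty)

Under these assumptions, beta plays the role of the linear trade-off weight between the two terms, and alpha selects a member of the parametric divergence family used in place of the KL term.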
Pages: 94 - 105
Page count: 12
Related Papers
50 related records in total
  • [21] Generalized Proximal Policy Optimization with Sample Reuse
    Queeney, James
    Paschalidis, Ioannis Ch.
    Cassandras, Christos G.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [22] Proximal Policy Optimization with Advantage Reuse Competition
    Cheng Y.
    Guo Q.
    Wang X.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (08): 1 - 10
  • [23] Decaying Clipping Range in Proximal Policy Optimization
    Farsang, Monika
    Szegletes, Luca
    IEEE 15TH INTERNATIONAL SYMPOSIUM ON APPLIED COMPUTATIONAL INTELLIGENCE AND INFORMATICS (SACI 2021), 2021: 521 - 525
  • [24] Proximal Policy Optimization for Radiation Source Search
    Proctor, Philippe
    Teuscher, Christof
    Hecht, Adam
    Osinski, Marek
    JOURNAL OF NUCLEAR ENGINEERING, 2021, 2 (04): 368 - 397
  • [25] Learning Dialogue Policy Efficiently Through Dyna Proximal Policy Optimization
    Huang, Chenping
    Cao, Bin
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT I, 2022, 460 : 396 - 414
  • [26] Pairs Trading Strategy Optimization Using Proximal Policy Optimization Algorithms
    Chen, Yi-Feng
    Shih, Wen-Yueh
    Lai, Hsu-Chao
    Chang, Hao-Chun
    Huang, Jiun-Long
    2023 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, BIGCOMP, 2023: 40 - 47
  • [27] Proximal Policy Optimization-Based Optimization of Microwave Planar Resonators
    Pan, Jia-Hao
    Liu, Qi Qiang
    Zhao, Wen-Sheng
    Hu, Xiaoping
    You, Bin
    Hu, Yue
    Wang, Jing
    Yu, Chenghao
    Wang, Da-Wei
    IEEE TRANSACTIONS ON COMPONENTS PACKAGING AND MANUFACTURING TECHNOLOGY, 2024, 14 (12): 2339 - 2347
  • [28] HiPPO: Enhancing proximal policy optimization with highlight replay
    Zhang, Shutong
    Chen, Xing
    Liu, Zhaogeng
    Chen, Hechang
    Chang, Yi
    PATTERN RECOGNITION, 2025, 162
  • [29] Proximal policy optimization for formation navigation and obstacle avoidance
    Sadhukhan, Priyam
    Selmic, Rastko R.
    INTERNATIONAL JOURNAL OF INTELLIGENT ROBOTICS AND APPLICATIONS, 2022, 6 (04): 746 - 759