Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1 ]
Yan, Zheng [1 ]
Xuan, Junyu [1 ]
Zhang, Guangquan [1 ]
Lu, Jie [1 ]
Affiliations
[1] Univ Technol Sydney, Australia Artificial Intelligence Inst, Fac Engn & Informat Technol, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning, which is formulated as an unconstrained optimization problem including two terms: the accumulative discounted return and a Kullback-Leibler (KL) divergence. Currently, there are three PPO versions: primary, adaptive, and clipping. The most widely used PPO algorithm is the clipping version, in which the KL divergence is replaced by a clipping function to measure the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two aspects. One is to reformulate it as a linearly combined form to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO, compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
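As a rough illustration of the idea summarized in the abstract (not the paper's exact formulation), the sketch below shows a linearly combined PPO-style loss in which a parametric alpha-divergence penalty takes the place of the usual KL term. The function names (`alpha_divergence`, `combined_ppo_loss`), the trade-off coefficient `beta`, and the particular alpha-divergence convention are illustrative assumptions.

```python
import torch

def alpha_divergence(p, q, alpha=0.5, eps=1e-8):
    # Alpha divergence D_alpha(p || q) for discrete action distributions,
    # using one common (Amari-style) convention, assumed here:
    #   D_alpha = (1 - sum_a p(a)^alpha * q(a)^(1-alpha)) / (alpha * (1 - alpha)),
    # defined for alpha in (0, 1); it approaches the two KL divergences
    # in the limits alpha -> 1 and alpha -> 0.
    p = p.clamp_min(eps)
    q = q.clamp_min(eps)
    overlap = (p.pow(alpha) * q.pow(1.0 - alpha)).sum(dim=-1)
    return (1.0 - overlap) / (alpha * (1.0 - alpha))

def combined_ppo_loss(logp_new, logp_old, advantages, pi_old, pi_new,
                      beta=1.0, alpha=0.5):
    # Linearly combined surrogate (to minimize):
    #   -E[ratio * A] + beta * E[D_alpha(pi_old || pi_new)],
    # where beta trades off return improvement against policy change.
    ratio = torch.exp(logp_new - logp_old)                    # importance weights
    surrogate = (ratio * advantages).mean()                   # return-improvement term
    penalty = alpha_divergence(pi_old, pi_new, alpha).mean()  # divergence penalty
    return -(surrogate - beta * penalty)

# Toy usage: batch of 4 states, 3 discrete actions.
pi_old = torch.softmax(torch.randn(4, 3), dim=-1)
pi_new = torch.softmax(torch.randn(4, 3), dim=-1)
actions = torch.randint(0, 3, (4,))
logp_old = torch.log(pi_old[torch.arange(4), actions])
logp_new = torch.log(pi_new[torch.arange(4), actions])
advantages = torch.randn(4)
loss = combined_ppo_loss(logp_new, logp_old, advantages, pi_old, pi_new)
```

With beta controlling the penalty weight and alpha controlling the shape of the divergence, this sketch reflects the abstract's two changes: a single linearly combined objective and a parametric divergence in place of KL.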
Pages: 94-105
Number of pages: 12
Related papers
50 records in total
  • [31] Proximal policy optimization with an integral compensator for quadrotor control
    Hu, Huan
    Wang, Qing-ling
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2020, 21 (05) : 777 - 795
  • [32] Augmented Proximal Policy Optimization for Safe Reinforcement Learning
    Dai, Juntao
    Ji, Jiaming
    Yang, Long
    Zheng, Qian
    Pan, Gang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 7288 - 7295
  • [33] A Novel Proximal Policy Optimization Approach for Filter Design
    Fan, Dongdong
    Ding, Shuai
    Zhang, Haotian
    Zhang, Weihao
    Jia, Qingsong
    Han, Xu
    Tang, Hao
    Zhu, Zhaojun
    Zhou, Yuliang
    APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL, 2024, 39 (05): 390 - 395
  • [34] Proximal policy optimization with model-based methods
    Li, Shuailong
    Zhang, Wei
    Zhang, Huiwen
    Zhang, Xin
    Leng, Yuquan
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (06) : 5399 - 5410
  • [35] A novel guidance law based on proximal policy optimization
    Jiang, Yang
    Yu, Jianglong
    Li, Qingdong
    Ren, Zhang
    Dong, Xiwang
    Hua, Yongzhao
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 3364 - 3369
  • [36] Proximal policy optimization via enhanced exploration efficiency
    Zhang, Junwei
    Zhang, Zhenghao
    Han, Shuai
    Lue, Shuai
    INFORMATION SCIENCES, 2022, 609 : 750 - 765
  • [37] Use of Proximal Policy Optimization for the Joint Replenishment Problem
    Vanvuchelen, Nathalie
    Gijsbrechts, Joren
    Boute, Robert
    COMPUTERS IN INDUSTRY, 2020, 119
  • [38] DNA: Proximal Policy Optimization with a Dual Network Architecture
    Aitchison, Matthew
    Sweetser, Penny
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [39] Proximal policy optimization with an integral compensator for quadrotor control
    Huan Hu
    Qing-ling Wang
    Frontiers of Information Technology & Electronic Engineering, 2020, 21 : 777 - 795
  • [40] Misleading Inference Generation via Proximal Policy Optimization
    Peng, Hsien-Yung
    Chung, Ho-Lam
    Chan, Ying-Hong
    Fan, Yao-Chung
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2022, PT I, 2022, 13280 : 497 - 509