Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments

Cited: 3
Authors
del Rio, Alberto [1 ]
Jimenez, David [2 ]
Serrano, Javier [3 ]
Affiliations
[1] Univ Politecn Madrid, Escuela Tecn Super Ingn Telecomunicac ETSIT, Signals Syst & Radiocommun Dept, Madrid 28040, Spain
[2] Univ Politecn Madrid, Escuela Tecn Super Ingn Telecomunicac ETSIT, Phys Elect Elect Engn & Appl Phys Dept, Madrid 28040, Spain
[3] Univ Politecn Madrid, Escuela Tecn Super Ingn Sistemas Informat ETSISI, Informat Syst Dept, Madrid 28031, Spain
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Training; Stability analysis; Heuristic algorithms; Surveys; Moon; Space vehicles; Convergence; Prediction algorithms; Software algorithms; Reliability; Reinforcement learning; A3C; CartPole; comparison; environment complexity; Lunar Lander; performance analysis; PPO; sample efficiency; stability
DOI
10.1109/ACCESS.2024.3472473
Chinese Library Classification (CLC) number
TP [automation technology, computer technology]
Discipline code
0812
Abstract
This research article presents a comparison between two mainstream Deep Reinforcement Learning (DRL) algorithms, Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization (PPO), in the context of two diverse environments: CartPole and Lunar Lander. DRL algorithms are widely known for their effectiveness in training agents to navigate complex environments and learn optimal policies. Nevertheless, a methodical assessment of their effectiveness across different settings is crucial for understanding their strengths and weaknesses. In this study, we conduct experiments on the CartPole and Lunar Lander environments using both A3C and PPO, and compare their performance in terms of convergence speed and stability. Our results indicate that A3C typically achieves faster training times but exhibits greater instability in reward values, whereas PPO demonstrates a more stable training process at the expense of longer execution times. Algorithm selection should therefore take the target environment and the specific application needs into account, balancing training time against stability: A3C is well suited to applications requiring rapid training, while PPO is preferable when training stability is the priority.
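The stability advantage the abstract attributes to PPO comes from its clipped surrogate objective, which bounds how far a single update can move the policy away from the one that collected the data. As a minimal illustrative sketch (not code from the paper; function and variable names are hypothetical), the clipped loss can be computed like this:

```python
def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Clipped surrogate loss in the style of PPO (Schulman et al., 2017).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled transition
    advantages: advantage estimates for the same transitions
    eps:        clipping range; 0.2 is a common default
    Returns the negated mean clipped objective (a loss to minimize).
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped = max(min(r, 1.0 + eps), 1.0 - eps)
        # Take the pessimistic (minimum) of the two objectives, as in PPO.
        total += min(r * adv, clipped * adv)
    return -total / len(ratios)

# A ratio of 1.5 with positive advantage contributes as if it were 1.2,
# capping the incentive to push the policy further in one update.
loss = ppo_clip_loss([1.5, 0.7], [1.0, -1.0], eps=0.2)
```

A3C, by contrast, applies unclipped policy-gradient updates from multiple asynchronous workers, which is what lets it train faster while producing the noisier reward curves the study reports.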
Pages: 146795-146806
Page count: 12