Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments

Cited: 3
Authors
del Rio, Alberto [1 ]
Jimenez, David [2 ]
Serrano, Javier [3 ]
Affiliations
[1] Univ Politecn Madrid, Escuela Tecn Super Ingn Telecomunicac ETSIT, Signals Syst & Radiocommun Dept, Madrid 28040, Spain
[2] Univ Politecn Madrid, Escuela Tecn Super Ingn Telecomunicac ETSIT, Phys Elect Elect Engn & Appl Phys Dept, Madrid 28040, Spain
[3] Univ Politecn Madrid, Escuela Tecn Super Ingn Sistemas Informat ETSISI, Informat Syst Dept, Madrid 28031, Spain
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Training; Stability analysis; Heuristic algorithms; Surveys; Moon; Space vehicles; Convergence; Prediction algorithms; Software algorithms; Reliability; Reinforcement learning; A3C; CartPole; comparison; environment complexity; Lunar Lander; performance analysis; PPO; sample efficiency; stability
DOI
10.1109/ACCESS.2024.3472473
Chinese Library Classification (CLC) number
TP [automation technology, computer technology]
Discipline code
0812
Abstract
This research article presents a comparison between two mainstream Deep Reinforcement Learning (DRL) algorithms, Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization (PPO), in the context of two diverse environments: CartPole and Lunar Lander. DRL algorithms are widely known for their effectiveness in training agents to navigate complex environments and learn optimal policies. Nevertheless, a methodical assessment of their effectiveness across different settings is crucial for understanding their strengths and weaknesses. In this study, we conduct experiments on the CartPole and Lunar Lander environments using both A3C and PPO, and compare their performance in terms of convergence speed and stability. Our results indicate that A3C typically achieves faster training times but exhibits greater instability in reward values, whereas PPO demonstrates a more stable training process at the expense of longer execution times. Algorithm selection should therefore take the target environment and the specific application needs into account, balancing training time against stability: A3C is well suited to applications requiring rapid training, while PPO is preferable when training stability is the priority.
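The stability advantage the abstract attributes to PPO comes from its clipped surrogate objective, which bounds how far a single update can move the policy away from the one that collected the data. As a minimal illustrative sketch (not code from the paper; function and variable names are hypothetical), the clipped loss can be computed like this:

```python
def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Clipped surrogate loss in the style of PPO (Schulman et al., 2017).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled transition
    advantages: advantage estimates for the same transitions
    eps:        clipping range; 0.2 is a common default
    Returns the negated mean clipped objective (a loss to minimize).
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped = max(min(r, 1.0 + eps), 1.0 - eps)
        # Take the pessimistic (minimum) of the two objectives, as in PPO.
        total += min(r * adv, clipped * adv)
    return -total / len(ratios)

# A ratio of 1.5 with positive advantage contributes as if it were 1.2,
# capping the incentive to push the policy further in one update.
loss = ppo_clip_loss([1.5, 0.7], [1.0, -1.0], eps=0.2)
```

A3C, by contrast, applies unclipped policy-gradient updates from multiple asynchronous workers, which is what lets it train faster while producing the noisier reward curves the study reports.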
Pages: 146795-146806
Page count: 12