Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Times cited: 0
Authors
Zhu, Qingling [1 ]
Wu, Xiaoqiang [2 ]
Lin, Qiuzhen [2 ]
Chen, Wei-Neng [3 ]
Affiliations
[1] Shenzhen Univ, Nat Engn Lab Big Data Syst Comp Technol, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Funding
National Science Fund for Distinguished Young Scholars; National Natural Science Foundation of China;
Keywords
LEVEL;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The integration of Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) has emerged as a promising approach for tackling several challenges in RL, such as sparse rewards, insufficient exploration, and brittle convergence properties. However, existing methods often employ actor networks as the individuals of the EA, which may constrain their exploratory capabilities, since the entire actor population stops evolving once the critic network in RL falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the first stage, individuals learn actor-critic networks independently, optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration and curbs susceptibility to local optima, while information shared through a common replay buffer and the PSO algorithm substantially reduces the computational cost of training multiple agents. In the second stage, TERL shifts to exploitation: only the best individual undergoes further refinement, while the remaining individuals continue PSO-based optimization, allocating more computational resources to the best individual to yield superior performance. Empirical evaluations across a range of continuous control problems validate the efficacy of the proposed TERL paradigm.
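The abstract outlines the two-stage schedule but gives no implementation details, so the following is only a minimal, self-contained sketch of that schedule, not the authors' method. Everything concrete here is an assumption: the toy fitness function, the parameter-vector stand-ins for actor and critic networks, the simplified rl_update and pso_update rules, and the fixed stage-switch point are all illustrative, and the shared replay buffer and environment interaction are abstracted away entirely.

import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # assumed parameter-vector size standing in for actor/critic weights


class Individual:
    """One population member: its own actor and critic parameters plus PSO state."""

    def __init__(self):
        self.actor = rng.normal(size=DIM)
        self.critic = rng.normal(size=DIM)
        self.velocity = np.zeros(DIM)
        self.best_actor = self.actor.copy()
        self.best_fitness = -np.inf


def fitness(actor):
    """Toy stand-in for an episode return; the real method evaluates the policy in the environment."""
    return -np.sum((actor - 1.0) ** 2)


def rl_update(ind, lr=0.05):
    """Toy stand-in for a gradient-based actor-critic update drawn from a shared replay buffer."""
    grad = -2.0 * (ind.actor - 1.0)                  # pretend critic-guided policy gradient
    ind.actor += lr * grad
    ind.critic += lr * 0.01 * rng.normal(size=DIM)   # placeholder critic update


def pso_update(population, global_best, w=0.7, c1=1.5, c2=1.5):
    """Standard PSO velocity/position update applied to the actor parameters."""
    for ind in population:
        r1, r2 = rng.random(DIM), rng.random(DIM)
        ind.velocity = (w * ind.velocity
                        + c1 * r1 * (ind.best_actor - ind.actor)
                        + c2 * r2 * (global_best - ind.actor))
        ind.actor += ind.velocity


def terl(pop_size=8, iterations=200, switch_ratio=0.5):
    population = [Individual() for _ in range(pop_size)]
    global_best, global_best_fit = population[0].actor.copy(), -np.inf
    for it in range(iterations):
        # Track personal and global bests (information shared across the population).
        for ind in population:
            f = fitness(ind.actor)
            if f > ind.best_fitness:
                ind.best_fitness, ind.best_actor = f, ind.actor.copy()
            if f > global_best_fit:
                global_best_fit, global_best = f, ind.actor.copy()

        if it < switch_ratio * iterations:
            # Stage 1: every individual alternates RL and PSO updates (exploration).
            for ind in population:
                rl_update(ind)
            pso_update(population, global_best)
        else:
            # Stage 2: only the current best individual receives further RL refinement;
            # the remaining individuals continue PSO-based search (exploitation).
            best = max(population, key=lambda i: fitness(i.actor))
            rl_update(best)
            pso_update([i for i in population if i is not best], global_best)
    return global_best, global_best_fit


if __name__ == "__main__":
    params, score = terl()
    print("best synthetic fitness:", round(score, 4))

The structural point the sketch tries to capture is the switch between stages: in stage one every individual receives both RL and PSO updates, while in stage two gradient-based refinement is reserved for the current best individual and the rest of the population keeps searching via PSO.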
Pages: 20892-20900 (9 pages)
Related Papers
50 items in total
  • [41] A Two-Stage Target Search and Tracking Method for UAV Based on Deep Reinforcement Learning
    Liu, Mei
    Wei, Jingbo
    Liu, Kun
    DRONES, 2024, 8 (10)
  • [42] Two-Stage Reinforcement Learning-Based Differential Evolution for Solving Nonlinear Equations
    Liao, Zuowen
    Gong, Wenyin
    Li, Shuijia
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2023, 53 (07): 4279 - 4290
  • [43] Two-Stage Reinforcement Learning Policy Search for Grid-Interactive Building Control
    Zhang, Xiangyu
    Chen, Yue
    Bernstein, Andrey
    Chintala, Rohit
    Graf, Peter
    Jin, Xin
    Biagioni, David
    IEEE TRANSACTIONS ON SMART GRID, 2022, 13 (03) : 1976 - 1987
  • [44] Two-stage reinforcement learning on credit branch genetic network programming for mobile robots
    Sendari, Siti
    Mabu, Shingo
    Hirasawa, Kotaro
    IEEJ Transactions on Electronics, Information and Systems, 2013, 133 (04) : 856 - 863
  • [45] A Two-Stage Cooperative Reinforcement Learning Scheme for Energy-Aware Computational Offloading
    Avgeris, Marios
    Mechennef, Meriem
    Leivadeas, Aris
    Lambadaris, Ioannis
    2023 IEEE 24TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE SWITCHING AND ROUTING, HPSR, 2023,
  • [46] From mimic to counteract: a two-stage reinforcement learning algorithm for Google research football
    Zhao, Junjie
    Lin, Jiangwen
    Zhang, Xinyan
    Li, Yuanbai
    Zhou, Xianzhong
    Sun, Yuxiang
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (13): 7203 - 7219
  • [47] A novel two-stage reinforcement learning framework for sustainable building energy management systems
    Li, Donghe
    Zhao, Yijie
    Xi, Huan
    JOURNAL OF BUILDING ENGINEERING, 2024, 98
  • [49] Control of exploitation-exploration meta-parameter in reinforcement learning
    Ishii, S
    Yoshida, W
    Yoshimoto, J
    NEURAL NETWORKS, 2002, 15 (4-6) : 665 - 687
  • [50] Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
    Liu, Evan Zheran
    Raghunathan, Aditi
    Liang, Percy
    Finn, Chelsea
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139