Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Cited by: 0
Authors
Zhu, Qingling [1 ]
Wu, Xiaoqiang [2 ]
Lin, Qiuzhen [2 ]
Chen, Wei-Neng [3 ]
Affiliations
[1] Shenzhen University, National Engineering Laboratory for Big Data System Computing Technology, Shenzhen, China
[2] Shenzhen University, College of Computer Science and Software Engineering, Shenzhen, China
[3] South China University of Technology, School of Computer Science and Engineering, Guangzhou, China
Funding
National Science Fund for Distinguished Young Scholars; National Natural Science Foundation of China
Keywords
LEVEL
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The integration of Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) has emerged as a promising approach for tackling several challenges in RL, such as sparse rewards, insufficient exploration, and brittle convergence properties. However, existing methods often employ actor networks as the individuals of the EA, which can constrain their exploratory capability: the entire actor population stops evolving once the critic network in RL falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the initial stage, individuals independently learn actor-critic networks, which are optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration and curbs susceptibility to local optima. Sharing information through a common replay buffer and the PSO algorithm substantially reduces the computational load of training multiple agents. In the subsequent stage, TERL shifts to a refined exploitation phase: only the best individual undergoes further RL refinement, while the remaining individuals continue PSO-based optimization. This allocates more computational resources to the best individual, yielding superior final performance. Empirical assessments, conducted across a range of continuous control problems, validate the efficacy of the proposed TERL paradigm.
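The abstract describes the algorithm only at a high level; the following is a minimal, runnable Python sketch of the two-stage control flow, not the authors' implementation. Everything concrete here is an assumption made for illustration: the toy fitness() landscape stands in for episodic return, grad_step() stands in for the RL actor-critic update driven by the shared replay buffer, and the population size and PSO coefficients (w, c1, c2) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not TERL's implementation): each "individual"
# is a flat parameter vector covering both actor and critic. fitness()
# replaces episodic return; grad_step() replaces one RL update from the
# shared replay buffer.
DIM, POP, STAGE1_ITERS, STAGE2_ITERS = 8, 6, 200, 200

def fitness(theta):
    # Multimodal toy landscape standing in for episodic return (higher is better).
    return -np.sum(theta**2) + np.sum(np.cos(3 * theta))

def grad_step(theta, lr=0.01):
    # Gradient ascent on the toy fitness; a stand-in for the RL update.
    g = -2 * theta - 3 * np.sin(3 * theta)  # analytic gradient of fitness()
    return theta + lr * g

def pso_step(pop, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    # Standard PSO velocity/position update toward personal and global bests.
    r1, r2 = rng.random(pop.shape), rng.random(pop.shape)
    vel = w * vel + c1 * r1 * (pbest - pop) + c2 * r2 * (gbest - pop)
    return pop + vel, vel

pop = rng.normal(size=(POP, DIM))
vel = np.zeros_like(pop)
pbest = pop.copy()
pbest_fit = np.array([fitness(p) for p in pop])

def record_bests(pop, pbest, pbest_fit):
    # Update each individual's personal best in place.
    fits = np.array([fitness(p) for p in pop])
    improved = fits > pbest_fit
    pbest[improved], pbest_fit[improved] = pop[improved], fits[improved]

# Stage 1: every individual alternates an RL-style step with a PSO step.
for _ in range(STAGE1_ITERS):
    pop = np.array([grad_step(p) for p in pop])   # RL update (all individuals)
    gbest = pbest[np.argmax(pbest_fit)]
    pop, vel = pso_step(pop, vel, pbest, gbest)   # PSO update (all individuals)
    record_bests(pop, pbest, pbest_fit)

# Stage 2: only the best individual gets further RL refinement, while
# the rest keep exploring around it via PSO.
best = int(np.argmax(pbest_fit))
for _ in range(STAGE2_ITERS):
    pop[best] = grad_step(pop[best])
    others = np.arange(POP) != best
    pop[others], vel[others] = pso_step(pop[others], vel[others],
                                        pbest[others], pop[best])
    record_bests(pop, pbest, pbest_fit)

print("best fitness found:", pbest_fit.max())
```

The point the sketch preserves is the shift in resource allocation between stages: stage 1 spends gradient updates on every individual to maximize exploration, while stage 2 concentrates them on the incumbent best and lets PSO keep the remaining individuals searching around it.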
Pages: 20892-20900
Number of pages: 9
Related Papers
50 records in total
  • [31] A two-stage multiobjective evolutionary ensemble learning for silicon prediction in blast furnace
    Li, Qiang
    Zhang, Jingchuan
    Wang, Wenhao
    Wang, Xianpeng
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (01) : 1639 - 1660
  • [32] Two-Stage Metric Learning
    Wang, Jun
    Sun, Ke
    Sha, Fei
    Marchand-Maillet, Stephane
    Kalousis, Alexandros
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 370 - 378
  • [33] Enhancing Retrieval-Augmented LMs with a Two-Stage Consistency Learning Compressor
    Xu, Chuankai
    Zhao, Dongming
    Wang, Bo
    Xing, Hanwen
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 511 - 522
  • [34] Enhancing Portfolio Optimization: A Two-Stage Approach with Deep Learning and Portfolio Optimization
    Huang, Shiguo
    Cao, Linyu
    Sun, Ruili
    Ma, Tiefeng
    Liu, Shuangzhe
    MATHEMATICS, 2024, 12 (21)
  • [35] A two-stage autonomous evolutionary music composer
    Khalifa, Y
    Foster, R
    APPLICATIONS OF EVOLUTIONARY COMPUTING, PROCEEDINGS, 2006, 3907 : 717 - 721
  • [37] AK-TSAGL: A two-stage hybrid algorithm combining global exploration and local exploitation based on active learning for structural reliability analysis
    Li, Bingyi
    Jia, Xiang
    Long, Jiahui
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2024, 250
  • [38] Two-stage deep reinforcement learning method for agile optical satellite scheduling problem
    Liu, Zheng
    Xiong, Wei
    Jia, Zhuoya
    Han, Chi
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (01)
  • [39] Two-Stage Hybrid Network Clustering Using Multi-Agent Reinforcement Learning
    Kim, Joohyun
    Ryu, Dongkwan
    Kim, Juyeon
    Kim, Jae-Hoon
    ELECTRONICS, 2021, 10 (03) : 1 - 16
  • [40] Deep Curriculum Reinforcement Learning for Adaptive 360 Video Streaming With Two-Stage Training
    Xie, Yuhong
    Zhang, Yuan
    Lin, Tao
    IEEE TRANSACTIONS ON BROADCASTING, 2024, 70 (02) : 441 - 452