Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Times cited: 0
Authors
Zhu, Qingling [1 ]
Wu, Xiaoqiang [2 ]
Lin, Qiuzhen [2 ]
Chen, Wei-Neng [3 ]
Affiliations
[1] Shenzhen Univ, Nat Engn Lab Big Data Syst Comp Technol, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Funding
National Science Fund for Distinguished Young Scholars; National Natural Science Foundation of China;
Keywords
LEVEL;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The integration of Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) has emerged as a promising approach for tackling several challenges in RL, such as sparse rewards, insufficient exploration, and brittle convergence properties. However, existing methods often employ actor networks as the individuals of the EA, which may constrain their exploratory capabilities, since the entire actor population stops evolving once the critic network in RL falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the first stage, individuals learn actor-critic networks independently, optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration and curbs susceptibility to local optima, while information shared through a common replay buffer and the PSO algorithm substantially reduces the computational cost of training multiple agents. In the second stage, TERL shifts to exploitation: only the best individual undergoes further refinement, while the remaining individuals continue PSO-based optimization, allocating more computational resources to the best individual to yield superior performance. Empirical evaluations across a range of continuous control problems validate the efficacy of the proposed TERL paradigm.
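The abstract outlines the two-stage schedule but gives no implementation details, so the following is only a minimal, self-contained sketch of that schedule, not the authors' method. Everything concrete here is an assumption: the toy fitness function, the parameter-vector stand-ins for actor and critic networks, the simplified rl_update and pso_update rules, and the fixed stage-switch point are all illustrative, and the shared replay buffer and environment interaction are abstracted away entirely.

import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # assumed parameter-vector size standing in for actor/critic weights


class Individual:
    """One population member: its own actor and critic parameters plus PSO state."""

    def __init__(self):
        self.actor = rng.normal(size=DIM)
        self.critic = rng.normal(size=DIM)
        self.velocity = np.zeros(DIM)
        self.best_actor = self.actor.copy()
        self.best_fitness = -np.inf


def fitness(actor):
    """Toy stand-in for an episode return; the real method evaluates the policy in the environment."""
    return -np.sum((actor - 1.0) ** 2)


def rl_update(ind, lr=0.05):
    """Toy stand-in for a gradient-based actor-critic update drawn from a shared replay buffer."""
    grad = -2.0 * (ind.actor - 1.0)                  # pretend critic-guided policy gradient
    ind.actor += lr * grad
    ind.critic += lr * 0.01 * rng.normal(size=DIM)   # placeholder critic update


def pso_update(population, global_best, w=0.7, c1=1.5, c2=1.5):
    """Standard PSO velocity/position update applied to the actor parameters."""
    for ind in population:
        r1, r2 = rng.random(DIM), rng.random(DIM)
        ind.velocity = (w * ind.velocity
                        + c1 * r1 * (ind.best_actor - ind.actor)
                        + c2 * r2 * (global_best - ind.actor))
        ind.actor += ind.velocity


def terl(pop_size=8, iterations=200, switch_ratio=0.5):
    population = [Individual() for _ in range(pop_size)]
    global_best, global_best_fit = population[0].actor.copy(), -np.inf
    for it in range(iterations):
        # Track personal and global bests (information shared across the population).
        for ind in population:
            f = fitness(ind.actor)
            if f > ind.best_fitness:
                ind.best_fitness, ind.best_actor = f, ind.actor.copy()
            if f > global_best_fit:
                global_best_fit, global_best = f, ind.actor.copy()

        if it < switch_ratio * iterations:
            # Stage 1: every individual alternates RL and PSO updates (exploration).
            for ind in population:
                rl_update(ind)
            pso_update(population, global_best)
        else:
            # Stage 2: only the current best individual receives further RL refinement;
            # the remaining individuals continue PSO-based search (exploitation).
            best = max(population, key=lambda i: fitness(i.actor))
            rl_update(best)
            pso_update([i for i in population if i is not best], global_best)
    return global_best, global_best_fit


if __name__ == "__main__":
    params, score = terl()
    print("best synthetic fitness:", round(score, 4))

The structural point the sketch tries to capture is the switch between stages: in stage one every individual receives both RL and PSO updates, while in stage two gradient-based refinement is reserved for the current best individual and the rest of the population keeps searching via PSO.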
Pages: 20892-20900 (9 pages)
Related Papers
50 items in total
  • [41] A Two-Stage Target Search and Tracking Method for UAV Based on Deep Reinforcement Learning
    Liu, Mei
    Wei, Jingbo
    Liu, Kun
    DRONES, 2024, 8 (10)
  • [42] Two-Stage Reinforcement Learning-Based Differential Evolution for Solving Nonlinear Equations
    Liao, Zuowen
    Gong, Wenyin
    Li, Shuijia
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2023, 53 (07): 4279 - 4290
  • [43] Two-Stage Reinforcement Learning Policy Search for Grid-Interactive Building Control
    Zhang, Xiangyu
    Chen, Yue
    Bernstein, Andrey
    Chintala, Rohit
    Graf, Peter
    Jin, Xin
    Biagioni, David
    IEEE TRANSACTIONS ON SMART GRID, 2022, 13 (03) : 1976 - 1987
  • [44] Two-stage reinforcement learning on credit branch genetic network programming for mobile robots
    Sendari, Siti
    Mabu, Shingo
    Hirasawa, Kotaro
    IEEJ Transactions on Electronics, Information and Systems, 2013, 133 (04) : 856 - 863
  • [45] A Two-Stage Cooperative Reinforcement Learning Scheme for Energy-Aware Computational Offloading
    Avgeris, Marios
    Mechennef, Meriem
    Leivadeas, Aris
    Lambadaris, Ioannis
    2023 IEEE 24TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE SWITCHING AND ROUTING, HPSR, 2023,
  • [46] From mimic to counteract: a two-stage reinforcement learning algorithm for Google research football
    Zhao, Junjie
    Lin, Jiangwen
    Zhang, Xinyan
    Li, Yuanbai
    Zhou, Xianzhong
    Sun, Yuxiang
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (13): 7203 - 7219
  • [47] A novel two-stage reinforcement learning framework for sustainable building energy management systems
    Li, Donghe
    Zhao, Yijie
    Xi, Huan
    JOURNAL OF BUILDING ENGINEERING, 2024, 98
  • [49] Control of exploitation-exploration meta-parameter in reinforcement learning
    Ishii, S
    Yoshida, W
    Yoshimoto, J
    NEURAL NETWORKS, 2002, 15 (4-6) : 665 - 687
  • [50] Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
    Liu, Evan Zheran
    Raghunathan, Aditi
    Liang, Percy
    Finn, Chelsea
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139