Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Cited by: 0
Authors
Zhu, Qingling [1]
Wu, Xiaoqiang [2]
Lin, Qiuzhen [2]
Chen, Wei-Neng [3]
Affiliations
[1] Shenzhen Univ, Nat Engn Lab Big Data Syst Comp Technol, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Funding
National Science Fund for Distinguished Young Scholars; National Natural Science Foundation of China;
Keywords
LEVEL;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Integrating Evolutionary Algorithms (EAs) with Reinforcement Learning (RL) has emerged as a promising approach for tackling several challenges in RL, such as sparse rewards, insufficient exploration, and brittle convergence. However, existing methods often use actor networks as the EA individuals, which may limit exploration: the entire actor population stops evolving when the RL critic network falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the first stage, each individual independently learns its own actor-critic networks, which are optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration and reduces susceptibility to local optima. Sharing information through a common replay buffer and the PSO algorithm substantially reduces the computational load of training multiple agents. In the second stage, TERL shifts to a refined exploitation phase: only the best individual undergoes further refinement, while the remaining individuals continue PSO-based optimization. This allocates more computational resources to the best individual to yield superior performance. Empirical evaluations across a range of continuous control problems validate the efficacy of the proposed TERL framework.
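The two-stage schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `fitness` function stands in for an episode return, `rl_step` is a placeholder gradient step rather than a real actor-critic update, each individual is a flat parameter vector instead of a pair of networks, the shared replay buffer is not modeled, and all names and hyperparameters (`run_terl_sketch`, `stage1_iters`, `w`, `c1`, `c2`) are assumptions chosen for illustration.

```python
import numpy as np

def fitness(theta):
    # Toy stand-in for an episode return (assumption): maximum 0 at theta = 1.
    return -float(np.sum((theta - 1.0) ** 2))

def rl_step(theta, lr=0.05):
    # Placeholder for a gradient-based RL update: one ascent step on the toy fitness.
    return theta + lr * (-2.0 * (theta - 1.0))

def pso_step(pos, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    # Standard PSO velocity and position update over the whole population.
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel

def update_pbest(pos, pbest, pbest_fit):
    # Record each individual's best position seen so far (in place).
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved] = pos[improved]
    pbest_fit[improved] = fit[improved]

def run_terl_sketch(n_individuals=5, dim=8, iters=60, stage1_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.normal(size=(n_individuals, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    initial_best = float(pbest_fit.max())

    for t in range(iters):
        best = int(np.argmax(pbest_fit))
        move = np.ones(n_individuals, dtype=bool)
        if t < stage1_iters:
            # Stage 1: every individual receives an RL update.
            pos = np.array([rl_step(p) for p in pos])
        else:
            # Stage 2: only the best individual receives RL updates,
            # and it is excluded from the PSO move below.
            pos[best] = rl_step(pos[best])
            move[best] = False
        update_pbest(pos, pbest, pbest_fit)

        # PSO update, alternating with the RL update above.
        gbest = pbest[int(np.argmax(pbest_fit))]
        new_pos, new_vel = pso_step(pos, vel, pbest, gbest, rng)
        pos = np.where(move[:, None], new_pos, pos)
        vel = np.where(move[:, None], new_vel, vel)
        update_pbest(pos, pbest, pbest_fit)

    return initial_best, float(pbest_fit.max()), pbest[int(np.argmax(pbest_fit))]
```

Calling `run_terl_sketch()` returns the initial best fitness, the final best fitness, and the best parameter vector found. Because personal bests are only overwritten on improvement, the final best fitness is never lower than the initial one; how the real method evaluates individuals and switches stages is not specified in the abstract and is assumed here.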
Pages: 20892-20900
Number of pages: 9