Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Cited by: 0
Authors
Zhu, Qingling [1 ]
Wu, Xiaoqiang [2 ]
Lin, Qiuzhen [2 ]
Chen, Wei-Neng [3 ]
Affiliations
[1] Shenzhen Univ, Nat Engn Lab Big Data Syst Comp Technol, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Funding
National Science Fund for Distinguished Young Scholars; National Natural Science Foundation of China;
Keywords
LEVEL;
DOI
Not available
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
The integration of Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) has emerged as a promising approach for tackling several challenges in RL, such as sparse rewards, insufficient exploration, and brittle convergence. However, existing methods often use only actor networks as the individuals of the EA, which may constrain exploration: the entire actor population stops evolving once the critic network in RL falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the initial stage, individuals independently learn actor-critic networks, which are optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration and curbs susceptibility to local optima. Sharing information through a common replay buffer and the PSO algorithm substantially reduces the computational cost of training multiple agents. In the subsequent stage, TERL shifts to a refined exploitation phase: only the best individual undergoes further RL refinement, while the remaining individuals continue PSO-based optimization. This allocates more computational resources to the best individual, yielding superior performance. Empirical assessments, conducted across a range of continuous control problems, validate the efficacy of the proposed TERL paradigm.
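The abstract describes the control flow of TERL rather than its implementation. The sketch below is a minimal, illustrative Python rendering of that two-stage schedule only; the `Individual` class, `evaluate`, `rl_update`, and `pso_update` are toy stand-ins assumed here for the actual actor-critic training, shared replay buffer, and continuous-control environments, and are not the authors' code.

```python
# Hedged sketch of the two-stage TERL loop from the abstract.
# A toy quadratic fitness replaces episodic return; rl_update and
# pso_update are illustrative stand-ins, not the paper's algorithms.
import random

DIM = 8            # parameter dimension of a toy individual
POP_SIZE = 10      # population size
STAGE1_ITERS = 50  # exploration-stage iterations
STAGE2_ITERS = 50  # exploitation-stage iterations


def evaluate(params):
    """Toy fitness standing in for the return of an actor-critic agent."""
    return -sum(x * x for x in params)


class Individual:
    def __init__(self):
        self.params = [random.uniform(-1, 1) for _ in range(DIM)]
        self.velocity = [0.0] * DIM
        self.best_params = list(self.params)
        self.best_fitness = evaluate(self.params)

    def fitness(self):
        return evaluate(self.params)


def rl_update(ind, step=0.05):
    """Stand-in for an RL (actor-critic) update: keep a perturbation if it improves fitness."""
    candidate = [x + random.gauss(0, step) for x in ind.params]
    if evaluate(candidate) > ind.fitness():
        ind.params = candidate
    f = ind.fitness()
    if f > ind.best_fitness:
        ind.best_fitness, ind.best_params = f, list(ind.params)


def pso_update(ind, global_best, w=0.7, c1=1.5, c2=1.5):
    """Standard PSO velocity/position update toward personal and global bests."""
    for i in range(DIM):
        r1, r2 = random.random(), random.random()
        ind.velocity[i] = (w * ind.velocity[i]
                           + c1 * r1 * (ind.best_params[i] - ind.params[i])
                           + c2 * r2 * (global_best[i] - ind.params[i]))
        ind.params[i] += ind.velocity[i]
    f = ind.fitness()
    if f > ind.best_fitness:
        ind.best_fitness, ind.best_params = f, list(ind.params)


population = [Individual() for _ in range(POP_SIZE)]

# Stage 1 (exploration): every individual alternates RL and PSO updates.
for _ in range(STAGE1_ITERS):
    global_best = max(population, key=lambda p: p.best_fitness).best_params
    for ind in population:
        rl_update(ind)
        pso_update(ind, global_best)

# Stage 2 (exploitation): only the best individual keeps the RL refinement;
# the remaining individuals continue with PSO only.
for _ in range(STAGE2_ITERS):
    best = max(population, key=lambda p: p.best_fitness)
    global_best = best.best_params
    rl_update(best)
    for ind in population:
        if ind is not best:
            pso_update(ind, global_best)

print("best fitness:", max(p.best_fitness for p in population))
```

In the paper, the per-individual RL step would train actor-critic networks on environment transitions drawn from a shared replay buffer, which is what keeps the cost of maintaining the population manageable; the toy fitness above merely makes the schedule runnable end to end.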
Pages: 20892 - 20900
Page count: 9
Related Papers
50 records in total
  • [1] Two-stage reinforcement-learning-based cognitive radio with exploration control
    Jiang, T.
    Grace, D.
    Liu, Y.
    IET COMMUNICATIONS, 2011, 5 (05) : 644 - 651
  • [2] A Two-Stage Multi-Objective Evolutionary Reinforcement Learning Framework for Continuous Robot Control
    Hai Long Tran
    Long Doan
    Ngoc Hoang Luong
    Huynh Thi Thanh Binh
    PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, GECCO 2023, 2023, : 577 - 585
  • [3] A reinforcement learning driven two-stage evolutionary optimisation for hybrid seru system scheduling with worker transfer
    Wu, Yuting
    Wang, Ling
    Chen, Jing-fang
    Zheng, Jie
    Pan, Zixiao
    INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2024, 62 (11) : 3952 - 3971
  • [4] Deterministic Sequencing of Exploration and Exploitation for Reinforcement Learning
    Gupta, Piyush
    Srivastava, Vaibhav
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2313 - 2318
  • [5] Balancing exploration and exploitation in episodic reinforcement learning
    Chen, Qihang
    Zhang, Qiwei
    Liu, Yunlong
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [6] Balancing Exploration and Exploitation Ratio in Reinforcement Learning
    Ozcan, Ozkan
    de Moraes, Claudio Coreixas
    Alt, Jonathan
    MILITARY MODELING & SIMULATION SYMPOSIUM 2011 (MMS 2011) - 2011 SPRING SIMULATION MULTICONFERENCE - BK 7 OF 8, 2011, : 126 - 131
  • [7] Two-stage reinforcement learning task predicts psychological traits
    Trevino, Mario
    Castiello, Santiago
    De la Torre-Valdovinos, Braniff
    Carrasco, Paulina Osuna
    Leon, Ricardo Medina-Coss
    Arias-Carrion, Oscar
    PSYCH JOURNAL, 2023, 12 (03) : 355 - 367
  • [8] Two-Stage Evolutionary Neural Architecture Search for Transfer Learning
    Wen, Yu-Wei
    Peng, Sheng-Hsuan
    Ting, Chuan-Kang
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2021, 25 (05) : 928 - 940
  • [9] Balance of exploration and exploitation: Non-cooperative game-driven evolutionary reinforcement learning
    Yu, Jin
    Zhang, Ya
    Sun, Changyin
    SWARM AND EVOLUTIONARY COMPUTATION, 2024, 91
  • [10] Exploration and exploitation balance management in fuzzy reinforcement learning
    Derhami, Vali
    Majd, Vahid Johari
    Ahmadabadi, Majid Nili
    FUZZY SETS AND SYSTEMS, 2010, 161 (04) : 578 - 595