Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Cited: 0
|
Authors
Wang, Guojian [1 ,4 ]
Wu, Faguo [1 ,2 ,3 ,4 ,5 ]
Zhang, Xiao [1 ,3 ,4 ,5 ]
Guo, Ning [1 ,4 ]
Zheng, Zhiming [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Sch Math Sci, Beijing 100191, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China
[3] Zhongguancun Lab, Beijing 100194, Peoples R China
[4] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[5] Beihang Univ, Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Hard-exploration problem; Policy gradient; Offline suboptimal demonstrations;
DOI
10.1016/j.knosys.2023.111334
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (DRL) faces significant challenges in hard-exploration tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the agent's policy away from suboptimal solutions by treating previous offline demonstrations as references. Specifically, this approach gradually expands the exploration scope of the agent and strives for optimality in a constrained optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptive clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a derivation of worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid-world mazes and several MuJoCo tasks. The extensive experimental results demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. The reported quantitative metrics further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.
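The abstract's core mechanism, an adaptive clipped trajectory-distance reward that pushes the policy away from offline suboptimal demonstrations, can be illustrated with a minimal sketch. This is not the paper's implementation (see the linked repository for that); the function name, the nearest-point distance measure, and the fixed clipping threshold are all illustrative assumptions:

```python
import numpy as np

def clipped_trajectory_distance_bonus(state, demos, clip_max=1.0):
    """Hypothetical sketch of a clipped trajectory-distance bonus.

    Rewards the agent for visiting states far from previously collected
    suboptimal demonstration trajectories, discouraging the policy from
    collapsing back onto those known-suboptimal solutions.

    state    : (d,) current state vector
    demos    : list of (T_i, d) arrays, offline demonstration trajectories
    clip_max : upper bound on the bonus, keeping the intrinsic reward
               on a stable scale relative to the extrinsic reward
    """
    # Distance from a state to a trajectory = distance to its nearest point.
    dists = [np.linalg.norm(traj - state, axis=1).min() for traj in demos]
    # The closest demonstration determines the exploration signal.
    d_min = min(dists)
    # Clipping prevents very distant states from dominating the reward.
    return min(d_min, clip_max)
```

In a training loop, such a bonus would typically be added to the environment reward at each step, with the clipping threshold adapted over training so the exploration scope expands gradually, as the abstract describes.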
Pages: 20
Related papers
50 records total
  • [41] Digital twin-enabled adaptive scheduling strategy based on deep reinforcement learning
    Gan, XueMei
    Zuo, Ying
    Zhang, AnSi
    Li, ShaoBo
    Tao, Fei
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2023, 66 (07) : 1937 - 1951
  • [44] Constrained attractor selection using deep reinforcement learning
    Wang, Xue-She
    Turner, James D.
    Mann, Brian P.
    JOURNAL OF VIBRATION AND CONTROL, 2021, 27 (5-6) : 502 - 514
  • [45] Constrained deep reinforcement learning for maritime platform defense
    Markowitz, Jared
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS VI, 2024, 13051
  • [46] Constrained Deep Reinforcement Learning for Fronthaul Compression Optimization
    Gronland, Axel
    Russo, Alessio
    Jedra, Yassir
    Klaiqi, Bleron
    Gelabert, Xavier
    2024 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING FOR COMMUNICATION AND NETWORKING, ICMLCN 2024, 2024, : 498 - 504
  • [47] RA-TSC: Learning Adaptive Traffic Signal Control Strategy via Deep Reinforcement Learning
    Du, Yu
    ShangGuan, Wei
    Rong, Dingchao
    Chai, Linguo
    2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2019, : 3275 - 3280
  • [48] #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
    Tang, Haoran
    Houthooft, Rein
    Foote, Davis
    Stooke, Adam
    Chen, Xi
    Duan, Yan
    Schulman, John
    De Turck, Filip
    Abbeel, Pieter
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [49] Deep reinforcement learning for adaptive mesh refinement
    Foucart, Corbin
    Charous, Aaron
    Lermusiaux, Pierre F. J.
    JOURNAL OF COMPUTATIONAL PHYSICS, 2023, 491
  • [50] Adaptive Slope Locomotion with Deep Reinforcement Learning
    Jones, William
    Blum, Tamir
    Yoshida, Kazuya
    2020 IEEE/SICE INTERNATIONAL SYMPOSIUM ON SYSTEM INTEGRATION (SII), 2020, : 546 - 550