Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Cited by: 0
Authors
Wang, Guojian [1 ,4 ]
Wu, Faguo [1 ,2 ,3 ,4 ,5 ]
Zhang, Xiao [1 ,3 ,4 ,5 ]
Guo, Ning [1 ,4 ]
Zheng, Zhiming [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Sch Math Sci, Beijing 100191, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China
[3] Zhongguancun Lab, Beijing 100194, Peoples R China
[4] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[5] Beihang Univ, Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Hard-exploration problem; Policy gradient; Offline suboptimal demonstrations;
DOI
10.1016/j.knosys.2023.111334
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Deep reinforcement learning (DRL) faces significant challenges in hard-exploration tasks with sparse or deceptive rewards and large state spaces, which severely limit its practical application. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the agent's policy away from suboptimal solutions by treating previous offline demonstrations as references. Specifically, the approach gradually expands the agent's exploration scope and strives for optimality in a constrained-optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that uses adaptive clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a derivation of the worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid-world mazes and several MuJoCo tasks. The extensive experimental results, supported by quantitative metrics, demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. The code used in the study is available at https://github.com/buaawgj/TACE.
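The abstract describes rewarding the agent for keeping its trajectories away from stored suboptimal demonstrations, with the distance-based bonus clipped. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch under assumed design choices: the function name is hypothetical, the trajectory distance is taken as the Euclidean distance from the current state to the nearest demonstration state, and the clipping is a simple upper bound.

```python
import math

def clipped_trajectory_distance_reward(state, demo_trajectories,
                                       scale=1.0, clip_max=1.0):
    """Hypothetical intrinsic bonus: distance from the current state to the
    nearest state in the stored suboptimal demonstrations, clipped above."""
    # Distance to the closest demonstration state across all trajectories.
    d_min = min(
        math.dist(state, s)
        for traj in demo_trajectories
        for s in traj
    )
    # Clipping bounds the bonus so it cannot overwhelm the extrinsic task reward.
    return min(scale * d_min, clip_max)
```

In this sketch, a state lying on a demonstration receives no bonus, while states far from all demonstrations receive at most `clip_max`, encouraging exploration away from known suboptimal behavior without letting the bonus dominate.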
Pages: 20
Related Papers
50 records
  • [31] Real-time adaptive entry trajectory generation with modular policy and deep reinforcement learning
    Peng, Gaoxiang
    Wang, Bo
    Liu, Lei
    Fan, Huijin
    Cheng, Zhongtao
    AEROSPACE SCIENCE AND TECHNOLOGY, 2023, 142
  • [32] Adaptive Nonlinear Model Predictive Horizon Using Deep Reinforcement Learning for Optimal Trajectory Planning
    Al Younes, Younes
    Barczyk, Martin
    DRONES, 2022, 6 (11)
  • [33] Reentry trajectory optimization based on Deep Reinforcement Learning
    Gao, Jiashi
    Shi, Xinming
    Cheng, Zhongtao
    Xiong, Jizhang
    Liu, Lei
    Wang, Yongji
    Yang, Ye
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019: 2588-2592
  • [34] Sub-trajectory clustering with deep reinforcement learning
    Liang, Anqi
    Yao, Bin
    Wang, Bo
    Liu, Yinpei
    Chen, Zhida
    Xie, Jiong
    Li, Feifei
    VLDB JOURNAL, 2024, 33 (03): 685-702
  • [36] Deep Reinforcement Learning for Trajectory Generation and Optimisation of UAVs
    Akhtar, Mishma
    Maqsood, Adnan
    Verbeke, Mathias
    2023 10TH INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN AIR AND SPACE TECHNOLOGIES, RAST, 2023,
  • [37] Robust trajectory-constrained frequency control for microgrids considering model linearization error
    Zhang, Yichen
    Chen, Chen
    Hong, Tianqi
    Cui, Bai
    Xu, Zhe
    Chen, Bo
    Qiu, Feng
    APPLIED ENERGY, 2023, 333
  • [38] When to Replan? An Adaptive Replanning Strategy for Autonomous Navigation using Deep Reinforcement Learning
    Honda, Kohei
    Yonetani, Ryo
    Nishimura, Mai
    Kozuno, Tadashi
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024: 6650-6656
  • [39] Novel adaptive stability enhancement strategy for power systems based on deep reinforcement learning
    Zhao, Yincheng
    Hu, Weihao
    Zhang, Guozhou
    Huang, Qi
    Chen, Zhe
    Blaabjerg, Frede
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2023, 152
  • [40] Digital twin-enabled adaptive scheduling strategy based on deep reinforcement learning
    Gan, Xuemei
    Zuo, Ying
    Zhang, Ansi
    Li, Shaobo
    Tao, Fei
    Science China (Technological Sciences), 2023, (07): 1937-1951