Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Cited by: 0
Authors
Wang, Guojian [1, 4]
Wu, Faguo [1, 2, 3, 4, 5]
Zhang, Xiao [1, 3, 4, 5]
Guo, Ning [1, 4]
Zheng, Zhiming [2, 3, 4, 5]
Affiliations
[1] Beihang Univ, Sch Math Sci, Beijing 100191, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China
[3] Zhongguancun Lab, Beijing 100194, Peoples R China
[4] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[5] Beihang Univ, Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep reinforcement learning; Hard-exploration problem; Policy gradient; Offline suboptimal demonstrations;
DOI
10.1016/j.knosys.2023.111334
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning (DRL) faces significant challenges in addressing hard-exploration tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the agent's policy away from suboptimal solutions by regarding previous offline demonstrations as references. Specifically, this approach gradually expands the exploration scope of the agent and strives for optimality in a constrained optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptive clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a deduction of the worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid world mazes and several MuJoCo tasks. The extensive experimental results demonstrated the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. Notably, the specific metrics and quantifiable results further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.
Pages: 20
Related papers
50 in total
  • [21] Adaptive Bitrate Algorithms via Deep Reinforcement Learning With Digital Twins Assisted Trajectory
    Ye, Jin
    Qin, Shaowen
    Xiao, Qingyu
    Jiang, Wenchao
    Tang, Xin
    Li, Xiaohuan
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2024, 11 (04): 3522-3535
  • [22] Deep Reinforcement Learning for Adaptive Learning Systems
    Li, Xiao
    Xu, Hanchen
    Zhang, Jinming
    Chang, Hua-hua
    JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS, 2023, 48 (02): 220-243
  • [23] Adaptive Exploration Strategy With Multi-Attribute Decision-Making for Reinforcement Learning
    Hu, Chunyang
    Xu, Meng
    IEEE ACCESS, 2020, 8: 32353-32364
  • [24] An adaptive testing item selection strategy via a deep reinforcement learning approach
    Wang, Pujue
    Liu, Hongyun
    Xu, Mingqi
    BEHAVIOR RESEARCH METHODS, 2024, 56 (08): 8695-8714
  • [25] A stochastic exploration strategy for satisficing reinforcement learning
    Katayama, S
    Kobayashi, S
    INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998: 296-303
  • [26] Deep Adaptive Control: Deep Reinforcement Learning-Based Adaptive Vehicle Trajectory Control Algorithms for Different Risk Levels
    He, Yixu
    Liu, Yang
    Yang, Lan
    Qu, Xiaobo
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01): 1654-1666
  • [27] An improved frontier-based robot exploration strategy combined with deep reinforcement learning
    Wang, Rui
    Zhang, Jie
    Lyu, Ming
    Yan, Cheng
    Chen, Yaowei
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2024, 181
  • [28] Deep Reinforcement Learning for Trajectory Path Planning and Distributed Inference in Resource-Constrained UAV Swarms
    Dhuheir, Marwan
    Baccour, Emna
    Erbad, Aiman
    Al-Obaidi, Sinan Sabeeh
    Hamdi, Mounir
    IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (09): 8185-8201
  • [29] Trajectory Optimization for Unknown Constrained Systems using Reinforcement Learning
    Ota, Kei
    Jha, Devesh K.
    Oiki, Tomoaki
    Miura, Mamoru
    Nammoto, Takashi
    Nikovski, Daniel
    Mariyama, Toshisada
    2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019: 3487-3494
  • [30] Autonomous exploration through deep reinforcement learning
    Yan, Xiangda
    Huang, Jie
    He, Keyan
    Hong, Huajie
    Xu, Dasheng
    INDUSTRIAL ROBOT-THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH AND APPLICATION, 2023, 50 (05): 793-803