Probabilistic Counterexample Guidance for Safer Reinforcement Learning

Cited by: 1
Authors: Ji, Xiaotong [1]; Filieri, Antonio [1]
Affiliation: [1] Imperial Coll London, Dept Comp, London SW7 2AZ, England
Keywords: Safe reinforcement learning; Probabilistic model checking; Counterexample guidance
DOI: 10.1007/978-3-031-43835-6_22
Chinese Library Classification: TP301 [Theory and Methods]
Discipline Code: 081202
Abstract:
Safe exploration aims to address the limitations of Reinforcement Learning (RL) in safety-critical scenarios, where failures during trial-and-error learning may incur high costs. Several methods exist to incorporate external knowledge or to use proximal sensor data to limit the exploration of unsafe states. However, reducing exploration risks in unknown environments, where an agent must discover safety threats during exploration itself, remains challenging. In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement. Our method abstracts both continuous and discrete state-space systems into compact abstract models representing the safety-relevant knowledge acquired by the agent during exploration. We then exploit probabilistic counterexample generation to construct minimal simulation submodels eliciting safety requirement violations, on which the agent can efficiently train offline to refine its policy towards minimising the risk of safety violations during subsequent online exploration. In preliminary experiments, our method reduces safety violations during online exploration by an average of 40.3% compared with standard QL and DQN algorithms and by 29.1% compared with previous related work, while achieving cumulative rewards comparable to unrestricted exploration and alternative approaches.
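The loop the abstract describes (explore online, abstract the observed dynamics into a probabilistic model, extract high-probability violating paths, refine the policy offline on exactly those paths) can be illustrated with a self-contained sketch. Everything below is our own toy construction, not the authors' implementation: the corridor environment, the tabular Q-learner, and the best-first path search (a crude stand-in for the probabilistic model checking the paper uses to generate minimal counterexample submodels) are assumptions made for illustration only.

"""A minimal, hypothetical sketch of counterexample-guided safe Q-learning.
The corridor MDP, hyperparameters, and the best-first path search are
illustrative assumptions; the paper itself abstracts the explored dynamics
and uses probabilistic model checking to extract minimal counterexample
submodels for offline retraining."""
import heapq
import math
import random
from collections import defaultdict

GOAL, TRAP = 7, -1        # corridor states 0..7; TRAP is an absorbing crash
SLOW, FAST = 0, 1         # FAST advances two cells but may crash
ACTIONS = (SLOW, FAST)
P_CRASH = 0.2             # probability that a FAST step violates safety

def step(s, a):
    """Environment transition for the toy corridor."""
    if a == FAST and random.random() < P_CRASH:
        return TRAP
    return min(GOAL, s + (2 if a == FAST else 1))

Q = defaultdict(float)                          # tabular Q[(state, action)]
counts = defaultdict(lambda: defaultdict(int))  # learned abstraction: counts[(s, a)][s']

def greedy(s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def counterexample_paths(lam=0.3, max_len=20):
    """Collect the most probable violating paths of the learned abstraction
    (the Markov chain induced by the greedy policy) until their combined
    probability mass exceeds `lam` -- a simple surrogate for probabilistic
    counterexample generation with a model checker."""
    paths, mass, tie = [], 0.0, 0
    frontier = [(0.0, 0, [0])]                  # (-log prob, tiebreak, path)
    while frontier and mass < lam:
        nlp, _, path = heapq.heappop(frontier)
        s = path[-1]
        if s == TRAP:                           # a violating path: keep it
            paths.append(path)
            mass += math.exp(-nlp)
            continue
        if len(path) >= max_len or s == GOAL:
            continue
        dist = counts[(s, greedy(s))]
        total = sum(dist.values())
        for s2, n in dist.items():
            tie += 1
            heapq.heappush(frontier,
                           (nlp - math.log(n / total), tie, path + [s2]))
    return paths

def offline_refine(paths, penalty=-10.0, alpha=0.5, gamma=0.95):
    """Replay counterexample paths offline, penalising greedy actions that
    carry probability mass towards the unsafe state."""
    for path in paths:
        for s, s2 in zip(path, path[1:]):
            a = greedy(s)
            target = penalty if s2 == TRAP else \
                gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])

violations = 0
for episode in range(300):
    s = 0
    for _ in range(30):                         # online exploration
        a = random.choice(ACTIONS) if random.random() < 0.2 else greedy(s)
        s2 = step(s, a)
        counts[(s, a)][s2] += 1                 # grow the abstract model
        r = 10.0 if s2 == GOAL else (-10.0 if s2 == TRAP else -1.0)
        nxt = 0.0 if s2 in (GOAL, TRAP) else max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += 0.3 * (r + 0.95 * nxt - Q[(s, a)])
        if s2 == TRAP:
            violations += 1
        if s2 in (GOAL, TRAP):
            break
        s = s2
    if episode % 10 == 9:                       # periodic offline refinement
        offline_refine(counterexample_paths())

print("online safety violations:", violations)

The design point mirrored from the abstract is the interleaving: counterexample extraction collects a small set of abstract paths whose combined probability of reaching the unsafe state exceeds a threshold, and the agent replays exactly those paths offline, so that subsequent online exploration is steered away from the most probable violations.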
Pages: 311-328 (18 pages)