Probabilistic Counterexample Guidance for Safer Reinforcement Learning

Cited by: 1
Authors: Ji, Xiaotong [1]; Filieri, Antonio [1]
Affiliation: [1] Imperial Coll London, Dept Comp, London SW7 2AZ, England
Keywords: Safe reinforcement learning; Probabilistic model checking; Counterexample guidance
DOI: 10.1007/978-3-031-43835-6_22
Chinese Library Classification: TP301 [Theory and Methods]
Discipline Code: 081202
Abstract:
Safe exploration aims to address the limitations of Reinforcement Learning (RL) in safety-critical scenarios, where failures during trial-and-error learning may incur high costs. Several methods exist to incorporate external knowledge or to use proximal sensor data to limit the exploration of unsafe states. However, reducing exploration risks in unknown environments, where an agent must discover safety threats during exploration itself, remains challenging. In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement. Our method abstracts both continuous and discrete state-space systems into compact abstract models representing the safety-relevant knowledge acquired by the agent during exploration. We then exploit probabilistic counterexample generation to construct minimal simulation submodels eliciting safety requirement violations, on which the agent can efficiently train offline to refine its policy towards minimising the risk of safety violations during subsequent online exploration. In preliminary experiments, our method reduces safety violations during online exploration by an average of 40.3% compared with standard QL and DQN algorithms and by 29.1% compared with previous related work, while achieving cumulative rewards comparable to unrestricted exploration and alternative approaches.
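The loop the abstract describes (explore online, abstract the observed dynamics into a probabilistic model, extract high-probability violating paths, refine the policy offline on exactly those paths) can be illustrated with a self-contained sketch. Everything below is our own toy construction, not the authors' implementation: the corridor environment, the tabular Q-learner, and the best-first path search (a crude stand-in for the probabilistic model checking the paper uses to generate minimal counterexample submodels) are assumptions made for illustration only.

"""A minimal, hypothetical sketch of counterexample-guided safe Q-learning.
The corridor MDP, hyperparameters, and the best-first path search are
illustrative assumptions; the paper itself abstracts the explored dynamics
and uses probabilistic model checking to extract minimal counterexample
submodels for offline retraining."""
import heapq
import math
import random
from collections import defaultdict

GOAL, TRAP = 7, -1        # corridor states 0..7; TRAP is an absorbing crash
SLOW, FAST = 0, 1         # FAST advances two cells but may crash
ACTIONS = (SLOW, FAST)
P_CRASH = 0.2             # probability that a FAST step violates safety

def step(s, a):
    """Environment transition for the toy corridor."""
    if a == FAST and random.random() < P_CRASH:
        return TRAP
    return min(GOAL, s + (2 if a == FAST else 1))

Q = defaultdict(float)                          # tabular Q[(state, action)]
counts = defaultdict(lambda: defaultdict(int))  # learned abstraction: counts[(s, a)][s']

def greedy(s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def counterexample_paths(lam=0.3, max_len=20):
    """Collect the most probable violating paths of the learned abstraction
    (the Markov chain induced by the greedy policy) until their combined
    probability mass exceeds `lam` -- a simple surrogate for probabilistic
    counterexample generation with a model checker."""
    paths, mass, tie = [], 0.0, 0
    frontier = [(0.0, 0, [0])]                  # (-log prob, tiebreak, path)
    while frontier and mass < lam:
        nlp, _, path = heapq.heappop(frontier)
        s = path[-1]
        if s == TRAP:                           # a violating path: keep it
            paths.append(path)
            mass += math.exp(-nlp)
            continue
        if len(path) >= max_len or s == GOAL:
            continue
        dist = counts[(s, greedy(s))]
        total = sum(dist.values())
        for s2, n in dist.items():
            tie += 1
            heapq.heappush(frontier,
                           (nlp - math.log(n / total), tie, path + [s2]))
    return paths

def offline_refine(paths, penalty=-10.0, alpha=0.5, gamma=0.95):
    """Replay counterexample paths offline, penalising greedy actions that
    carry probability mass towards the unsafe state."""
    for path in paths:
        for s, s2 in zip(path, path[1:]):
            a = greedy(s)
            target = penalty if s2 == TRAP else \
                gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])

violations = 0
for episode in range(300):
    s = 0
    for _ in range(30):                         # online exploration
        a = random.choice(ACTIONS) if random.random() < 0.2 else greedy(s)
        s2 = step(s, a)
        counts[(s, a)][s2] += 1                 # grow the abstract model
        r = 10.0 if s2 == GOAL else (-10.0 if s2 == TRAP else -1.0)
        nxt = 0.0 if s2 in (GOAL, TRAP) else max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += 0.3 * (r + 0.95 * nxt - Q[(s, a)])
        if s2 == TRAP:
            violations += 1
        if s2 in (GOAL, TRAP):
            break
        s = s2
    if episode % 10 == 9:                       # periodic offline refinement
        offline_refine(counterexample_paths())

print("online safety violations:", violations)

The design point mirrored from the abstract is the interleaving: counterexample extraction collects a small set of abstract paths whose combined probability of reaching the unsafe state exceeds a threshold, and the agent replays exactly those paths offline, so that subsequent online exploration is steered away from the most probable violations.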
Pages: 311-328 (18 pages)