Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning

Cited by: 0
Authors
Kweider, Leen [1 ]
Abou Kassem, Maissa [1 ]
Sandouk, Ubai [2 ]
Affiliations
[1] Damascus Univ, Fac Informat Technol, Dept Artificial Intelligence, Damascus, Syria
[2] Damascus Univ, Fac Informat Technol, Dept Software Engn, Damascus, Syria
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Safety; Anomaly detection; Reinforcement learning; Artificial intelligence; Optimization; Uncertainty; Measurement uncertainty; Costs; Decision making; Training; AI safety; reinforcement learning; anomaly detection; sequence modeling; risk-averse policy; reward shaping;
DOI
10.1109/ACCESS.2024.3486549
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
The deployment of artificial intelligence (AI) in decision-making applications requires ensuring an appropriate level of safety and reliability, particularly in changing environments that contain a large number of unknown observations. To address this challenge, we propose a novel safe reinforcement learning (RL) approach that utilizes anomalous state sequence modeling to enhance RL safety. Our proposed solution, Safe Reinforcement Learning with Anomalous State Sequences (AnoSeqs), consists of two stages. First, we train an agent in a non-safety-critical offline 'source' environment to collect safe state sequences. Next, we use these safe sequences to build an anomaly detection model that can detect potentially unsafe state sequences in a 'target' safety-critical environment where failures can have high costs. The estimated risk from the anomaly detection model is used to train a risk-averse RL policy in the target environment; this involves adjusting the reward function to penalize the agent for visiting anomalous states deemed unsafe by our anomaly model. In experiments on multiple safety-critical benchmark environments, including self-driving cars, our approach successfully learns safer policies and demonstrates that sequential anomaly detection can provide an effective supervisory signal for training safety-aware RL agents.
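The reward-shaping step described in the abstract can be sketched as a thin environment wrapper. This is a minimal illustration, not the authors' released implementation: the Gymnasium-style reset/step signatures, the anomaly_model.score interface, and the window, penalty, and threshold parameters are assumptions made here for clarity. The wrapper keeps a sliding window of recent states, asks a pre-trained sequence anomaly detector for a risk score, and subtracts a penalty from the environment reward when that score exceeds a threshold, so a standard RL algorithm trained on the shaped reward becomes risk-averse.

```python
# Illustrative sketch of AnoSeqs-style reward shaping (assumed names and APIs).
import collections
import numpy as np

class AnomalyPenaltyWrapper:
    """Wraps a Gymnasium-style target environment and penalizes the agent
    whenever the recent state sequence looks anomalous, i.e. unlike the
    safe sequences collected in the source environment."""

    def __init__(self, env, anomaly_model, window=8, penalty=1.0, threshold=0.5):
        self.env = env                      # safety-critical target environment
        self.anomaly_model = anomaly_model  # hypothetical detector: .score(seq) -> [0, 1]
        self.window = window                # number of recent states to score
        self.penalty = penalty              # scale of the safety penalty
        self.threshold = threshold          # scores above this are treated as unsafe
        self.states = collections.deque(maxlen=window)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.states.clear()
        self.states.append(obs)
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.states.append(obs)
        # Higher score = more anomalous (riskier) recent state sequence.
        score = float(self.anomaly_model.score(np.stack(self.states)))
        if score > self.threshold:
            # Shape the reward so the policy learns to avoid anomalous states.
            reward -= self.penalty * score
        info["anomaly_score"] = score
        return obs, reward, terminated, truncated, info
```

Any off-the-shelf RL algorithm (e.g., PPO or DQN) could then be trained on the wrapped environment unchanged; the anomaly detector enters only through the shaped reward.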
Pages: 157140 - 157148
Number of pages: 9
Related Papers
(50 records in total)
  • [31] Improving reinforcement learning by using sequence trees
    Sertan Girgin
    Faruk Polat
    Reda Alhajj
    Machine Learning, 2010, 81 : 283 - 331
  • [32] Reinforcement Learning for on-line Sequence Transformation
    Rypesc, Grzegorz
    Lepak, Lukasz
    Wawrzynski, Pawel
    PROCEEDINGS OF THE 2022 17TH CONFERENCE ON COMPUTER SCIENCE AND INTELLIGENCE SYSTEMS (FEDCSIS), 2022, : 133 - 139
  • [33] AlphaSeq: Sequence Discovery With Deep Reinforcement Learning
    Shao, Yulin
    Liew, Soung Chang
    Wang, Taotao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (09) : 3319 - 3333
  • [34] A Reinforcement Learning Approach to Enhance the Trust Level of MANETs
    Jinarajadasa, Gihani
    Rupasinghe, Lakmal
    Murray, Iain
    2018 NATIONAL INFORMATION TECHNOLOGY CONFERENCE (NITC), 2018,
  • [35] Reinforcement learning of dynamic motor sequence: Learning to stand up
    Morimoto, J
    Doya, K
    1998 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS - PROCEEDINGS, VOLS 1-3: INNOVATIONS IN THEORY, PRACTICE AND APPLICATIONS, 1998, : 1721 - 1726
  • [36] Sleep does not enhance motor sequence learning
    Rickard, Timothy C.
    Cai, Denise J.
    Rieth, Cory A.
    Jones, Jason
    Ard, M. Colin
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-LEARNING MEMORY AND COGNITION, 2008, 34 (04) : 834 - 842
  • [37] Deep Sequence Modeling for Anomalous ISP Traffic Prediction
    Saha, Sajal
    Haque, Anwar
    Sidebottom, Greg
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022, : 5439 - 5444
  • [38] Opponent Modeling in Deep Reinforcement Learning
    He, He
    Boyd-Graber, Jordan
    Kwok, Kevin
    Daume, Hal, III
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [39] A modeling environment for reinforcement learning in games
    Gomes, Gilzamir
    Vidal, Creto A.
    Cavalcante-Neto, Joaquim B.
    Nogueira, Yuri L. B.
    ENTERTAINMENT COMPUTING, 2022, 43
  • [40] Safety-constrained reinforcement learning with a distributional safety critic
    Qisong Yang
    Thiago D. Simão
    Simon H. Tindemans
    Matthijs T. J. Spaan
    Machine Learning, 2023, 112 : 859 - 887