Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning

Cited by: 0
Authors
Kweider, Leen [1 ]
Abou Kassem, Maissa [1 ]
Sandouk, Ubai [2 ]
Affiliations
[1] Damascus Univ, Fac Informat Technol, Dept Artificial Intelligence, Damascus, Syria
[2] Damascus Univ, Fac Informat Technol, Dept Software Engn, Damascus, Syria
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Safety; Anomaly detection; Reinforcement learning; Artificial intelligence; Optimization; Uncertainty; Measurement uncertainty; Costs; Decision making; Training; AI safety; reinforcement learning; anomaly detection; sequence modeling; risk-averse policy; reward shaping
DOI
10.1109/ACCESS.2024.3486549
CLC number
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
The deployment of artificial intelligence (AI) in decision-making applications requires ensuring an appropriate level of safety and reliability, particularly in changing environments that contain a large number of unknown observations. To address this challenge, we propose a novel safe reinforcement learning (RL) approach that utilizes anomalous state sequence modeling to enhance RL safety. Our proposed solution, Safe Reinforcement Learning with Anomalous State Sequences (AnoSeqs), consists of two stages. First, we train an agent in a non-safety-critical offline 'source' environment to collect safe state sequences. Next, we use these safe sequences to build an anomaly detection model that can detect potentially unsafe state sequences in a 'target' safety-critical environment where failures can have high costs. The estimated risk from the anomaly detection model is utilized to train a risk-averse RL policy in the target environment; this involves adjusting the reward function to penalize the agent for visiting anomalous states deemed unsafe by our anomaly model. In experiments on multiple safety-critical benchmarking environments, including self-driving cars, our approach successfully learns safer policies and demonstrates that sequential anomaly detection can provide an effective supervisory signal for training safety-aware RL agents.
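The reward-shaping step described in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the authors' implementation: `anomaly_score` stands in for the trained sequence anomaly detector (any function mapping a window of recent states to a score in [0, 1]), and the shaping rule simply subtracts a penalty proportional to that score whenever the sliding window of states looks unsafe. The class name, parameters, and threshold rule are assumptions for illustration.

```python
from collections import deque


class AnomalySeqRewardShaper:
    """Hedged sketch of AnoSeqs-style reward shaping: penalize the base
    reward when the recent state sequence looks anomalous relative to
    safe sequences collected in the source environment."""

    def __init__(self, anomaly_score, threshold=0.5, penalty=1.0, window=4):
        self.anomaly_score = anomaly_score  # state sequence -> score in [0, 1]
        self.threshold = threshold          # scores above this count as unsafe
        self.penalty = penalty              # weight of the safety penalty
        self.window = deque(maxlen=window)  # sliding window of recent states

    def shape(self, state, reward):
        """Append the new state and return the (possibly penalized) reward."""
        self.window.append(state)
        score = self.anomaly_score(list(self.window))
        if score > self.threshold:
            reward -= self.penalty * score  # discourage anomalous trajectories
        return reward


# Toy stand-in detector: mean absolute state magnitude, clipped to [0, 1].
def toy_score(seq):
    return min(1.0, sum(abs(s) for s in seq) / len(seq))
```

In a training loop, the shaped reward would replace the environment reward before the policy update, so the agent learns to trade off task return against the estimated risk of the states it visits.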
Pages: 157140-157148
Page count: 9
Related papers
50 records in total
  • [21] SEQUENCE-TO-SEQUENCE ASR OPTIMIZATION VIA REINFORCEMENT LEARNING
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5829-5833
  • [22] Sequence to Sequence Multi-agent Reinforcement Learning Algorithm
    Shi T.
    Wang L.
    Huang Z.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2021, 34 (03): 206-213
  • [23] Reconnaissance for Reinforcement Learning with Safety Constraints
    Maeda, Shin-ichi
    Watahiki, Hayato
    Ouyang, Yi
    Okada, Shintarou
    Koyama, Masanori
    Nagarajan, Prabhat
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II, 2021, 12976: 567-582
  • [24] Almost surely safe exploration and exploitation for deep reinforcement learning with state safety estimation
    Lin, Ke
    Li, Yanjie
    Liu, Qi
    Li, Duantengchuan
    Shi, Xiongtao
    Chen, Shiyu
    INFORMATION SCIENCES, 2024, 662
  • [25] On Predicting Sensor Readings With Sequence Modeling and Reinforcement Learning for Energy-Efficient IoT Applications
    Laidi, Roufaida
    Djenouri, Djamel
    Balasingham, Ilangko
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (08): 5140-5151
  • [26] Safety reinforcement learning control via transfer learning
    Zhang, Quanqi
    Wu, Chengwei
    Tian, Haoyu
    Gao, Yabin
    Yao, Weiran
    Wu, Ligang
    AUTOMATICA, 2024, 166
  • [27] Improving reinforcement learning by using sequence trees
    Girgin, Sertan
    Polat, Faruk
    Alhajj, Reda
    MACHINE LEARNING, 2010, 81 (03): 283-331
  • [28] Sequence labeling with reinforcement learning and ranking algorithms
    Maes, Francis
    Denoyer, Ludovic
    Gallinari, Patrick
    MACHINE LEARNING: ECML 2007, PROCEEDINGS, 2007, 4701: 648+
  • [29] Reinforcement Learning in Latent Action Sequence Space
    Kim, Heecheol
    Yamada, Masanori
    Miyoshi, Kosuke
    Iwata, Tomoharu
    Yamakawa, Hiroshi
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020: 5497-5503
  • [30] Reinforcement learning for disassembly sequence planning optimization
    Allagui, Amal
    Belhadj, Imen
    Plateaux, Regis
    Hammadi, Moncef
    Penas, Olivia
    Aifaoui, Nizar
    COMPUTERS IN INDUSTRY, 2023, 151