Towards safe reinforcement-learning in industrial grid-warehousing

Cited by: 20
Authors
Andersen, Per-Arne [1]
Goodwin, Morten [1]
Granmo, Ole-Christoffer [1]
Affiliations
[1] Univ Agder, Dept ICT, Jon Lilletuns Vei 9, N-4879 Grimstad, Norway
Keywords
Model-based reinforcement learning; Neural networks; Variational autoencoder; Markov decision processes; Exploration; Safe reinforcement learning; ENVIRONMENT; EXPLORATION; ALGORITHMS;
DOI
10.1016/j.ins.2020.06.010
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Reinforcement learning has proven profoundly successful at learning optimal policies for simulated environments using distributed training with extensive compute capacity. Model-free reinforcement learning relies on trial and error, where the error is a vital part of teaching the agent to behave optimally. In mission-critical, real-world environments, there is little tolerance for failure, which can cause damaging effects on humans and equipment. In these environments, current state-of-the-art reinforcement learning approaches are not sufficient to learn optimal control policies safely. Model-based reinforcement learning, on the other hand, tries to encode the environment's transition dynamics into a predictive model. The transition dynamics describe the mapping from one state to another, conditioned on an action. If this model is accurate enough, the predictive model is sufficient to train agents for optimal behavior in real environments. This paper presents the Dreaming Variational Autoencoder (DVAE) for safely learning good policies with a significantly lower risk of catastrophes occurring during training. The algorithm combines variational autoencoders, risk-directed exploration, and curiosity to train deep Q-networks inside "dream" states. We introduce a novel environment, ASRS-Lab, for research in the safe learning of autonomous vehicles in grid-based warehousing. The work shows that the proposed algorithm has better sample efficiency with similar performance to novel model-free deep reinforcement learning algorithms while maintaining safety during training. (C) 2020 The Author(s). Published by Elsevier Inc.
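The model-based idea in the abstract (learn the transition dynamics, then train the agent inside the learned model) can be illustrated with a minimal sketch. This is not the DVAE algorithm: a tabular lookup stands in for the variational autoencoder, plain Q-learning stands in for the deep Q-network, and risk-directed exploration and curiosity are omitted. The corridor environment, `real_step`, `learn_model`, `train_in_model`, and all parameter values are hypothetical and chosen only for illustration.

```python
import random

N_CELLS, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left, move right


def real_step(state, action):
    """Hypothetical 'real' environment: a 1-D corridor with a goal cell."""
    nxt = min(max(state + action, 0), N_CELLS - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def learn_model(n_samples, rng):
    """Record observed (state, action) -> (next_state, reward) pairs.

    Assumption: transitions are deterministic, so a lookup table is an
    exact model. A real system would fit a learned predictor instead.
    """
    model = {}
    for _ in range(n_samples):
        s, a = rng.randrange(N_CELLS), rng.choice(ACTIONS)
        model[(s, a)] = real_step(s, a)
    return model


def train_in_model(model, rng, updates=2000, alpha=0.5, gamma=0.9):
    """Q-learning where every update queries the learned model only."""
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    known = list(model)
    for _ in range(updates):
        s, a = rng.choice(known)   # sample a "dreamed" transition
        nxt, r = model[(s, a)]     # predicted outcome, no real-env call
        if nxt == GOAL:            # treat reaching the goal as terminal
            target = r
        else:
            target = r + gamma * max(q[(nxt, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS)}


rng = random.Random(0)
model = learn_model(n_samples=200, rng=rng)
q = train_in_model(model, rng)
policy = greedy_policy(q)
```

Only `learn_model` touches the environment; all value updates in `train_in_model` run against the stored predictions. In this sketch the data collection itself is still unconstrained random sampling, which is exactly the part a safety-oriented method would need to control.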
Pages: 467-484
Page count: 18