Towards safe reinforcement-learning in industrial grid-warehousing

Cited by: 20
Authors
Andersen, Per-Arne [1 ]
Goodwin, Morten [1 ]
Granmo, Ole-Christoffer [1 ]
Affiliations
[1] Univ Agder, Dept ICT, Jon Lilletuns Vei 9, N-4879 Grimstad, Norway
Keywords
Model-based reinforcement learning; Neural networks; Variational autoencoder; Markov decision processes; Exploration; Safe reinforcement learning; ENVIRONMENT; EXPLORATION; ALGORITHMS
DOI
10.1016/j.ins.2020.06.010
CLC classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Reinforcement learning has proven profoundly successful at learning optimal policies for simulated environments, using distributed training with extensive compute capacity. Model-free reinforcement learning relies on trial and error, where error is a vital part of teaching the agent to behave optimally. In mission-critical, real-world environments there is little tolerance for failure, and errors can damage humans and equipment. In these environments, current state-of-the-art reinforcement learning approaches cannot learn optimal control policies safely. Model-based reinforcement learning, on the other hand, tries to encode the environment's transition dynamics into a predictive model. The transition dynamics describe the mapping from one state to another, conditioned on an action. If this model is accurate enough, the predictive model suffices to train agents for optimal behavior in real environments. This paper presents the Dreaming Variational Autoencoder (DVAE) for safely learning good policies with a significantly lower risk of catastrophes occurring during training. The algorithm combines variational autoencoders, risk-directed exploration, and curiosity to train deep Q-networks inside "dream" states. We introduce a novel environment, ASRS-Lab, for research into the safe learning of autonomous vehicles in grid-based warehousing. The work shows that the proposed algorithm achieves better sample efficiency than recent model-free deep reinforcement learning algorithms, with similar performance, while maintaining safety during training. (C) 2020 The Author(s). Published by Elsevier Inc.
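The core model-based idea the abstract describes (learn the transition dynamics from limited real interaction, then train the agent entirely inside the learned model so that unsafe exploration never happens in the real environment) can be sketched in a minimal Dyna-style toy. The corridor environment, the count-based model, and all names below are illustrative assumptions, not the paper's DVAE or ASRS-Lab:

```python
import numpy as np

# Toy sketch: 1) fit a transition/reward model from a small batch of real
# experience, 2) run Q-learning only on "dreamed" transitions from that model.
rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, GOAL = 5, 2, 4      # actions: 0 = left, 1 = right

def real_step(s, a):
    """Ground-truth deterministic dynamics (used only to collect data)."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, 1.0 if s2 == GOAL else 0.0

# --- 1) count-based transition and reward model from random experience ---
counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))
rewards = np.zeros((N_STATES, N_ACTIONS))
for _ in range(500):
    s, a = rng.integers(N_STATES), rng.integers(N_ACTIONS)
    s2, r = real_step(s, a)
    counts[s, a, s2] += 1
    rewards[s, a] = r

model_next = counts.argmax(axis=2)       # most likely successor per (s, a)

# --- 2) Q-learning on imagined ("dream") transitions only ---
Q = np.zeros((N_STATES, N_ACTIONS))
gamma, alpha = 0.9, 0.5
for _ in range(2000):
    s, a = rng.integers(N_STATES), rng.integers(N_ACTIONS)
    s2, r = model_next[s, a], rewards[s, a]
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

greedy = Q.argmax(axis=1)                # learned policy, never trained on
print(greedy)                            # real trial-and-error failures
```

The paper replaces the count-based table with a variational autoencoder over high-dimensional observations and adds risk-directed, curiosity-driven exploration; the sketch only shows why an accurate predictive model makes real-world trial and error unnecessary.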
Pages: 467-484
Page count: 18