Towards safe reinforcement-learning in industrial grid-warehousing

Cited by: 20
Authors
Andersen, Per-Arne [1 ]
Goodwin, Morten [1 ]
Granmo, Ole-Christoffer [1 ]
Affiliations
[1] Univ Agder, Dept ICT, Jon Lilletuns Vei 9, N-4879 Grimstad, Norway
Keywords
Model-based reinforcement learning; Neural networks; Variational autoencoder; Markov decision processes; Exploration; Safe reinforcement learning; ENVIRONMENT; EXPLORATION; ALGORITHMS
DOI
10.1016/j.ins.2020.06.010
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Reinforcement learning has proven profoundly successful at learning optimal policies for simulated environments, using distributed training with extensive compute capacity. Model-free reinforcement learning builds on trial and error, where errors are a vital part of teaching the agent to behave optimally. In mission-critical, real-world environments there is little tolerance for failure, and errors can damage humans and equipment. In such environments, current state-of-the-art reinforcement learning approaches cannot learn optimal control policies safely. Model-based reinforcement learning, on the other hand, tries to encode the environment's transition dynamics into a predictive model. The transition dynamics describe the mapping from one state to the next, conditioned on an action. If this model is accurate enough, it is sufficient for training agents toward optimal behavior in real environments. This paper presents the Dreaming Variational Autoencoder (DVAE) for safely learning good policies with a significantly lower risk of catastrophes during training. The algorithm combines variational autoencoders, risk-directed exploration, and curiosity to train deep Q-networks inside "dream" states. We introduce a novel environment, ASRS-Lab, for research into the safe learning of autonomous vehicles in grid-based warehousing. The work shows that the proposed algorithm achieves better sample efficiency than, and performance comparable to, recent model-free deep reinforcement learning algorithms, while maintaining safety during training. (C) 2020 The Author(s). Published by Elsevier Inc.
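To make the training scheme described in the abstract concrete, the sketch below illustrates the general pattern: a variational autoencoder learns the transition dynamics T(s' | s, a) = Pr(S_{t+1} = s' | S_t = s, A_t = a), and a deep Q-network is then trained on imagined ("dream") rollouts from that learned model rather than on the real, safety-critical plant. This is a minimal sketch under assumed interfaces, not the paper's DVAE implementation; all names (WorldModelVAE, DQN, dream_rollout) and dimensions are hypothetical, and the paper's risk-directed and curiosity-driven exploration terms are omitted.

# Hypothetical sketch: a VAE-style world model predicts the next state from
# (state, action); a DQN is trained on imagined rollouts from that model
# instead of on the real environment. Names are illustrative, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, LATENT_DIM = 16, 4, 8  # assumed toy dimensions

class WorldModelVAE(nn.Module):
    """Encodes (s, a) to a latent z and decodes a predicted next state."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(STATE_DIM + ACTION_DIM, 2 * LATENT_DIM)
        self.decoder = nn.Linear(LATENT_DIM, STATE_DIM)

    def forward(self, state, action_onehot):
        h = self.encoder(torch.cat([state, action_onehot], dim=-1))
        mu, logvar = h.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.decoder(z), mu, logvar

def vae_loss(pred_next, true_next, mu, logvar):
    """Standard VAE objective: reconstruction error plus KL regularizer,
    used to fit the world model on real transitions gathered safely."""
    recon = F.mse_loss(pred_next, true_next)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

class DQN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACTION_DIM))
    def forward(self, state):
        return self.net(state)

def dream_rollout(world_model, dqn, start_state, horizon=10, epsilon=0.1):
    """Generate imagined transitions without touching the real environment."""
    transitions, state = [], start_state
    for _ in range(horizon):
        with torch.no_grad():
            if torch.rand(()) < epsilon:            # exploratory action
                action = torch.randint(ACTION_DIM, ())
            else:                                   # greedy action
                action = dqn(state).argmax()
            onehot = F.one_hot(action, ACTION_DIM).float()
            next_state, _, _ = world_model(state, onehot)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions

In a full system the imagined transitions would feed a standard Q-learning update (a reward model is omitted here for brevity), and the world model would be refit as new real-world data arrives.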
Pages: 467-484
Page count: 18
Related papers
50 records in total
  • [1] Towards Safe Continuing Task Reinforcement Learning
    Calvo-Fullana, Miguel
    Chamon, Luiz F. O.
    Paternain, Santiago
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021: 902-908
  • [2] A reinforcement-learning approach to efficient communication
    Kageback, Mikael
    Carlsson, Emil
    Dubhashi, Devdatt
    Sayeed, Asad
    PLOS ONE, 2020, 15 (07)
  • [3] Coevolutionary networks of reinforcement-learning agents
    Kianercy, Ardeshir
    Galstyan, Aram
    PHYSICAL REVIEW E, 2013, 88 (01)
  • [4] A reinforcement-learning approach to color quantization
    Chou, CH
    Su, MC
    Chang, F
    Lai, E
    Proceedings of the Sixth IASTED International Conference on Intelligent Systems and Control, 2004: 94-99
  • [5] A reinforcement-learning account of Tourette syndrome
    Maia, T.
    EUROPEAN PSYCHIATRY, 2017, 41: S10
  • [6] A Reinforcement-Learning Approach to Color Quantization
    Chou, Chien-Hsing
    Su, Mu-Chun
    Zhao, Yu-Xiang
    Hsu, Fu-Hau
    JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2011, 14 (02): 141-150
  • [7] Reinforcement-learning in fronto-striatal circuits
    Averbeck, Bruno
    O’Doherty, John P.
    NEUROPSYCHOPHARMACOLOGY, 2022, 47: 147-162
  • [8] Modeling Biological Agents Beyond the Reinforcement-Learning Paradigm
    Georgeon, Olivier L.
    Casado, Remi C.
    Matignon, Laetitia A.
    6TH ANNUAL INTERNATIONAL CONFERENCE ON BIOLOGICALLY INSPIRED COGNITIVE ARCHITECTURES (BICA 2015), 2015, 71: 17-22
  • [9] Towards Safe Reinforcement Learning with a Safety Editor Policy
    Yu, Haonan
    Xu, Wei
    Zhang, Haichao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [10] Linking Confidence Biases to Reinforcement-Learning Processes
    Salem-Garcia, Nahuel
    Palminteri, Stefano
    Lebreton, Mael
    PSYCHOLOGICAL REVIEW, 2023, 130 (04): 1017-1043