Towards safe reinforcement-learning in industrial grid-warehousing

Cited by: 20
Authors
Andersen, Per-Arne [1]
Goodwin, Morten [1]
Granmo, Ole-Christoffer [1]
Affiliations
[1] Univ Agder, Dept ICT, Jon Lilletuns Vei 9, N-4879 Grimstad, Norway
Keywords
Model-based reinforcement learning; Neural networks; Variational autoencoder; Markov decision processes; Exploration; Safe reinforcement learning; ENVIRONMENT; EXPLORATION; ALGORITHMS;
DOI
10.1016/j.ins.2020.06.010
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Reinforcement learning has been shown to be profoundly successful at learning optimal policies for simulated environments using distributed training with extensive compute capacity. Model-free reinforcement learning relies on trial and error, where the error is a vital part of teaching the agent to behave optimally. In mission-critical, real-world environments, there is little tolerance for failure, which can cause damaging effects on humans and equipment. In these environments, current state-of-the-art reinforcement learning approaches are not sufficient to learn optimal control policies safely. Model-based reinforcement learning, on the other hand, tries to encode the environment's transition dynamics into a predictive model. The transition dynamics describe the mapping from one state to another, conditioned on an action. If this model is accurate enough, the predictive model is sufficient to train agents for optimal behavior in real environments. This paper presents the Dreaming Variational Autoencoder (DVAE) for safely learning good policies with a significantly lower risk of catastrophes occurring during training. The algorithm combines variational autoencoders, risk-directed exploration, and curiosity to train deep Q-networks inside "dream" states. We introduce a novel environment, ASRS-Lab, for research in the safe learning of autonomous vehicles in grid-based warehousing. The work shows that the proposed algorithm has better sample efficiency with similar performance to novel model-free deep reinforcement learning algorithms while maintaining safety during training. © 2020 The Author(s). Published by Elsevier Inc.
Pages: 467-484
Number of pages: 18
Related papers
50 in total
  • [31] Grid clustering and fuzzy reinforcement-learning based energy-efficient data aggregation scheme for distributed WSN
    Sanjay Gandhi, Gundabatini
    Vikas, K.
    Ratnam, Vijayananda
    Suresh Babu, Kolluru
    IET COMMUNICATIONS, 2020, 14 (16) : 2840 - 2848
  • [32] Towards a Multiple-Lookahead-Levels agent reinforcement-learning technique and its implementation in integrated circuits
    Al-Dayaa, H. S.
    Megherbi, D. B.
    JOURNAL OF SUPERCOMPUTING, 2012, 62 (01) : 588 - 615
  • [33] Cyber-Physical Risk Driven Routing Planning with Deep Reinforcement-Learning in Smart Grid Communication Networks
    Jin, Zhuojun
    Yu, Peng
    Guo, ShaoYong
    Feng, Lei
    Zhou, Fanqin
    Tao, Minxing
    Li, Wenjing
    Qiu, Song
    Shi, Lei
    2020 16TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE, IWCMC, 2020, : 1278 - 1283
  • [34] Towards Online Continuous Reinforcement Learning on Industrial Internet of Things
    Qian, Cheng
    Yu, Wei
    Liu, Xing
    Griffith, David
    Golmie, Nada
    2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021), 2021, : 280 - 287
  • [35] A Reinforcement-Learning Based Cognitive Scheme for Opportunistic Spectrum Access
    Kordali, Angeliki V.
    Cottis, Panayotis G.
    WIRELESS PERSONAL COMMUNICATIONS, 2016, 86 (02) : 751 - 769
  • [36] BUILDING AN ARTIFICIAL STOCK MARKET POPULATED BY REINFORCEMENT-LEARNING AGENTS
    Rutkauskas, Aleksandras Vytautas
    Ramanauskas, Tomas
    JOURNAL OF BUSINESS ECONOMICS AND MANAGEMENT, 2009, 10 (04) : 329 - 341
  • [37] Evolution of cooperation on reinforcement-learning driven-adaptive networks
    Du, Chunpeng
    Lu, Yikang
    Meng, Haoran
    Park, Junpyo
    CHAOS, 2024, 34 (04)
  • [38] An intelligent controller based on fuzzy target acquired by reinforcement-learning
    Yasunobu, Seiji
    2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13, 2006, : 94 - 99
  • [39] Enhancing stochastic resonance using a reinforcement-learning based method
    Ding, Jianpeng
    Lei, Youming
    JOURNAL OF VIBRATION AND CONTROL, 2023, 29 (7-8) : 1461 - 1471
  • [40] On Normative Reinforcement Learning via Safe Reinforcement Learning
    Neufeld, Emery A.
    Bartocci, Ezio
    Ciabattoni, Agata
    PRIMA 2022: PRINCIPLES AND PRACTICE OF MULTI-AGENT SYSTEMS, 2023, 13753 : 72 - 89