Towards safe reinforcement-learning in industrial grid-warehousing

Cited by: 20

Authors:

Andersen, Per-Arne [1]

Goodwin, Morten [1]

Granmo, Ole-Christoffer [1]

Affiliations:

[1] Univ Agder, Dept ICT, Jon Lilletuns Vei 9, N-4879 Grimstad, Norway

Source:

INFORMATION SCIENCES | 2020 / Volume 537

Keywords:

Model-based reinforcement learning; Neural networks; Variational autoencoder; Markov decision processes; Exploration; Safe reinforcement learning; ENVIRONMENT; EXPLORATION; ALGORITHMS;

DOI:

10.1016/j.ins.2020.06.010

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Reinforcement learning has been shown to be profoundly successful at learning optimal policies for simulated environments using distributed training with extensive compute capacity. Model-free reinforcement learning relies on trial and error, where the error is a vital part of teaching the agent to behave optimally. In mission-critical, real-world environments, there is little tolerance for failure, which can cause damaging effects on humans and equipment. In these environments, current state-of-the-art reinforcement learning approaches are not sufficient to learn optimal control policies safely. Model-based reinforcement learning, on the other hand, tries to encode the environment's transition dynamics into a predictive model. The transition dynamics describe the mapping from one state to the next, conditioned on an action. If this model is accurate enough, it is sufficient to train agents for optimal behavior in real environments. This paper presents the Dreaming Variational Autoencoder (DVAE) for safely learning good policies with a significantly lower risk of catastrophes occurring during training. The algorithm combines variational autoencoders, risk-directed exploration, and curiosity to train deep Q-networks inside "dream" states. We introduce a novel environment, ASRS-Lab, for research in the safe learning of autonomous vehicles in grid-based warehousing. The work shows that the proposed algorithm has better sample efficiency than, and similar performance to, novel model-free deep reinforcement learning algorithms while maintaining safety during training. (C) 2020 The Author(s). Published by Elsevier Inc.


Pages: 467-484

Page count: 18

