POMDP inference and robust solution via deep reinforcement learning: an application to railway optimal maintenance

Cited by: 2
Authors
Arcieri, Giacomo [1 ]
Hoelzl, Cyprien [1 ]
Schwery, Oliver [2 ]
Straub, Daniel [3 ]
Papakonstantinou, Konstantinos G. [4 ]
Chatzi, Eleni [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Inst Struct Engn, CH-8093 Zurich, Switzerland
[2] Swiss Fed Railways SBB, CH-3000 Bern, Switzerland
[3] Tech Univ Munich, Engn Risk Anal Grp, D-80333 Munich, Germany
[4] Penn State Univ, Dept Civil & Environm Engn, University Pk, PA 16802 USA
Funding
US National Science Foundation;
Keywords
Partially observable Markov decision process; Reinforcement learning; Deep learning; Model uncertainty; Optimal maintenance; PLANNING STRUCTURAL INSPECTION; POLICIES;
DOI
10.1007/s10994-024-06559-2
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems in stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the unavailability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), typically benefit from knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques, with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. As a further contribution, we compare the use of Transformers and long short-term memory networks, which constitute model-free RL solutions and work directly on the observation space, with an approach termed the belief-input method, which works on the belief space by exploiting the learned POMDP model for belief inference. We apply these methods to the real-world problem of optimal maintenance planning for railway assets and compare the results with the current real-life policy. We show that the RL policy learned by the belief-input method outperforms the real-life policy, yielding significantly reduced life-cycle costs.
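A minimal sketch of the recursive Bayesian belief update that the belief-input method relies on, not the authors' implementation: it assumes a discrete POMDP with a transition tensor `T[a][s, s']` and an observation tensor `Z[a][s', o]` (hypothetical names and toy values):

```python
import numpy as np

def belief_update(belief, action, obs, T, Z):
    """One step of the Bayes filter: predict the next-state distribution
    with the transition model, then correct it with the likelihood of the
    received observation."""
    predicted = belief @ T[action]             # sum_s b(s) * P(s' | s, a)
    corrected = predicted * Z[action][:, obs]  # weight by P(o | s', a)
    return corrected / corrected.sum()         # renormalize to a distribution

# Toy two-state example with hypothetical parameters
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])                   # one action, 2x2 transitions
Z = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])                   # two possible observations
b = np.array([0.5, 0.5])                       # uniform prior belief
b = belief_update(b, action=0, obs=0, T=T, Z=Z)
```

Under the domain randomization scheme described in the abstract, `T` and `Z` would be re-sampled from the inferred posterior (e.g., at the start of each training episode), so the learned policy must perform well across the whole parameter distribution rather than for a single point estimate.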
Pages: 7967 - 7995 (29 pages)
Related papers
50 records in total
  • [1] Optimal policy for structure maintenance: A deep reinforcement learning framework
    Wei, Shiyin
    Bao, Yuequan
    Li, Hui
    STRUCTURAL SAFETY, 2020, 83 (83)
  • [2] Robust quadruped jumping via deep reinforcement learning
    Bellegarda, Guillaume
    Nguyen, Chuong
    Nguyen, Quan
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2024, 182
  • [3] Deep reinforcement learning for optimal planning of assembly line maintenance
    Geurtsen, M.
    Adan, I.
    Atan, Z.
    JOURNAL OF MANUFACTURING SYSTEMS, 2023, 69 : 170 - 188
  • [4] Learning Representations via a Robust Behavioral Metric for Deep Reinforcement Learning
    Chen, Jianda
    Pan, Sinno Jialin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [5] Deep reinforcement learning for the olfactory search POMDP: a quantitative benchmark
    Loisy, Aurore
    Heinonen, Robin A.
    EUROPEAN PHYSICAL JOURNAL E, 2023, 46 (03)
  • [6] Robust Deep Reinforcement Learning Scheduling via Weight Anchoring
    Gracla, Steffen
    Beck, Edgar
    Bockelmann, Carsten
    Dekorsy, Armin
    IEEE COMMUNICATIONS LETTERS, 2023, 27 (01) : 210 - 213
  • [7] Spatiotemporal Costmap Inference for MPC Via Deep Inverse Reinforcement Learning
    Lee, Keuntaek
    Isele, David
    Theodorou, Evangelos A.
    Bae, Sangjae
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (02) : 3194 - 3201
  • [8] A Deep Reinforcement Learning Approach for Optimal Scheduling of Heavy-haul Railway
    Wu, Tao
    Dong, Wei
    Ye, Hao
    Sun, Xinya
    Ji, Yindong
    IFAC PAPERSONLINE, 2023, 56 (02): : 3491 - 3497
  • [9] Optimal Automatic Train Operation Via Deep Reinforcement Learning
    Zhou, Rui
    Song, Shiji
    PROCEEDINGS OF 2018 TENTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2018, : 103 - 108