POMDP inference and robust solution via deep reinforcement learning: an application to railway optimal maintenance

Cited: 2
Authors
Arcieri, Giacomo [1 ]
Hoelzl, Cyprien [1 ]
Schwery, Oliver [2 ]
Straub, Daniel [3 ]
Papakonstantinou, Konstantinos G. [4 ]
Chatzi, Eleni [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Inst Struct Engn, CH-8093 Zurich, Switzerland
[2] Swiss Fed Railways SBB, CH-3000 Bern, Switzerland
[3] Tech Univ Munich, Engn Risk Anal Grp, D-80333 Munich, Germany
[4] Penn State Univ, Dept Civil & Environm Engn, University Pk, PA 16802 USA
Funding
U.S. National Science Foundation
Keywords
Partially observable Markov decision process; Reinforcement learning; Deep learning; Model uncertainty; Optimal maintenance; PLANNING STRUCTURAL INSPECTION; POLICIES;
DOI
10.1007/s10994-024-06559-2
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the unavailability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), typically benefit from the knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. As a further contribution, we compare the use of Transformers and long short-term memory networks, which constitute model-free RL solutions and work directly on the observation space, with an approach termed the belief-input method, which works on the belief space by exploiting the learned POMDP model for belief inference. We apply these methods to the real-world problem of optimal maintenance planning for railway assets and compare the results with the current real-life policy. We show that the RL policy learned by the belief-input method is able to outperform the real-life policy by yielding significantly reduced life-cycle costs.
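The abstract's belief-input approach rests on two standard ingredients: the recursive Bayes filter for the belief state, and domain randomization, which resamples the (uncertain) model parameters each episode so the learned policy is robust to that uncertainty. A minimal sketch of both, using hypothetical posterior samples in place of the paper's MCMC-inferred transition and observation models (all dimensions and sampling choices here are illustrative assumptions, not the paper's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_obs = 3, 2, 3

def sample_model(rng):
    """Draw one model from a stand-in 'posterior' (here: Dirichlet priors).
    In the paper this role is played by MCMC posterior samples."""
    T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # T[a, s, s']
    O = rng.dirichlet(np.ones(n_obs), size=(n_actions, n_states))     # O[a, s', o]
    return T, O

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    predicted = b @ T[a]            # predictive state distribution over s'
    updated = predicted * O[a, :, o]  # weight by observation likelihood
    return updated / updated.sum()    # renormalize to a distribution

# Domain randomization: each training episode draws a fresh model,
# so a policy fed the resulting beliefs cannot overfit one parameter set.
b = np.full(n_states, 1.0 / n_states)  # uniform initial belief
T, O = sample_model(rng)
b = belief_update(b, a=0, o=1, T=T, O=O)
```

The updated belief `b` would then be the input to the RL policy network in the belief-input method, replacing the raw observation history that the LSTM/Transformer baselines consume.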
Pages: 7967-7995 (29 pages)