Complexity Bounds for Deterministic Partially Observed Markov Decision Processes

Cited by: 0
Authors
Vessaire, Cyrille [1 ]
Carpentier, Pierre [2 ]
Chancelier, Jean-Philippe [1 ]
De Lara, Michel [1 ]
Rodriguez-Martinez, Alejandro [3 ]
Affiliations
[1] Ecole Ponts ParisTech, CERMICS, 6 & 8 Ave Blaise Pascal, F-77455 Marne La Vallee, France
[2] IP Paris, UMA, ENSTA Paris, 828 Blvd Marechaux, F-91762 Palaiseau, France
[3] TotalEnergies SE, Pau, France
Keywords
DOI
10.1007/s10479-024-06282-0
Chinese Library Classification
C93 [Management]; O22 [Operations Research];
Discipline Codes
070105; 12; 1201; 1202; 120202;
Abstract
Partially Observed Markov Decision Processes (Pomdp) share the structure of Markov Decision Processes (Mdp) - with stages, states, actions, transition probabilities, rewards - but differ in the notion of solution. In a Pomdp, observation mappings provide partial and/or imperfect knowledge of the state, and a policy maps observations (and not states, as in an Mdp) to actions. Theoretically, a Pomdp can be solved by Dynamic Programming (DP), but with an information state made of probability distributions over the original state space; hence DP suffers from the curse of dimensionality, even in the finite case. This is why authors like Littman (Littman, M. L. (1996). Algorithms for sequential decision making. PhD thesis, Brown University) and Bonet (Bonet, B. (2009). Deterministic POMDPs revisited. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09 (pp. 59-66). Arlington, Virginia, USA: AUAI Press) have studied the subclass of so-called Deterministic Partially Observed Markov Decision Processes (Det-Pomdp), where the transition and observation mappings are deterministic. In this paper, we improve on Littman's complexity bounds. We then introduce and study a more restricted class, Separated Det-Pomdps, and give new complexity bounds for this class.
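The curse of dimensionality arises because DP must run over beliefs, i.e., probability distributions over the state space. In the deterministic subclass, a belief started from a known set of candidate states collapses to a subset of the state space that evolves deterministically, which is the structural fact behind counting-based complexity bounds (at most 2^|X| reachable information states for a finite state space X). The sketch below is a toy illustration of this set-valued update; the names f, h, and support_update are illustrative assumptions, not the paper's notation or algorithm.

    # A minimal sketch (assumed notation, not the paper's): one step of the
    # set-valued information state of a Det-Pomdp with deterministic
    # transition map f(x, u) and deterministic observation map h(x).
    def support_update(support, action, observation, f, h):
        """Return the set of states consistent with the history after
        applying `action` and then receiving `observation`."""
        image = {f(x, action) for x in support}  # push the support through f
        # keep only states whose (deterministic) observation matches
        return frozenset(x for x in image if h(x) == observation)

    # Toy usage: 4 states on a ring; action u shifts the state by u mod 4;
    # the observation is the state's parity.
    f = lambda x, u: (x + u) % 4
    h = lambda x: x % 2
    b0 = frozenset({0, 1, 2, 3})           # total ignorance initially
    b1 = support_update(b0, 1, 0, f, h)    # apply u = 1, then observe parity 0
    print(sorted(b1))                      # -> [0, 2]

Because each (support, action, observation) triple maps to a unique next support, the information state ranges over the finite collection of subsets of the state space, which is what makes counting arguments over these supports possible.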
Pages: 345 - 382
Number of pages: 38
Related Papers
50 records in total
  • [41] A note on deterministic approximation of discounted Markov decision processes
    Cruz-Suarez, Hugo
    Gordienko, Evgueni
    Montes-de-Oca, Raul
    APPLIED MATHEMATICS LETTERS, 2009, 22 (08) : 1252 - 1256
  • [42] A useful technique for piecewise deterministic Markov decision processes
    Guo, Xin
    Zhang, Yi
    OPERATIONS RESEARCH LETTERS, 2021, 49 (01) : 55 - 61
  • [43] The complexity of reachability in parametric Markov decision processes
    Junges, Sebastian
    Katoen, Joost-Pieter
    Perez, Guillermo A.
    Winkler, Tobias
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2021, 119 : 183 - 210
  • [44] The complexity of decentralized control of Markov decision processes
    Bernstein, DS
    Givan, R
    Immerman, N
    Zilberstein, S
    MATHEMATICS OF OPERATIONS RESEARCH, 2002, 27 (04) : 819 - 840
  • [45] Error bounds for deterministic approximations to Markov processes, with applications to epidemic models
    Gani, J
    Yakowitz, S
    JOURNAL OF APPLIED PROBABILITY, 1995, 32 (04) : 1063 - 1076
  • [46] Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes
    Bennett, Andrew
    Kallus, Nathan
    OPERATIONS RESEARCH, 2024, 72 (03) : 1071 - 1086
  • [47] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [48] Partially Observable Markov Decision Processes in Robotics: A Survey
    Lauri, Mikko
    Hsu, David
    Pajarinen, Joni
    IEEE TRANSACTIONS ON ROBOTICS, 2023, 39 (01) : 21 - 40
  • [49] A primer on partially observable Markov decision processes (POMDPs)
    Chades, Iadine
    Pascal, Luz V.
    Nicol, Sam
    Fletcher, Cameron S.
    Ferrer-Mestres, Jonathan
    METHODS IN ECOLOGY AND EVOLUTION, 2021, 12 (11) : 2058 - 2072
  • [50] Partially observable Markov decision processes with imprecise parameters
    Itoh, Hideaki
    Nakamura, Kiyohiko
    ARTIFICIAL INTELLIGENCE, 2007, 171 (8-9) : 453 - 490