Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs

Cited by: 69
Authors
Amato, Christopher [1]
Bernstein, Daniel S. [1]
Zilberstein, Shlomo [1]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
Funding
U.S. National Science Foundation;
Keywords
Decision theory; Multiagent systems; Planning under uncertainty; POMDPs; DEC-POMDPs;
DOI
10.1007/s10458-009-9103-z
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their high computational complexity, however, presents an important research challenge. One way to address the intractable memory requirements of current algorithms is based on representing agent policies as finite-state controllers. Using this representation, we propose a new approach that formulates the problem as a nonlinear program, which defines an optimal policy of a desired size for each agent. This new formulation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.
Pages: 293-320
Page count: 28
Related papers
50 in total
  • [21] A stochastic point-based algorithm for POMDPs
    Laviolette, Francois
    Tobin, Ludovic
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2008, 5032 : 332 - 343
  • [22] Decentralized Multi-Robot Cooperation with Auctioned POMDPs
    Capitan, Jesus
    Spaan, Matthijs T. J.
    Merino, Luis
    Ollero, Anibal
    2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2012, : 3323 - 3328
  • [23] Information Gathering in Decentralized POMDPs by Policy Graph Improvement
    Lauri, Mikko
    Pajarinen, Joni
    Peters, Jan
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1143 - 1151
  • [24] Finite-state Controllers of POMDPs via Parameter Synthesis
    Junges, Sebastian
    Jansen, Nils
    Wimmer, Ralf
    Quatmann, Tim
    Winterer, Leonore
    Katoen, Joost-Pieter
    Becker, Bernd
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2018, : 519 - 529
  • [25] Bayesian Learning of Other Agents' Finite Controllers for Interactive POMDPs
    Panella, Alessandro
    Gmytrasiewicz, Piotr
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2530 - 2536
  • [26] Optimal and approximate Q-value functions for decentralized POMDPs
    Oliehoek, Frans A.
    Spaan, Matthijs T. J.
    Vlassis, Nikos
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2008, 32 : 289 - 353
  • [27] The Cross-Entropy Method for Policy Search in Decentralized POMDPs
    Oliehoek, Frans A.
    Kooij, Julian F. P.
    Vlassis, Nikos
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2008, 32 (04): : 341 - 357
  • [28] Incremental Clustering and Expansion for Faster Optimal Planning in Decentralized POMDPs
    Oliehoek, Frans A.
    Spaan, Matthijs T. J.
    Amato, Christopher
    Whiteson, Shimon
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2013, 46 : 449 - 509
  • [29] Point-Based Bounded Policy Iteration for Decentralized POMDPs
    Kim, Youngwook
    Kim, Kee-Eung
    PRICAI 2010: TRENDS IN ARTIFICIAL INTELLIGENCE, 2010, 6230 : 614 - +
  • [30] A Cultural Algorithm for POMDPs from Stochastic Inventory Control
    Prestwich, S. D.
    Tarim, S. A.
    Rossi, R.
    Hnich, B.
    HYBRID METAHEURISTICS, PROCEEDINGS, 2008, 5296 : 16 - +