Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs

Cited by: 69
Authors
Amato, Christopher [1 ]
Bernstein, Daniel S. [1 ]
Zilberstein, Shlomo [1 ]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
Funding
US National Science Foundation;
Keywords
Decision theory; Multiagent systems; Planning under uncertainty; POMDPs; DEC-POMDPs;
DOI
10.1007/s10458-009-9103-z
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their high computational complexity, however, presents an important research challenge. One way to address the intractable memory requirements of current algorithms is to represent agent policies as finite-state controllers. Using this representation, we propose a new approach that formulates the problem as a nonlinear program, which defines an optimal policy of a desired size for each agent. This new formulation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement and opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.
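
The abstract describes casting controller optimization as a nonlinear program and handing it to an off-the-shelf solver. The sketch below is an illustrative reconstruction of that idea for a tiny made-up POMDP, not the authors' code: the model data (R, T, Z, b0, q0), the variable layout, the helper names (unpack, bellman_residual, simplex_residual), the problem sizes, and the choice of SciPy's SLSQP solver are all assumptions made for this example.

# Minimal sketch (assumed, not the authors' implementation): optimize a
# fixed-size stochastic controller for a toy POMDP by writing the problem
# as a nonlinear program and calling an off-the-shelf solver.
import numpy as np
from scipy.optimize import minimize

nS, nA, nO, nQ = 2, 2, 2, 2          # states, actions, observations, controller nodes
gamma = 0.9
rng = np.random.RandomState(0)
R = np.array([[1.0, 0.0], [0.0, 1.0]])          # reward R[s, a]
T = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transitions T[s, a, s']
Z = rng.dirichlet(np.ones(nO), size=(nS, nA))   # observations Z[s', a, o]
b0 = np.array([0.5, 0.5])                       # initial belief
q0 = 0                                          # start node of the controller

# Decision variables, flattened into one vector x:
#   act[q, a]       = P(a | q)         action-selection probabilities
#   tr[q, a, o, q'] = P(q' | q, a, o)  controller-transition probabilities
#   V[q, s]         = value of node q in state s
n_act, n_tr, n_val = nQ * nA, nQ * nA * nO * nQ, nQ * nS

def unpack(x):
    act = x[:n_act].reshape(nQ, nA)
    tr = x[n_act:n_act + n_tr].reshape(nQ, nA, nO, nQ)
    V = x[n_act + n_tr:].reshape(nQ, nS)
    return act, tr, V

def objective(x):
    act, tr, V = unpack(x)
    return -b0 @ V[q0]                # maximize the start node's value at b0

def bellman_residual(x):              # V must satisfy the controller Bellman equations
    act, tr, V = unpack(x)
    res = np.empty((nQ, nS))
    for q in range(nQ):
        for s in range(nS):
            total = 0.0
            for a in range(nA):
                cont = 0.0
                for s2 in range(nS):
                    for o in range(nO):
                        cont += T[s, a, s2] * Z[s2, a, o] * (tr[q, a, o] @ V[:, s2])
                total += act[q, a] * (R[s, a] + gamma * cont)
            res[q, s] = V[q, s] - total
    return res.ravel()

def simplex_residual(x):              # probability distributions must sum to one
    act, tr, _ = unpack(x)
    return np.concatenate([act.sum(axis=1) - 1.0, tr.sum(axis=3).ravel() - 1.0])

x0 = np.concatenate([np.full(n_act, 1.0 / nA),
                     np.full(n_tr, 1.0 / nQ),
                     np.zeros(n_val)])
bounds = [(0.0, 1.0)] * (n_act + n_tr) + [(None, None)] * n_val
cons = [{'type': 'eq', 'fun': bellman_residual},
        {'type': 'eq', 'fun': simplex_residual}]
sol = minimize(objective, x0, method='SLSQP', bounds=bounds, constraints=cons)
act, tr, V = unpack(sol.x)
print('expected value at b0:', b0 @ V[q0])

As in the formulation the abstract describes, the objective is the value of the initial controller node at the initial belief, and the constraints are the Bellman equations plus probability (simplex) conditions on the controller parameters; a DEC-POMDP version would carry one such controller per agent. Since the NLP is nonconvex, a general-purpose solver like the one sketched here only guarantees a local optimum, which matches the abstract's caveat about solving the NLP approximately.
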
Pages: 293-320
Page count: 28
Related papers (50 records in total)
  • [31] Safe Policy Improvement for POMDPs via Finite-State Controllers
    Simao, Thiago D.
    Suilen, Marnix
    Jansen, Nils
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023 : 15109 - 15117
  • [32] Decentralized Learning of Finite-Memory Policies in Dec-POMDPs
    Mao, Weichao
    Zhang, Kaiqing
    Yang, Zhuoran
    Başar, Tamer
    IFAC PAPERSONLINE, 2023, 56 (02) : 2601 - 2607
  • [33] Compositional Construction of Safety Controllers for Networks of Continuous-Space POMDPs
    Jahanshahi, Niloofar
    Lavaei, Abolfazl
    Zamani, Majid
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (01) : 87 - 99
  • [34] Reinforcement learning for POMDPs based on action values and stochastic optimization
    Perkins, TJ
    EIGHTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-02)/FOURTEENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-02), PROCEEDINGS, 2002 : 199 - 204
  • [35] A Role-based POMDPs Approach for Decentralized Implicit Cooperation of Multiple Agents
    Zhang, Hao
    Chen, Jie
    Fang, Hao
    Dou, Lihua
    2017 13TH IEEE INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2017 : 496 - 501
  • [36] Convex Stochastic Dominance in Bayesian Localization, Filtering, and Controlled Sensing POMDPs
    Krishnamurthy, Vikram
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2020, 66 (05) : 3187 - 3201
  • [37] Bayesian-Game-Based Fuzzy Reinforcement Learning Control for Decentralized POMDPs
    Sharma, Rajneesh
    Spaan, Matthijs T. J.
    IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, 2012, 4 (04) : 309 - 328
  • [38] Decentralized Coordination of Multi-Agent Systems Based on POMDPs and Consensus for Active Perception
    Peti, Marijana
    Petric, Frano
    Bogdan, Stjepan
    IEEE ACCESS, 2023, 11 : 52480 - 52491
  • [40] FIXED-SIZE RECTANGULAR CONFIDENCE REGIONS
    JONES, ER
    COMMUNICATIONS IN STATISTICS PART A-THEORY AND METHODS, 1977, 6 (03) : 251 - 264