Policy Evaluation in Decentralized POMDPs With Belief Sharing

Cited by: 0
Authors
Kayaalp, Mert [1]
Ghadieh, Fatima [2]
Sayed, Ali H. [1]
Affiliations
[1] École Polytechnique Fédérale de Lausanne (EPFL), Adaptive Systems Laboratory, CH-1015 Lausanne, Switzerland
[2] American University of Beirut, Beirut 1107 2020, Lebanon
Source
IEEE Open Journal of Control Systems
Keywords
Task analysis; Data models; State estimation; Robot sensing systems; Reinforcement learning; Hidden Markov models; Bayes methods; Belief state; distributed state estimation; multi-agent reinforcement learning; partially observable Markov decision process; value function learning; LEARNING-BEHAVIOR; CONSENSUS; AVERAGE;
DOI
10.1109/OJCSYS.2023.3277760
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Most works on multi-agent reinforcement learning focus on scenarios where the state of the environment is fully observable. In this work, we consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly. Instead, agents only have access to noisy observations and to belief vectors. It is well known that finding global posterior distributions in multi-agent settings is generally NP-hard. As a remedy, we propose a fully decentralized belief-forming strategy that relies on individual updates and on localized interactions over a communication network. In addition to exchanging beliefs, agents also exploit the communication network to exchange value function parameter estimates. We analytically show that the proposed strategy allows information to diffuse over the network, which in turn allows the agents' parameters to remain within a bounded difference of a centralized baseline. A multi-sensor target tracking application is considered in the simulations.
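To make the mechanism described in the abstract concrete, the following Python sketch illustrates, under placeholder assumptions, how each agent could alternate between a local step (a Bayesian-style belief update from its own observation likelihood plus a TD(0)-style update of a linear value-function parameter) and a combine step that averages beliefs and parameters with its neighbors over a fixed combination matrix. The network, combination weights, likelihoods, reward signal, and step sizes are all illustrative choices, not the algorithm from the paper.

```python
# Hypothetical illustration (not the paper's algorithm): diffusion-style
# cooperation in which agents share belief vectors and value-function parameters.
import numpy as np

rng = np.random.default_rng(0)

N, S = 4, 3                              # number of agents, number of hidden states
A = np.array([                           # doubly stochastic combination matrix (ring graph)
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])

beliefs = np.full((N, S), 1.0 / S)       # each agent starts from a uniform belief
theta = np.zeros((N, S))                 # linear value-function parameters (features = beliefs)
likelihoods = rng.uniform(0.2, 1.0, (N, S))   # toy per-agent observation likelihoods

gamma, mu = 0.9, 0.05                    # discount factor, step size

for t in range(50):
    for k in range(N):
        # Local step 1: Bayesian-style belief update from the agent's own likelihood.
        feat_prev = beliefs[k].copy()
        posterior = beliefs[k] * likelihoods[k]
        beliefs[k] = posterior / posterior.sum()
        feat_next = beliefs[k]
        # Local step 2: TD(0)-style update of the value-function parameter.
        reward = float(beliefs[k].argmax() == 0)      # toy reward signal
        td_error = reward + gamma * feat_next @ theta[k] - feat_prev @ theta[k]
        theta[k] += mu * td_error * feat_prev
    # Combine step: exchange and average beliefs and parameters with neighbors.
    beliefs = A @ beliefs                # convex combinations remain probability vectors
    theta = A @ theta

print("max belief disagreement:   ", np.abs(beliefs - beliefs.mean(axis=0)).max())
print("max parameter disagreement:", np.abs(theta - theta.mean(axis=0)).max())
```

The printed disagreement values give a rough sense of how the combine step keeps each agent's belief and parameter close to the network average, in the spirit of the bounded-difference result mentioned in the abstract.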
Pages: 125 - 145
Number of pages: 21