Policy Evaluation in Decentralized POMDPs With Belief Sharing

Cited by: 0
Authors: Kayaalp, Mert [1]; Ghadieh, Fatima [2]; Sayed, Ali H. [1]
Affiliations:
[1] Ecole Polytechnique Federale de Lausanne (EPFL), Adaptive Systems Laboratory, CH-1015 Lausanne, Switzerland
[2] American University of Beirut, Beirut 1107 2020, Lebanon
Source: IEEE Open Journal of Control Systems, 2023
Keywords: Task analysis; Data models; State estimation; Robot sensing systems; Reinforcement learning; Hidden Markov models; Bayes methods; Belief state; Distributed state estimation; Multi-agent reinforcement learning; Partially observable Markov decision process; Value function learning; Learning behavior; Consensus; Average
DOI: 10.1109/OJCSYS.2023.3277760
Chinese Library Classification: TP [automation technology; computer technology]
Subject classification code: 0812
Abstract: Most works on multi-agent reinforcement learning focus on scenarios where the state of the environment is fully observable. In this work, we consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly. Instead, agents have access only to noisy observations and to belief vectors. It is well known that finding global posterior distributions in multi-agent settings is generally NP-hard. As a remedy, we propose a fully decentralized belief-forming strategy that relies on individual updates and on localized interactions over a communication network. In addition to exchanging beliefs, agents also exchange value function parameter estimates over the same network. We show analytically that the proposed strategy allows information to diffuse over the network, which in turn keeps each agent's parameters within a bounded distance of a centralized baseline. The simulations consider a multi-sensor target tracking application.
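The following is a minimal, illustrative Python sketch of the kind of strategy the abstract describes: each agent performs a local Bayesian belief update from its own noisy observation, pools beliefs with its neighbors (here via log-linear averaging under a doubly stochastic combination matrix), runs a local TD(0) step on a linear value function of the belief, and then combines its neighbors' parameter estimates in diffusion fashion. Everything below (the ring topology, the toy reward, the pooling rule, and all identifiers) is an assumption made for illustration, not the authors' exact algorithm; see the DOI above for the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

S = 5                     # number of hidden environment states (assumption)
N = 4                     # number of agents (assumption)
STEP, GAMMA = 0.05, 0.9   # TD step size and discount factor (assumptions)

# Doubly stochastic combination weights over a ring network (assumption;
# any connected topology with such weights would serve the same role).
A = np.zeros((N, N))
for k in range(N):
    A[k, k] = 0.5
    A[k, (k - 1) % N] = A[k, (k + 1) % N] = 0.25

# Random transition kernel and per-agent observation likelihoods (assumption).
T = rng.random((S, S)); T /= T.sum(axis=1, keepdims=True)
L = rng.random((N, S, S)); L /= L.sum(axis=2, keepdims=True)

beliefs = np.full((N, S), 1.0 / S)   # each agent's belief over the hidden state
w = np.zeros((N, S))                 # linear value parameters (belief features)

def local_posterior(mu, obs, lik):
    """Propagate the belief through T, then apply this agent's likelihood."""
    post = (mu @ T) * lik[:, obs]
    return post / post.sum()

state = rng.integers(S)
for t in range(500):
    state = rng.choice(S, p=T[state])
    reward = 1.0 if state == 0 else 0.0        # toy reward (assumption)

    # 1) Local Bayesian update from each agent's own noisy observation.
    psi = np.empty_like(beliefs)
    for k in range(N):
        obs = rng.choice(S, p=L[k, state])
        psi[k] = local_posterior(beliefs[k], obs, L[k])

    # 2) Belief sharing: log-linear pooling with the neighborhood weights.
    pooled = np.exp(A @ np.log(psi + 1e-12))
    pooled /= pooled.sum(axis=1, keepdims=True)

    # 3) Local TD(0) step, then diffusion of the parameter estimates.
    phi = np.empty_like(w)
    for k in range(N):
        delta = reward + GAMMA * w[k] @ pooled[k] - w[k] @ beliefs[k]
        phi[k] = w[k] + STEP * delta * beliefs[k]   # feature = belief vector
    w = A @ phi

    beliefs = pooled
```

Note that this sketch covers only the prediction side (policy evaluation); in the full setting the beliefs would also drive the agents' actions under a fixed policy.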
Pages: 125-145 (21 pages)