Optimistic sequential multi-agent reinforcement learning with motivational communication

被引:0
|
作者
Huang, Anqi [1 ]
Wang, Yongli [1 ]
Zhou, Xiaoliang [1 ]
Zou, Haochen [1 ]
Dong, Xu [1 ]
Che, Xun [1 ]
机构
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
基金
中国国家自然科学基金;
关键词
Multi-agent reinforcement learning; Policy gradient; Motivational communication; Reinforcement learning; Multi-agent system;
D O I
10.1016/j.neunet.2024.106547
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in the field of fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often encounter two major problems: independent strategies tend to underestimate the potential value of actions, leading to the convergence on sub-optimal Nash Equilibria (NE); some communication paradigms introduce added complexity to the learning process, complicating the focus on the essential elements of the messages. To address these challenges, we propose a novel method called O ptimistic S equential S oft Actor Critic with M otivational C ommunication (OSSMC). The key idea of OSSMC is to utilize a greedy-driven approach to explore the potential value of individual policies, named optimistic Q-values, which serve as an upper bound for the Q-value of the current policy. We then integrate a sequential update mechanism with optimistic Q-value for agents, aiming to ensure monotonic improvement in the joint policy optimization process. Moreover, we establish motivational communication modules for each agent to disseminate motivational messages to promote cooperative behaviors. Finally, we employ a value regularization strategy from the Soft Actor Critic (SAC) method to maximize entropy and improve exploration capabilities. The performance of OSSMC was rigorously evaluated against a series of challenging benchmark sets. Empirical results demonstrate that OSSMC not only surpasses current baseline algorithms but also exhibits a more rapid convergence rate.
引用
收藏
页数:12
相关论文
共 50 条
  • [41] Learning to Share in Multi-Agent Reinforcement Learning
    Yi, Yuxuan
    Li, Ge
    Wang, Yaowei
    Lu, Zongqing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [42] Multi-Agent Reinforcement Learning for Microgrids
    Dimeas, A. L.
    Hatziargyriou, N. D.
    IEEE POWER AND ENERGY SOCIETY GENERAL MEETING 2010, 2010,
  • [43] Multi-agent Exploration with Reinforcement Learning
    Sygkounas, Alkis
    Tsipianitis, Dimitris
    Nikolakopoulos, George
    Bechlioulis, Charalampos P.
    2022 30TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2022, : 630 - 635
  • [44] Hierarchical multi-agent reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    Makar, Rajbala
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2006, 13 (02) : 197 - 229
  • [45] Partitioning in multi-agent reinforcement learning
    Sun, R
    Peterson, T
    FROM ANIMALS TO ANIMATS 6, 2000, : 325 - 332
  • [46] The Dynamics of Multi-Agent Reinforcement Learning
    Dickens, Luke
    Broda, Krysia
    Russo, Alessandra
    ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2010, 215 : 367 - 372
  • [47] Multi-agent reinforcement learning: A survey
    Busoniu, Lucian
    Babuska, Robert
    De Schutter, Bart
    2006 9TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION, VOLS 1- 5, 2006, : 1133 - +
  • [48] A multi-agent reinforcement learning framework for cross-domain sequential recommendation
    Liu, Huiting
    Wei, Junyi
    Zhu, Kaiwen
    Li, Peipei
    Zhao, Peng
    Wu, Xindong
    NEURAL NETWORKS, 2025, 185
  • [49] When Does Communication Learning Need Hierarchical Multi-Agent Deep Reinforcement Learning
    Ossenkopf, Marie
    Jorgensen, Mackenzie
    Geihs, Kurt
    CYBERNETICS AND SYSTEMS, 2019, 50 (08) : 672 - 692
  • [50] Learning to Schedule Joint Radar-Communication With Deep Multi-Agent Reinforcement Learning
    Lee, Joash
    Niyato, Dusit
    Guan, Yong Liang
    Kim, Dong In
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2022, 71 (01) : 406 - 422