V-Learning-A Simple, Efficient, Decentralized Algorithm for Multiagent Reinforcement Learning

Cited by: 0
Authors
Jin, Chi [1]
Liu, Qinghua [1]
Wang, Yuanhao [2]
Yu, Tiancheng [3]
Affiliations
[1] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
[2] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[3] MIT, Dept Elect & Comp Engn, Cambridge, MA 02139 USA
Keywords
V-learning; Markov games; multiagent reinforcement learning; decentralized reinforcement learning; Nash equilibria; (coarse) correlated equilibria; GAMES; GO
DOI
10.1287/moor.2021.0317
Chinese Library Classification number
C93 [Management Science]; O22 [Operations Research]
Discipline classification code
070105; 12; 1201; 1202; 120202
Abstract
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms, even in a basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria, and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that scales only with $\max_{i \in [m]} A_i$, where $A_i$ is the number of actions for the $i$th player. This is in sharp contrast to the size of the joint action space, which is $\prod_{i=1}^{m} A_i$. V-learning (in its basic form) is a new class of single-agent reinforcement learning (RL) algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Like the classical Q-learning algorithm, it performs incremental updates to the value functions. Unlike Q-learning, it maintains only estimates of V-values rather than Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
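To make the gap between the two quantities in the abstract concrete, the following minimal Python sketch compares the joint action space size (the product of the per-player action counts) with the largest per-player action count. The function names and the example of 10 players with 5 actions each are illustrative assumptions, not taken from the paper, and the sketch does not implement V-learning itself.

```python
from math import prod


def joint_action_count(action_counts):
    """Size of the joint action space: the product of A_i over all players."""
    return prod(action_counts)


def max_action_count(action_counts):
    """Largest per-player action count: the max of A_i over all players."""
    return max(action_counts)


# Hypothetical example (an assumption for illustration): m = 10 players,
# each with A_i = 5 actions.
action_counts = [5] * 10

joint = joint_action_count(action_counts)
largest = max_action_count(action_counts)

print(f"joint action space size: {joint}")
print(f"max per-player actions:  {largest}")
```

In this hypothetical setting the joint action space grows as 5 to the power of the number of players, while the maximum per-player action count stays at 5 no matter how many players are added.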
Pages: 2295 - 2322
Number of pages: 28
Related papers
50 in total
  • [1] Decentralized Multiagent Reinforcement Learning for Efficient Robotic Control by Coordination Graphs
    Yu, Chao
    Wang, Dongxu
    Ren, Jiankang
    Ge, Hongwei
    Sun, Liang
    PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2018, 11012 : 191 - 203
  • [2] Decentralized Reinforcement Learning Inspired by Multiagent Systems
    Adjodah, Dhaval
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS '18), 2018 : 1729 - 1730
  • [3] An improved multiagent reinforcement learning algorithm
    Meng, XP
    Babuska, R
    Busoniu, L
    Chen, Y
    Tan, WY
    2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Proceedings, 2005 : 337 - 343
  • [4] An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning
    Yang, Tianpei
    Wang, Weixun
    Tang, Hongyao
    Hao, Jianye
    Meng, Zhaopeng
    Mao, Hangyu
    Li, Dong
    Liu, Wulong
    Zhang, Chengwei
    Hu, Yujing
    Chen, Yingfeng
    Fan, Changjie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] Heuristics for Multiagent Reinforcement Learning in Decentralized Decision Problems
    Allen, Martin W.
    Hahn, David
    MacFarland, Douglas C.
    2014 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2014 : 251 - 258
  • [6] Decentralized multiagent reinforcement learning algorithm using a cluster-synchronized laser network
    Kotoku, Shun
    Mihana, Takatomo
    Rohm, Andre
    Horisaki, Ryoichi
    PHYSICAL REVIEW E, 2024, 110 (06)
  • [7] Adaptive Learning: A New Decentralized Reinforcement Learning Approach for Cooperative Multiagent Systems
    Li, Meng-Lin
    Chen, Shaofei
    Chen, Jing
    IEEE ACCESS, 2020, 8 : 99404 - 99421
  • [8] A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning
    Kim, Dong-Ki
    Liu, Miao
    Riemer, Matthew
    Sun, Chuangchuang
    Abdulhai, Marwa
    Habibi, Golnaz
    Lopez-Cot, Sebastian
    Tesauro, Gerald
    How, Jonathan P.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [9] Automatic Decomposition of Reward Machines for Decentralized Multiagent Reinforcement Learning
    Smith, Sophia
    Neary, Cyrus
    Topcu, Ufuk
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023 : 5423 - 5430
  • [10] Attentive Relational State Representation in Decentralized Multiagent Reinforcement Learning
    Liu, Xiangyu
    Tan, Ying
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (01) : 252 - 264