V-Learning: A Simple, Efficient, Decentralized Algorithm for Multiagent Reinforcement Learning

Cited by: 0

Authors:

Jin, Chi [1]

Liu, Qinghua [1]

Wang, Yuanhao [2]

Yu, Tiancheng [3]

Affiliations:

[1] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA

[2] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA

[3] MIT, Dept Elect & Comp Engn, Cambridge, MA 02139 USA

Source:

MATHEMATICS OF OPERATIONS RESEARCH | 2024 / Vol. 49 / Issue 04

Keywords:

V-learning; Markov games; multiagent reinforcement learning; decentralized reinforcement learning; Nash equilibria; (coarse) correlated equilibria; GAMES; GO;

DOI:

10.1287/moor.2021.0317

Chinese Library Classification (CLC) number:

C93 [Management Science]; O22 [Operations Research];

Discipline classification code:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms, even in a basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria, and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that scales only with $\max_{i \in [m]} A_i$, where $A_i$ is the number of actions for the $i$th player. This is in sharp contrast to the size of the joint action space, which is $\prod_{i=1}^{m} A_i$. V-learning (in its basic form) is a new class of single-agent reinforcement learning (RL) algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Similar to the classical Q-learning algorithm, it performs incremental updates to the value functions. Unlike Q-learning, it maintains only estimates of V-values instead of Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
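To make the contrast in the abstract concrete, the following minimal Python sketch compares the two quantities it names: the joint action space size (the product of the per-player action counts) and the largest individual action count (the quantity the abstract says the sample complexity scales with). The function names and the example game with 10 players and 5 actions each are illustrative assumptions, not taken from the paper, and the sketch does not implement V-learning itself.

```python
import math
from typing import Sequence


def joint_action_space_size(action_counts: Sequence[int]) -> int:
    """Size of the joint action space: the product of A_i over all players."""
    return math.prod(action_counts)


def max_individual_actions(action_counts: Sequence[int]) -> int:
    """Largest per-player action count: the max of A_i over all players."""
    return max(action_counts)


if __name__ == "__main__":
    # Hypothetical game: m = 10 players, each with 5 actions.
    counts = [5] * 10
    print("joint action space size:", joint_action_space_size(counts))
    print("max individual actions: ", max_individual_actions(counts))
```

In this hypothetical game the joint action space has 5^10 = 9,765,625 elements, while the largest individual action count stays at 5 no matter how many players are added.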


Pages: 2295-2322

Page count: 28

