V-Learning: A Simple, Efficient, Decentralized Algorithm for Multiagent Reinforcement Learning

Cited by: 0

Authors:

Jin, Chi [1]

Liu, Qinghua [1]

Wang, Yuanhao [2]

Yu, Tiancheng [3]

Affiliations:

[1] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA

[2] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA

[3] MIT, Dept Elect & Comp Engn, Cambridge, MA 02139 USA

Source:

MATHEMATICS OF OPERATIONS RESEARCH | 2024 / Vol. 49 / No. 4

Keywords:

V-learning; Markov games; multiagent reinforcement learning; decentralized reinforcement learning; Nash equilibria; (coarse) correlated equilibria; GAMES; GO;

DOI:

10.1287/moor.2021.0317

Chinese Library Classification (CLC) codes:

C93 [Management Science]; O22 [Operations Research];

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms, even in a basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria, and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that scales only with $\max_{i \in [m]} A_i$, where $A_i$ is the number of actions for the $i$th player and $m$ is the number of players. This is in sharp contrast to the size of the joint action space, which is $\prod_{i=1}^{m} A_i$. V-learning (in its basic form) is a new class of single-agent reinforcement learning (RL) algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Like the classical Q-learning algorithm, it performs incremental updates to the value functions. Unlike Q-learning, it maintains estimates of only V-values instead of Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
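To make the gap between the two quantities in the abstract concrete, the following short Python sketch compares the joint action space size $\prod_{i=1}^{m} A_i$ with the largest individual action count $\max_{i \in [m]} A_i$. The function names and the example of 10 players with 5 actions each are illustrative assumptions, not taken from the paper, and the sketch computes only these two counts, not the paper's sample complexity bound.

```python
from math import prod


def joint_action_space_size(action_counts):
    """Product of the per-player action counts A_1 * ... * A_m."""
    return prod(action_counts)


def largest_individual_action_count(action_counts):
    """Maximum per-player action count, max over i of A_i."""
    return max(action_counts)


# Hypothetical game (an assumption for illustration): m = 10 players, A_i = 5 each.
action_counts = [5] * 10

joint = joint_action_space_size(action_counts)
largest = largest_individual_action_count(action_counts)

print(f"joint action space size:  {joint}")
print(f"largest individual count: {largest}")
```

In this hypothetical game the joint action space has 5^10 = 9,765,625 elements, while the largest individual action count stays at 5 no matter how many players are added.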

Pages: 2295-2322

Page count: 28
