V-Learning: A Simple, Efficient, Decentralized Algorithm for Multiagent Reinforcement Learning

Authors
Jin, Chi [1]
Liu, Qinghua [1]
Wang, Yuanhao [2]
Yu, Tiancheng [3]
Affiliations
[1] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
[2] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[3] MIT, Dept Elect & Comp Engn, Cambridge, MA 02139 USA
Keywords
V-learning; Markov games; multiagent reinforcement learning; decentralized reinforcement learning; Nash equilibria; (coarse) correlated equilibria; GAMES; GO
DOI
10.1287/moor.2021.0317
Chinese Library Classification (CLC) number
C93 [Management Science]; O22 [Operations Research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms, even in a basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria, and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that scales only with $\max_{i \in [m]} A_i$, where $A_i$ is the number of actions for the $i$th player. This is in sharp contrast to the size of the joint action space, which is $\prod_{i=1}^{m} A_i$. V-learning (in its basic form) is a new class of single-agent reinforcement learning (RL) algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Similar to the classical Q-learning algorithm, it performs incremental updates to the value functions. Unlike Q-learning, it maintains estimates of only V-values instead of Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
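The abstract describes the basic form of V-learning only in words: an incremental update to V-values, with action selection delegated to an adversarial bandit algorithm. The Python sketch below illustrates that idea for a single agent in a tabular, episodic setting. It is not the authors' implementation. The class names `Exp3` and `VLearner`, the learning-rate schedule, the bonus term, the loss scaling, and the toy environment in `run_demo` are all assumptions made for illustration. A plain Exp3-style learner stands in for "any adversarial bandit algorithm with suitable regret guarantees", and no claim is made that it meets the paper's requirements.

```python
import math
import random


class Exp3:
    """Exp3-style adversarial bandit over n actions (illustrative stand-in)."""

    def __init__(self, n_actions, lr=0.1, explore=0.1):
        self.n = n_actions
        self.lr = lr
        self.explore = explore
        self.cum_loss = [0.0] * n_actions  # importance-weighted loss estimates

    def policy(self):
        base = min(self.cum_loss)  # shift exponents for numerical stability
        w = [math.exp(-self.lr * (x - base)) for x in self.cum_loss]
        z = sum(w)
        # Mix with the uniform distribution: every probability is >= explore / n.
        return [(1 - self.explore) * x / z + self.explore / self.n for x in w]

    def sample(self, rng):
        return rng.choices(range(self.n), weights=self.policy())[0]

    def update(self, action, loss):
        # Only the played action's loss is observed; reweight by its probability.
        self.cum_loss[action] += loss / self.policy()[action]


class VLearner:
    """Keeps V-values (no Q-table) plus one bandit per (step, state) pair."""

    def __init__(self, n_states, n_actions, horizon, bonus_scale=0.1, seed=0):
        self.H = horizon
        self.c = bonus_scale  # assumed bonus constant, not taken from the paper
        self.rng = random.Random(seed)
        # Optimistic start: V[h][s] = H - h, so the terminal row V[H] is all zeros.
        self.V = [[float(horizon - h)] * n_states for h in range(horizon + 1)]
        self.V_tilde = [row[:] for row in self.V]
        self.N = [[0] * n_states for _ in range(horizon)]
        self.bandits = [[Exp3(n_actions) for _ in range(n_states)]
                        for _ in range(horizon)]

    def act(self, h, s):
        return self.bandits[h][s].sample(self.rng)

    def update(self, h, s, a, r, s_next):
        self.N[h][s] += 1
        t = self.N[h][s]
        alpha = (self.H + 1) / (self.H + t)     # assumed step-size schedule
        bonus = self.c * math.sqrt(self.H / t)  # assumed exploration bonus
        target = r + self.V[h + 1][s_next] + bonus
        self.V_tilde[h][s] = (1 - alpha) * self.V_tilde[h][s] + alpha * target
        # With rewards in [0, 1], no value at step h can exceed H - h.
        self.V[h][s] = min(float(self.H - h), self.V_tilde[h][s])
        # Feed the bandit a loss in [0, 1]: low reward-to-go means high loss.
        loss = (self.H - r - self.V[h + 1][s_next]) / self.H
        self.bandits[h][s].update(a, loss)


def run_demo(episodes=500, n_states=2, n_actions=2, horizon=3, seed=0):
    """Toy environment (hypothetical): action 1 pays 1, next state is uniform."""
    rng = random.Random(seed)
    agent = VLearner(n_states, n_actions, horizon, seed=seed)
    for _ in range(episodes):
        s = 0
        for h in range(horizon):
            a = agent.act(h, s)
            r = 1.0 if a == 1 else 0.0
            s_next = rng.randrange(n_states)
            agent.update(h, s, a, r, s_next)
            s = s_next
    return agent


if __name__ == "__main__":
    agent = run_demo()
    print("V-values at step 0:", agent.V[0])
    print("policy at (step 0, state 0):", agent.bandits[0][0].policy())
```

In a Markov game with $m$ players, each player would hold its own `VLearner` over its own $A_i$ actions and update it from its own reward. The tables in this sketch are indexed by step and state only, so per-player storage grows with $A_i$ and not with the joint action count $\prod_{i=1}^{m} A_i$.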
Pages: 2295-2322
Number of pages: 28