The evolutionary dynamics of soft-max policy gradient in multi-agent settings

Citations: 0
Authors
Bernasconi, Martino [1 ]
Cacciamani, Federico [1 ]
Fioravanti, Simone [2 ]
Gatti, Nicola [1 ]
Trovo, Francesco [1 ]
Affiliations
[1] Politecnico di Milano, Milan, Italy
[2] Gran Sasso Science Institute, L'Aquila, Italy
Keywords
Game theory; Evolutionary game theory; Reinforcement learning; Multiagent learning
DOI
10.1016/j.tcs.2024.115011
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Policy gradient is one of the best-known algorithms in reinforcement learning. This paper studies the mean dynamics of the soft-max policy gradient algorithm and their properties in multi-agent settings, using tools from evolutionary game theory and dynamical systems. Unlike most multi-agent reinforcement learning algorithms, whose mean dynamics are slight variants of the replicator dynamics that do not affect the properties of the original dynamics, the soft-max policy gradient dynamics have a structure significantly different from that of the replicator. In particular, we show that the soft-max policy gradient dynamics in a given game are equivalent to the replicator dynamics in an auxiliary game obtained by a non-convex transformation of the payoffs of the original game. This structure endows the dynamics with several non-standard properties. The first property we study concerns convergence to the best response: while the continuous-time mean dynamics always converge to the best response, the crucial question is the convergence speed. Precisely, we show that the space of initializations can be split into two complementary sets such that trajectories initialized from points of the first set (called the good initialization region) move directly to the best response, whereas those initialized from points of the second set (called the bad initialization region) first move through a series of sub-optimal strategies and only then reach the best response. Interestingly, in multi-agent adversarial machine-learning environments, we show that an adversary can exploit this property to push any current strategy of a learning agent using the soft-max policy gradient into a bad initialization region, thus slowing its learning process while exploiting that policy. When the soft-max policy gradient dynamics are studied in multi-population games, which model learning in self-play, we show that the dynamics preserve the volume of any set of initial points. This property proves that the dynamics cannot converge when the only equilibrium of the game is fully mixed, since convergence would require the volume of the set of initial points to shrink. We also give empirical evidence that the volume expands over time, suggesting that the dynamics in games with a fully mixed equilibrium are chaotic.
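To make the contrast with the replicator concrete, here is a minimal derivation sketch that is not taken from the paper: it treats a single agent with a softmax policy over n actions and a fixed expected-payoff vector; the symbols θ, x, r, and r̄ are our own notation (in the multi-agent setting, r would be the expected payoff against the opponents' current mixed strategies).

% Minimal sketch in our own notation (not the paper's). Gradient flow of
% the expected payoff V(θ) = x^T r under the softmax parameterization
% x_i = e^{θ_i} / Σ_j e^{θ_j}, with mean payoff \bar{r} = x^T r.
\[
  \frac{\partial x_i}{\partial \theta_j} = x_i(\delta_{ij} - x_j)
  \quad\Longrightarrow\quad
  \dot{\theta}_j = \frac{\partial}{\partial \theta_j}\, x^\top r = x_j\,(r_j - \bar{r}).
\]
% Pushing this gradient flow through the softmax map gives the mean
% dynamics on the strategy simplex:
\[
  \dot{x}_i = \sum_j x_i(\delta_{ij} - x_j)\,\dot{\theta}_j
            = x_i\Big[\, x_i\,(r_i - \bar{r}) - \sum_j x_j^{2}\,(r_j - \bar{r}) \Big],
\]
% whereas the replicator dynamics read \dot{x}_i = x_i (r_i - \bar{r}).

The extra factors of x_i and x_j damp the flow near the boundary of the simplex, so a trajectory initialized close to a sub-optimal pure strategy moves slowly at first; this is consistent with, though not a substitute for, the good/bad initialization-region analysis described in the abstract.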
Pages: 23