The evolutionary dynamics of soft-max policy gradient in multi-agent settings

Cited: 0
Authors
Bernasconi, Martino [1 ]
Cacciamani, Federico [1 ]
Fioravanti, Simone [2 ]
Gatti, Nicola [1 ]
Trovo, Francesco [1 ]
Affiliations
[1] Politecn Milan, Milan, Italy
[2] Gran Sasso Sci Inst, L'Aquila, Italy
Keywords
Game theory; Evolutionary game theory; Reinforcement learning; Multiagent learning
DOI
10.1016/j.tcs.2024.115011
Chinese Library Classification
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Policy gradient is one of the best-known algorithms in reinforcement learning. This paper studies the mean dynamics of the soft-max policy gradient algorithm and its properties in multi-agent settings, using tools from evolutionary game theory and dynamical systems. Unlike most multi-agent reinforcement learning algorithms, whose mean dynamics are slight variants of the replicator dynamics that do not affect the properties of the original dynamics, the soft-max policy gradient dynamics have a structure significantly different from the replicator's. In particular, we show that the soft-max policy gradient dynamics in a given game are equivalent to the replicator dynamics in an auxiliary game obtained through a non-convex transformation of the payoffs of the original game. This structure gives the dynamics several non-standard properties. The first property we study concerns convergence to the best response: while the continuous-time mean dynamics always converge to the best response, the crucial question is the convergence speed. Precisely, we show that the space of initializations can be split into two complementary sets such that trajectories initialized from points of the first set (the good initialization region) move directly to the best response, whereas those initialized from points of the second set (the bad initialization region) first move toward a series of sub-optimal strategies and only then to the best response. Interestingly, in multi-agent adversarial machine-learning environments, we show that an adversary can exploit this property to push any current strategy of a learning agent using the soft-max policy gradient into a bad initialization region, thus slowing its learning process and exploiting that policy. When the soft-max policy gradient dynamics are studied in multi-population games, which model the learning dynamics in self-play, we show that the dynamics preserve the volume of the set of initial points. This property proves that the dynamics cannot converge when the only equilibrium of the game is fully mixed, as convergence would require the volume of the set of initial points to shrink. We also give empirical evidence that the volume expands over time, suggesting that the dynamics in games with a fully mixed equilibrium are chaotic.
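For intuition about the dynamics described in the abstract, the following is a minimal Python sketch (not code from the paper) of the single-population, continuous-time mean dynamics under a fixed payoff vector u. Under the soft-max parameterization, the induced simplex flow is x_a' = x_a(x_a(u_a - u_bar) - sum_b x_b^2 (u_b - u_bar)) with u_bar = x . u, which coincides with the replicator flow applied to the weighted payoffs u_tilde_a = x_a(u_a - u_bar); this concrete transformation is our reading of the stated equivalence in this simple case, not a formula taken from the paper. The function names, the payoff vector, and the initialization (mass concentrated on a sub-optimal action, a "bad" start in the abstract's sense) are all illustrative assumptions.

```python
import numpy as np

def softmax_pg_flow(x, u):
    """Mean soft-max policy-gradient dynamics on the simplex:
    x_a' = x_a * (x_a*(u_a - u_bar) - sum_b x_b**2 * (u_b - u_bar))."""
    adv = u - x @ u                      # advantage of each action
    return x * (x * adv - x @ (x * adv))

def replicator_flow(x, u):
    """Replicator dynamics: x_a' = x_a * (u_a - u_bar)."""
    return x * (u - x @ u)

u = np.array([1.0, 0.9, 0.0])            # illustrative payoffs; action 0 is the best response
x0 = np.array([0.01, 0.98, 0.01])        # "bad" start: almost all mass on sub-optimal action 1

# Equivalence check (our reading): the soft-max flow in the game u equals the
# replicator flow in the auxiliary game with payoffs u_tilde_a = x_a*(u_a - u_bar).
adv = u - x0 @ u
assert np.allclose(softmax_pg_flow(x0, u), replicator_flow(x0, x0 * adv))

dt, steps = 0.05, 60_000                 # forward-Euler integration of both flows
x_pg, x_rep = x0.copy(), x0.copy()
for _ in range(steps):
    x_pg += dt * softmax_pg_flow(x_pg, u)
    x_rep += dt * replicator_flow(x_rep, u)

print("soft-max PG:", x_pg)   # escapes the sub-optimal vertex far more slowly
print("replicator :", x_rep)  # converges to the best response quickly
```

From this initialization the soft-max flow first drifts further toward the sub-optimal action (its escape rate scales with the small probability of the best response, roughly quadratically near the vertex), while the replicator flow moves toward the best response at a rate set directly by the payoff gap, illustrating the good/bad initialization split described above.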
Pages: 23