Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

Cited by: 0
Authors
Kozuno, Tadashi [1 ]
Menard, Pierre [2 ]
Munos, Remi [3 ]
Valko, Michal [3 ]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Otto von Guericke Univ, Magdeburg, Germany
[3] DeepMind Paris, Paris, France
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular IIGs under the perfect-recall assumption, where the only feedback is realizations of the game (bandit feedback). In particular, the dynamics of the IIG are not known; we can only access it by sampling or by interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound on the convergence rate to the NE of order 1/√T, where T is the number of played games. Moreover, IXOMD is computationally efficient, as it needs to perform the updates only along the sampled trajectory.
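
For a concrete picture, the sketch below illustrates, under stated assumptions, the kind of update the abstract alludes to: an entropy-regularized online mirror descent (exponential-weights) step driven by implicit-exploration (IX) importance-weighted loss estimates, applied only at the information sets visited along one sampled self-play trajectory. The function name, the trajectory format, and the constants eta and gamma are illustrative assumptions, not the paper's exact IXOMD specification.

    import numpy as np

    def ix_omd_update(policy, trajectory, eta=0.1, gamma=0.01):
        """Sketch of an implicit-exploration OMD step (assumed form, not the
        paper's exact IXOMD update).

        policy:     dict mapping an information set to a probability vector
                    over its actions.
        trajectory: list of (infoset, action_index, loss) tuples from one
                    self-play episode (bandit feedback: only the realized
                    loss of the played action is observed).
        eta:        mirror-descent learning rate (assumed constant here).
        gamma:      implicit-exploration parameter that biases the
                    importance-weighted loss estimate to control variance.
        """
        for infoset, action, loss in trajectory:
            probs = policy[infoset]
            # IX loss estimate: importance weighting with a gamma offset
            # in the denominator.
            loss_hat = np.zeros_like(probs)
            loss_hat[action] = loss / (probs[action] + gamma)
            # Entropy-regularized OMD is a multiplicative-weights step;
            # only the visited information set is touched, so the
            # per-episode cost is proportional to the trajectory length.
            new_probs = probs * np.exp(-eta * loss_hat)
            policy[infoset] = new_probs / new_probs.sum()
        return policy

Because each episode touches only the information sets along the sampled trajectory, the per-episode computation stays linear in the trajectory length, which is the sense in which the abstract calls the algorithm computationally efficient.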
Pages: 12
Related Papers
50 records in total
  • [21] On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games
    Perolat, Julien
    Piot, Bilal
    Scherrer, Bruno
    Pietquin, Olivier
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 893 - 901
  • [22] GPI-Based design for partially unknown nonlinear two-player zero-sum games
    Yu, Lin
    Xiong, Junlin
    Xie, Min
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (03): : 2068 - 2088
  • [23] Policy gradient algorithm and its convergence analysis for two-player zero-sum Markov games
    Wang Z.
    Li Y.
    Feng Y.
    Feng Y.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (03): : 480 - 491
  • [24] Enhanced decision framework for two-player zero-sum Markov games with diverse opponent policies
    Zhu, Jin
    Wang, Xuan
    Dullerud, Geir E.
    APPLIED INTELLIGENCE, 2025, 55 (06)
  • [25] Optimality and Asymptotic Stability in Two-Player Zero-Sum Hybrid Games
    Leudo, Santiago J.
    Sanfelice, Ricardo G.
    HSCC 2022: PROCEEDINGS OF THE 25TH ACM INTERNATIONAL CONFERENCE ON HYBRID SYSTEMS: COMPUTATION AND CONTROL (PART OF CPS-IOT WEEK 2022), 2022,
  • [26] Pure strategy equilibria in symmetric two-player zero-sum games
    Peter Duersch
    Jörg Oechssler
    Burkhard C. Schipper
    International Journal of Game Theory, 2012, 41 : 553 - 564
  • [27] Generating Dominant Strategies for Continuous Two-Player Zero-Sum Games
    Vazquez-Chanlatte, Marcell J.
    Ghosh, Shromona
    Raman, Vasumathi
    Sangiovanni-Vincentelli, Alberto
    Seshia, Sanjit A.
    IFAC PAPERSONLINE, 2018, 51 (16): : 7 - 12
  • [28] Two-player zero-sum stochastic differential games with regime switching
    Lv, Siyu
    AUTOMATICA, 2020, 114
  • [29] Pure strategy equilibria in symmetric two-player zero-sum games
    Duersch, Peter
    Oechssler, Joerg
    Schipper, Burkhard C.
    INTERNATIONAL JOURNAL OF GAME THEORY, 2012, 41 (03) : 553 - 564
  • [30] TWO-PLAYER ZERO-SUM STOCHASTIC DIFFERENTIAL GAMES WITH RANDOM HORIZON
    Ferreira, M.
    Pinheiro, D.
    Pinheiro, S.
    ADVANCES IN APPLIED PROBABILITY, 2019, 51 (04) : 1209 - 1235