Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

Cited by: 0
Authors
Kozuno, Tadashi [1 ]
Menard, Pierre [2 ]
Munos, Remi [3 ]
Valko, Michal [3 ]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Otto von Guericke Univ, Magdeburg, Germany
[3] DeepMind Paris, Paris, France
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular IIGs under the perfect-recall assumption, where the only feedback is realizations of the game (bandit feedback). In particular, the dynamics of the IIG are not known; we can only access it by sampling or by interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound on the convergence rate to the NE of order 1/√T, where T is the number of played games. Moreover, IXOMD is computationally efficient, as it needs to perform the updates only along the sampled trajectory.
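
For a concrete picture, the sketch below illustrates, under stated assumptions, the kind of update the abstract alludes to: an entropy-regularized online mirror descent (exponential-weights) step driven by implicit-exploration (IX) importance-weighted loss estimates, applied only at the information sets visited along one sampled self-play trajectory. The function name, the trajectory format, and the constants eta and gamma are illustrative assumptions, not the paper's exact IXOMD specification.

    import numpy as np

    def ix_omd_update(policy, trajectory, eta=0.1, gamma=0.01):
        """Sketch of an implicit-exploration OMD step (assumed form, not the
        paper's exact IXOMD update).

        policy:     dict mapping an information set to a probability vector
                    over its actions.
        trajectory: list of (infoset, action_index, loss) tuples from one
                    self-play episode (bandit feedback: only the realized
                    loss of the played action is observed).
        eta:        mirror-descent learning rate (assumed constant here).
        gamma:      implicit-exploration parameter that biases the
                    importance-weighted loss estimate to control variance.
        """
        for infoset, action, loss in trajectory:
            probs = policy[infoset]
            # IX loss estimate: importance weighting with a gamma offset
            # in the denominator.
            loss_hat = np.zeros_like(probs)
            loss_hat[action] = loss / (probs[action] + gamma)
            # Entropy-regularized OMD is a multiplicative-weights step;
            # only the visited information set is touched, so the
            # per-episode cost is proportional to the trajectory length.
            new_probs = probs * np.exp(-eta * loss_hat)
            policy[infoset] = new_probs / new_probs.sum()
        return policy

Because each episode touches only the information sets along the sampled trajectory, the per-episode computation stays linear in the trajectory length, which is the sense in which the abstract calls the algorithm computationally efficient.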
Pages: 12
Related Papers
50 records in total
  • [21] On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games
    Perolat, Julien
    Piot, Bilal
    Scherrer, Bruno
    Pietquin, Olivier
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 893 - 901
  • [22] GPI-Based design for partially unknown nonlinear two-player zero-sum games
    Yu, Lin
    Xiong, Junlin
    Xie, Min
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (03): : 2068 - 2088
  • [23] Policy gradient algorithm and its convergence analysis for two-player zero-sum Markov games
    Wang Z.
    Li Y.
    Feng Y.
    Feng Y.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (03): : 480 - 491
  • [24] Enhanced decision framework for two-player zero-sum Markov games with diverse opponent policies
    Zhu, Jin
    Wang, Xuan
    Dullerud, Geir E.
    APPLIED INTELLIGENCE, 2025, 55 (06)
  • [25] Optimality and Asymptotic Stability in Two-Player Zero-Sum Hybrid Games
    Leudo, Santiago J.
    Sanfelice, Ricardo G.
    HSCC 2022: PROCEEDINGS OF THE 25TH ACM INTERNATIONAL CONFERENCE ON HYBRID SYSTEMS: COMPUTATION AND CONTROL (PART OF CPS-IOT WEEK 2022), 2022,
  • [26] Pure strategy equilibria in symmetric two-player zero-sum games
    Peter Duersch
    Jörg Oechssler
    Burkhard C. Schipper
    International Journal of Game Theory, 2012, 41 : 553 - 564
  • [27] Generating Dominant Strategies for Continuous Two-Player Zero-Sum Games
    Vazquez-Chanlatte, Marcell J.
    Ghosh, Shromona
    Raman, Vasumathi
    Sangiovanni-Vincentelli, Alberto
    Seshia, Sanjit A.
    IFAC PAPERSONLINE, 2018, 51 (16): : 7 - 12
  • [28] Two-player zero-sum stochastic differential games with regime switching
    Lv, Siyu
    AUTOMATICA, 2020, 114
  • [29] Pure strategy equilibria in symmetric two-player zero-sum games
    Duersch, Peter
    Oechssler, Joerg
    Schipper, Burkhard C.
    INTERNATIONAL JOURNAL OF GAME THEORY, 2012, 41 (03) : 553 - 564
  • [30] TWO-PLAYER ZERO-SUM STOCHASTIC DIFFERENTIAL GAMES WITH RANDOM HORIZON
    Ferreira, M.
    Pinheiro, D.
    Pinheiro, S.
    ADVANCES IN APPLIED PROBABILITY, 2019, 51 (04) : 1209 - 1235