Student of Games: A unified learning algorithm for both perfect and imperfect information games

Cited by: 2
Authors
Schmid, Martin [1, 2]
Moravcik, Matej [1, 2]
Burch, Neil [2, 3]
Kadlec, Rudolf [1, 2]
Davidson, Josh [2, 3]
Waugh, Kevin [2, 3]
Bard, Nolan [2, 3]
Timbers, Finbarr [4, 5]
Lanctot, Marc [2, 6]
Holland, G. Zacharias [2, 3]
Davoodi, Elnaz [2, 6]
Christianson, Alden [2, 7]
Bowling, Michael [2, 4, 7]
Affiliations
[1] EquiLibre Technol, Prague, Czech Republic
[2] Google DeepMind, London, England
[3] Sony AI, New York, NY USA
[4] Amii, Edmonton, AB, Canada
[5] Midjourney, South San Francisco, CA USA
[6] Google DeepMind, Montreal, PQ, Canada
[7] Univ Alberta, Edmonton, AB, Canada
Keywords
CARLO TREE-SEARCH; GO; LEVEL;
DOI
10.1126/sciadv.adg3256
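Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract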
Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Student of Games achieves strong empirical performance in large perfect and imperfect information games, an important step toward truly general algorithms for arbitrary environments. We prove that Student of Games is sound, converging to perfect play as available computation and approximation capacity increase. Student of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker, and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
Pages: 14