Student of Games: A unified learning algorithm for both perfect and imperfect information games

Cited by: 2
Authors
Schmid, Martin [1, 2]
Moravcik, Matej [1, 2]
Burch, Neil [2, 3]
Kadlec, Rudolf [1, 2]
Davidson, Josh [2, 3]
Waugh, Kevin [2, 3]
Bard, Nolan [2, 3]
Timbers, Finbarr [4, 5]
Lanctot, Marc [2, 6]
Holland, G. Zacharias [2, 3]
Davoodi, Elnaz [2, 6]
Christianson, Alden [2, 7]
Bowling, Michael [2, 4, 7]
Affiliations
[1] EquiLibre Technol, Prague, Czech Republic
[2] Google Deepmind, London, England
[3] Sony AI, New York, NY USA
[4] Amii, Edmonton, AB, Canada
[5] Midjourney, South San Francisco, CA USA
[6] Google Deepmind, Montreal, PQ, Canada
[7] Univ Alberta, Edmonton, AB, Canada
Keywords
CARLO TREE-SEARCH; GO; LEVEL;
DOI
10.1126/sciadv.adg3256
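Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09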
Abstract
Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Student of Games achieves strong empirical performance in large perfect and imperfect information games, an important step toward truly general algorithms for arbitrary environments. We prove that Student of Games is sound, converging to perfect play as available computation and approximation capacity increases. Student of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker, and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
Pages: 14