Student of Games: A unified learning algorithm for both perfect and imperfect information games

Cited by: 2
Authors
Schmid, Martin [1, 2]
Moravcik, Matej [1, 2]
Burch, Neil [2, 3]
Kadlec, Rudolf [1, 2]
Davidson, Josh [2, 3]
Waugh, Kevin [2, 3]
Bard, Nolan [2, 3]
Timbers, Finbarr [4, 5]
Lanctot, Marc [2, 6]
Holland, G. Zacharias [2, 3]
Davoodi, Elnaz [2, 6]
Christianson, Alden [2, 7]
Bowling, Michael [2, 4, 7]
Affiliations
[1] EquiLibre Technologies, Prague, Czech Republic
[2] Google DeepMind, London, England
[3] Sony AI, New York, NY, USA
[4] Amii, Edmonton, AB, Canada
[5] Midjourney, South San Francisco, CA, USA
[6] Google DeepMind, Montreal, PQ, Canada
[7] University of Alberta, Edmonton, AB, Canada
Keywords
CARLO TREE-SEARCH; GO; LEVEL
DOI
10.1126/sciadv.adg3256
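Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]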
Subject classification codes
07; 0710; 09
Abstract
Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Student of Games achieves strong empirical performance in large perfect and imperfect information games, an important step toward truly general algorithms for arbitrary environments. We prove that Student of Games is sound, converging to perfect play as available computation and approximation capacity increase. Student of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker, and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
Pages: 14