Student of Games: A unified learning algorithm for both perfect and imperfect information games

Cited by: 2
Authors
Schmid, Martin [1 ,2 ]
Moravcik, Matej [1 ,2 ]
Burch, Neil [2 ,3 ]
Kadlec, Rudolf [1 ,2 ]
Davidson, Josh [2 ,3 ]
Waugh, Kevin [2 ,3 ]
Bard, Nolan [2 ,3 ]
Timbers, Finbarr [4 ,5 ]
Lanctot, Marc [2 ,6 ]
Holland, G. Zacharias [2 ,3 ]
Davoodi, Elnaz [2 ,6 ]
Christianson, Alden [2 ,7 ]
Bowling, Michael [2 ,4 ,7 ]
Affiliations
[1] EquiLibre Technologies, Prague, Czech Republic
[2] Google DeepMind, London, England
[3] Sony AI, New York, NY USA
[4] Amii, Edmonton, AB, Canada
[5] Midjourney, South San Francisco, CA USA
[6] Google DeepMind, Montreal, QC, Canada
[7] University of Alberta, Edmonton, AB, Canada
Keywords
CARLO TREE-SEARCH; GO; LEVEL;
DOI
10.1126/sciadv.adg3256
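Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification codes
07 ; 0710 ; 09 ;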
Abstract
Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Student of Games achieves strong empirical performance in large perfect and imperfect information games, an important step toward truly general algorithms for arbitrary environments. We prove that Student of Games is sound, converging to perfect play as available computation and approximation capacity increase. Student of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker, and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
Pages: 14