Student of Games: A unified learning algorithm for both perfect and imperfect information games

Cited by: 2

Authors:

Schmid, Martin ^{[1,2]}

Moravcik, Matej ^{[1,2]}

Burch, Neil ^{[2,3]}

Kadlec, Rudolf ^{[1,2]}

Davidson, Josh ^{[2,3]}

Waugh, Kevin ^{[2,3]}

Bard, Nolan ^{[2,3]}

Timbers, Finbarr ^{[4,5]}

Lanctot, Marc ^{[2,6]}

Holland, G. Zacharias ^{[2,3]}

Davoodi, Elnaz ^{[2,6]}

Christianson, Alden ^{[2,7]}

Bowling, Michael ^{[2,4,7]}

Affiliations:

[1] EquiLibre Technol, Prague, Czech Republic

[2] Google Deepmind, London, England

[3] Sony AI, New York, NY USA

[4] Amii, Edmonton, AB, Canada

[5] Midjourney, South San Francisco, CA USA

[6] Google Deepmind, Montreal, PQ, Canada

[7] Univ Alberta, Edmonton, AB, Canada

Source:

SCIENCE ADVANCES | 2023 / Vol. 9 / Issue 46

Keywords:

CARLO TREE-SEARCH; GO; LEVEL;

DOI:

10.1126/sciadv.adg3256

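Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];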

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Student of Games achieves strong empirical performance in large perfect and imperfect information games, an important step toward truly general algorithms for arbitrary environments. We prove that Student of Games is sound, converging to perfect play as available computation and approximation capacity increase. Student of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker, and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.

Pages: 14
