Policy-Based Reinforcement Learning Approach in Imperfect Information Card Game

Cited by: 0

Authors:

Chrustowski, Kamil [1]

Duch, Piotr [1]

Affiliations:

[1] Lodz Univ Technol, Inst Appl Comp Sci, Stefanowskiego 18-22, PL-90537 Lodz, Poland

Source:

APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 04

Keywords:

reinforcement learning; policy gradient; neural networks; artificial intelligence; card game

DOI:

10.3390/app15042121

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Games provide an excellent testing ground for machine learning and artificial intelligence, offering diverse environments with strategic challenges and complex decision-making scenarios. This study seeks to design a self-learning artificial intelligence agent capable of playing the trick-taking stage of the popular card game Thousand, known for its complex bidding system and dynamic gameplay. Due to the game's vast state space and strategic complexity, other artificial intelligence approaches, such as Monte Carlo Tree Search and Deep Counterfactual Regret Minimisation, are infeasible. To address these challenges, an enhanced version of the REINFORCE policy gradient algorithm is proposed. It introduces a score-related parameter beta that guides the learning process by prioritising valuable games, which enhances policy updates and improves overall learning outcomes. Moreover, off-policy experience replay, combined with importance weighting of the behavioural policy, enhanced training stability and reduced model variance. The proposed algorithm was applied to the trick-taking stage of Thousand Schnapsen in a two-player setup. Four distinct neural network models were explored to evaluate the performance of the proposed approach. A custom test suite of selected deals and tournament evaluations was employed to assess effectiveness. Comparisons were made against two benchmark strategies: a random strategy agent and an alpha-beta pruning tree search with varying search depths. The proposed algorithm achieved win rates exceeding 65% against the random agent, nearly 60% against alpha-beta pruning at a search depth of 6, and 55% against alpha-beta pruning at the maximum possible depth.
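The abstract names the ingredients of the method (a REINFORCE policy gradient step, a score-related parameter beta, and importance weighting of the behavioural policy) but does not give the update rule. The sketch below is only an illustration of how such pieces can fit together for a single-state softmax policy. It is not the authors' implementation: the function `reinforce_update`, the way `beta` multiplies the step, the clipping constant, and the learning rate are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, action, ret, behaviour_prob,
                     beta=1.0, lr=0.1, clip=5.0):
    """One importance-weighted REINFORCE step (hypothetical helper).

    logits         -- current policy parameters, one per action
    action         -- index of the action stored in the replay buffer
    ret            -- return observed after taking that action
    behaviour_prob -- probability the behavioural policy gave that action
    beta           -- ASSUMPTION: a score-related factor that scales the step
    """
    probs = softmax(logits)
    # Importance weight pi(a) / mu(a), clipped to limit variance.
    weight = min(probs[action] / behaviour_prob, clip)
    scale = lr * beta * weight * ret
    # The gradient of log pi(a) w.r.t. the logits is one_hot(a) - probs.
    return [
        x + scale * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


# Usage: a replayed sample with positive return raises the chosen action's logit.
new_logits = reinforce_update([0.0, 0.0, 0.0], action=1, ret=1.0,
                              behaviour_prob=1 / 3, beta=2.0)
```

In this sketch a larger `beta` gives a proportionally larger step for the same sample, which is one simple way to prioritise games considered more valuable; the paper's actual definition of beta may differ.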

Pages: 14

50 items in total

[21] Runtime verification using policy-based approach to control information flow
Sarrab, M. (sarrab@squ.edu.om), 1600, Inderscience Enterprises Ltd., 29, route de Pre-Bois, Case Postale 856, CH-1215 Geneva 15, CH-1215, Switzerland (08):
[22] Imperfect-Information Game AI Agent Based on Reinforcement Learning Using Tree Search and a Deep Neural Network
Ouyang, Xin
Zhou, Ting
ELECTRONICS, 2023, 12 (11)
[23] Adaptability Analysis of Value-based and Policy-based Deep Reinforcement Learning in Nuclear Field
Tan, Sichao
Liu, Zhen
Liu, Yongchao
Li, Tong
Liang, Biao
Wang, Bo
Li, Jiangkuan
Tian, Ruifeng
Yuanzineng Kexue Jishu/Atomic Energy Science and Technology, 2024, 58 : 382 - 392
[24] Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization
Hart, Patrick
Rychly, Leonard
Knoll, Alois
2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2019, : 3176 - 3181
[25] Control of Discrete-Time Chaotic Systems with Policy-Based Deep Reinforcement Learning
Ikemoto, Junya
Ushio, Toshimitsu
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2020, E103A (07) : 885 - 892
[26] A reinforcement learning scheme for a multi-agent card game
Fujita, H
Matsuno, Y
Ishii, S
2003 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-5, CONFERENCE PROCEEDINGS, 2003, : 4071 - 4078
[27] WERNER: A Card Game for Reinforcement Learning of Inorganic Chemistry Nomenclature
Buendia-Atencio, Cristian
Paul Pieffet, Gilles
Lorett Velasquez, Vaneza Paola
JOURNAL OF CHEMICAL EDUCATION, 2022, 99 (05) : 2198 - 2203
[28] A policy-based privacy storage approach
Nowalczyk, Julien
Tastet-Cherel, Frederique
ICEIS 2007: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS: DATABASES AND INFORMATION SYSTEMS INTEGRATION, 2007, : 605 - 608
[29] A XML policy-based approach for RSVP
Toktar, E
Jamhour, E
Maziero, C
TELECOMMUNICATIONS AND NETWORKING - ICT 2004, 2004, 3124 : 1204 - 1209
[30] A policy-based approach to firewall management
Caldeira, F
Monteiro, E
NETWORK CONTROL AND ENGINEERING FOR QOS, SECURITY AND MOBILITY, 2003, 107 : 115 - 126