Policy-Based Reinforcement Learning Approach in Imperfect Information Card Game

Cited by: 0

Authors:

Chrustowski, Kamil [1]

Duch, Piotr [1]

Affiliations:

[1] Lodz Univ Technol, Inst Appl Comp Sci, Stefanowskiego 18-22, PL-90537 Lodz, Poland

Source:

APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 04

Keywords:

reinforcement learning; policy gradient; neural networks; artificial intelligence; card game

DOI:

10.3390/app15042121

Chinese Library Classification number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Games provide an excellent testing ground for machine learning and artificial intelligence, offering diverse environments with strategic challenges and complex decision-making scenarios. This study seeks to design a self-learning artificial intelligence agent capable of playing the trick-taking stage of the popular card game Thousand, known for its complex bidding system and dynamic gameplay. Due to the game's vast state space and strategic complexity, other artificial intelligence approaches, such as Monte Carlo Tree Search and Deep Counterfactual Regret Minimisation, are infeasible. To address these challenges, an enhanced version of the REINFORCE policy gradient algorithm is proposed. By introducing a score-related parameter beta that guides the learning process by prioritising valuable games, the proposed approach enhances policy updates and improves overall learning outcomes. Moreover, off-policy experience replay, combined with importance weighting of the behavioural policy, enhanced training stability and reduced model variance. The proposed algorithm was applied to the trick-taking stage of the game Thousand Schnapsen in a two-player setup. Four distinct neural network models were explored to evaluate the performance of the proposed approach. A custom test suite of selected deals and tournament evaluations was employed to assess effectiveness. Comparisons were made against two benchmark strategies: a random strategy agent and an alpha-beta pruning tree search with varying search depths. The proposed algorithm achieved win rates exceeding 65% against the random agent, nearly 60% against alpha-beta pruning at a search depth of 6, and 55% against alpha-beta pruning at the maximum possible depth.
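
The abstract does not give the update rule, so the following is only a minimal sketch of how a REINFORCE step might combine a score-related weight with importance weighting over replayed experience. Everything here is an assumption made for illustration: the tabular softmax policy (the paper uses neural networks), the names `reinforce_step`, `behaviour_prob`, and `rho_clip`, the clipping of the importance ratio, and the way `beta` multiplies the update. It should not be read as the authors' algorithm.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action preferences."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, trajectory, game_return, beta, lr=0.1, rho_clip=2.0):
    """One importance-weighted REINFORCE update on tabular softmax logits.

    logits      -- dict mapping state -> list of action preferences (mutated)
    trajectory  -- list of (state, action, behaviour_prob) tuples, e.g. drawn
                   from a replay buffer; behaviour_prob is the probability the
                   behaviour policy assigned to the action when it was played
    game_return -- scalar outcome of the game from the agent's point of view
    beta        -- hypothetical score-related weight (assumption: a plain
                   multiplier on the update; the paper may define it otherwise)
    """
    for state, action, behaviour_prob in trajectory:
        probs = softmax(logits[state])
        # Importance ratio between current and behaviour policy, clipped
        # (clipping is an assumption) to keep the update variance bounded.
        rho = min(probs[action] / behaviour_prob, rho_clip)
        for a in range(len(probs)):
            # Gradient of log pi(action | state) w.r.t. logit a for softmax.
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            logits[state][a] += lr * beta * rho * game_return * grad_log
    return logits


if __name__ == "__main__":
    table = {"s0": [0.0, 0.0]}
    reinforce_step(table, [("s0", 0, 0.5)], game_return=1.0, beta=1.0)
    print(table)
```

In this sketch a larger `beta` makes a game contribute more strongly to the policy update, which is one plausible reading of "prioritising valuable games"; setting `beta` to zero leaves the policy unchanged for that game.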


Pages: 14

