Policy-Based Reinforcement Learning Approach in Imperfect Information Card Game

Authors
Chrustowski, Kamil [1]
Duch, Piotr [1]
Affiliation
[1] Lodz Univ Technol, Inst Appl Comp Sci, Stefanowskiego 18-22, PL-90537 Lodz, Poland
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 4
Keywords
reinforcement learning; policy gradient; neural networks; artificial intelligence; card game
DOI
10.3390/app15042121
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Games provide an excellent testing ground for machine learning and artificial intelligence, offering diverse environments with strategic challenges and complex decision-making scenarios. This study seeks to design a self-learning artificial intelligence agent capable of playing the trick-taking stage of the popular card game Thousand, known for its complex bidding system and dynamic gameplay. Because of the game's vast state space and strategic complexity, other artificial intelligence approaches, such as Monte Carlo Tree Search and Deep Counterfactual Regret Minimisation, are infeasible. To address these challenges, an enhanced version of the REINFORCE policy gradient algorithm is proposed. By introducing a score-related parameter β, designed to guide the learning process by prioritising valuable games, the proposed approach enhances policy updates and improves overall learning outcomes. Moreover, off-policy experience replay, combined with importance weighting with respect to the behavioural policy, enhanced training stability and reduced model variance. The proposed algorithm was applied to the trick-taking stage of the popular game Thousand Schnapsen in a two-player setup. Four distinct neural network models were explored to evaluate the performance of the proposed approach. Effectiveness was assessed with a custom test suite of selected deals and with tournament evaluations. Comparisons were made against two benchmark strategies: a random-strategy agent and an alpha-beta pruning tree search with varying search depths. The proposed algorithm achieved win rates exceeding 65% against the random agent, nearly 60% against alpha-beta pruning at a search depth of 6, and 55% against alpha-beta pruning at the maximum possible depth.
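The abstract names two ingredients, REINFORCE and off-policy replay with importance weighting against the behavioural policy, but does not give the exact update rule or say how β enters it. The following is therefore only a generic sketch under stated assumptions, not the authors' method: a tabular softmax policy `theta[state][action]` stands in for the paper's neural networks, each replayed step stores the behaviour-policy probability of the action taken, a single trajectory-level importance ratio reweights the whole episode, and no β term is included. All names (`Step`, `softmax`, `returns`, `reinforce_update`) are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical replay record for one step: (state index, action index,
# behaviour-policy probability of that action when it was taken, reward).
Step = Tuple[int, int, float, float]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def returns(rewards: List[float], gamma: float) -> List[float]:
    """Discounted returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    out = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        out[t] = g
    return out


def reinforce_update(theta: List[List[float]], episode: List[Step],
                     lr: float = 0.1, gamma: float = 1.0) -> float:
    """One importance-weighted REINFORCE step on a tabular softmax policy.

    Assumptions: tabular policy instead of a neural network, and no
    score-related beta weighting (its exact form is not given in the abstract).
    Updates theta in place and returns the importance ratio that was used.
    """
    # Trajectory-level ratio rho = prod_t pi(a_t|s_t) / mu(a_t|s_t) corrects
    # for the episode having been generated by an older behaviour policy mu.
    rho = 1.0
    for s, a, mu_prob, _ in episode:
        rho *= softmax(theta[s])[a] / mu_prob
    gs = returns([r for _, _, _, r in episode], gamma)
    # Accumulate the full gradient first so every step sees the same policy.
    grad = [[0.0] * len(row) for row in theta]
    for (s, a, _, _), g in zip(episode, gs):
        probs = softmax(theta[s])
        for b in range(len(probs)):
            # d/dtheta[s][b] of log softmax(theta[s])[a] = 1[b == a] - pi(b|s)
            grad[s][b] += rho * g * ((1.0 if b == a else 0.0) - probs[b])
    for s, row in enumerate(grad):
        for b, gval in enumerate(row):
            theta[s][b] += lr * gval  # gradient ascent on expected return
    return rho
```

For example, with `theta = [[0.0, 0.0]]` and a single replayed step `(0, 1, 0.5, 1.0)` recorded under a uniform behaviour policy, the ratio is 1 and the update moves `theta[0]` to `[-0.05, 0.05]`, making the rewarded action more likely. Because the trajectory-level ratio is a product over steps, it can grow or shrink quickly on long episodes; practical implementations often clip or truncate it, trading some bias for lower variance.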
Pages: 14