Policy-Based Reinforcement Learning Approach in Imperfect Information Card Game

Times cited: 0
Authors
Chrustowski, Kamil [1]
Duch, Piotr [1]
Affiliation
[1] Lodz Univ Technol, Inst Appl Comp Sci, Stefanowskiego 18-22, PL-90537 Lodz, Poland
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / No. 4
Keywords
reinforcement learning; policy gradient; neural networks; artificial intelligence; card game
DOI
10.3390/app15042121
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Games provide an excellent testing ground for machine learning and artificial intelligence, offering diverse environments with strategic challenges and complex decision-making scenarios. This study aims to design a self-learning artificial intelligence agent capable of playing the trick-taking stage of the popular card game Thousand, known for its complex bidding system and dynamic gameplay. Because of the game's vast state space and strategic complexity, other artificial intelligence approaches, such as Monte Carlo Tree Search and Deep Counterfactual Regret Minimisation, are infeasible. To address these challenges, an enhanced version of the REINFORCE policy gradient algorithm is proposed. By introducing a score-related parameter β that guides the learning process by prioritising valuable games, the proposed approach enhances policy updates and improves overall learning outcomes. Moreover, off-policy experience replay, combined with importance weighting with respect to the behavioural policy, improved training stability and reduced model variance. The proposed algorithm was applied to the trick-taking stage of the popular game Thousand Schnapsen in a two-player setup. Four distinct neural network models were explored to evaluate the performance of the proposed approach. Effectiveness was assessed with a custom test suite of selected deals and with tournament evaluations. Comparisons were made against two benchmark strategies: a random agent and alpha-beta pruning tree search at varying search depths. The proposed algorithm achieved win rates exceeding 65% against the random agent, nearly 60% against alpha-beta pruning at a search depth of 6, and 55% against alpha-beta pruning at the maximum possible depth.
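The abstract combines REINFORCE with off-policy experience replay and importance weighting of the behavioural policy. The following is a minimal sketch of that general technique, not the authors' implementation. It rests on several assumptions. A tabular softmax policy (a hypothetical `theta` dict mapping state ids to action logits) stands in for the paper's neural networks. Each replayed step is assumed to store the behaviour probability μ(a|s) recorded when the deal was played. A single undiscounted terminal return G is shared by every step of the deal. The update uses the full-trajectory importance weight ρ = Π π(a_t|s_t)/μ(a_t|s_t). The helper names `softmax` and `reinforce_is_update` are illustrative. The sketch omits the score-related parameter β because the abstract does not specify its exact form.

```python
import math
from typing import Dict, List, Tuple

# One replayed step: (state_id, action, mu), where mu is the behaviour-policy
# probability of `action`, stored when the game was originally played (assumption).
Step = Tuple[int, int, float]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_is_update(
    theta: Dict[int, List[float]],
    episode: List[Step],
    ret: float,
    lr: float = 0.1,
) -> float:
    """Apply one importance-weighted REINFORCE update to `theta` in place.

    Hypothetical helper: `theta` maps a state id to that state's action logits
    (a tabular stand-in for a neural network). Returns the importance weight rho.
    """
    # Trajectory importance weight rho = prod_t pi(a_t|s_t) / mu(a_t|s_t),
    # evaluated under the current (target) policy before any parameter change.
    rho = 1.0
    for s, a, mu in episode:
        rho *= softmax(theta[s])[a] / mu

    # Accumulate the gradient of sum_t log pi(a_t|s_t) w.r.t. the logits.
    # For a softmax policy, d log pi(a|s) / d logit_k = 1[k == a] - pi(k|s).
    grads: Dict[int, List[float]] = {}
    for s, a, _ in episode:
        pi = softmax(theta[s])
        g = grads.setdefault(s, [0.0] * len(pi))
        for k, p in enumerate(pi):
            g[k] += (1.0 if k == a else 0.0) - p

    # Gradient ascent along rho * G * grad log pi (off-policy REINFORCE estimate).
    for s, g in grads.items():
        for k, gk in enumerate(g):
            theta[s][k] += lr * rho * ret * gk
    return rho


if __name__ == "__main__":
    theta = {0: [0.0, 0.0]}
    rho = reinforce_is_update(theta, [(0, 0, 0.25)], ret=1.0)
    print(rho, theta[0])
```

In the usage example, the current policy gives probability 0.5 to an action that the behaviour policy chose with probability 0.25. That makes ρ = 2, so with `lr=0.1` the logit of action 0 rises by 0.1 and the other logit falls by 0.1. Full-trajectory weights can have high variance on long deals. Practical variants often clip ρ or use per-decision weights instead.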
Pages: 14