Policy-Based Reinforcement Learning Approach in Imperfect Information Card Game

Cited by: 0
Authors
Chrustowski, Kamil [1]
Duch, Piotr [1]
Affiliations
[1] Lodz Univ Technol, Inst Appl Comp Sci, Stefanowskiego 18-22, PL-90537 Lodz, Poland
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 04
Keywords
reinforcement learning; policy gradient; neural networks; artificial intelligence; card game
DOI
10.3390/app15042121
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Games provide an excellent testing ground for machine learning and artificial intelligence, offering diverse environments with strategic challenges and complex decision-making scenarios. This study seeks to design a self-learning artificial intelligence agent capable of playing the trick-taking stage of the popular card game Thousand, known for its complex bidding system and dynamic gameplay. Due to the game's vast state space and strategic complexity, other artificial intelligence approaches, such as Monte Carlo Tree Search and Deep Counterfactual Regret Minimisation, are infeasible. To address these challenges, an enhanced version of the REINFORCE policy gradient algorithm is proposed. By introducing a score-related parameter beta designed to guide the learning process by prioritising valuable games, the proposed approach enhances policy updates and improves overall learning outcomes. Moreover, leveraging off-policy experience replay, along with importance weighting of the behavioural policy, enhanced training stability and reduced model variance. The proposed algorithm was applied to the trick-taking stage of the popular game Thousand Schnapsen in a two-player setup. Four distinct neural network models were explored to evaluate the performance of the proposed approach. A custom test suite of selected deals and tournament evaluations was employed to assess effectiveness. Comparisons were made against two benchmark strategies: a random strategy agent and an alpha-beta pruning tree search with varying search depths. The proposed algorithm achieved win rates exceeding 65% against the random agent, nearly 60% against alpha-beta pruning at a search depth of 6, and 55% against alpha-beta pruning at the maximum possible depth.
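The abstract names three ingredients: a REINFORCE policy-gradient update, a score-related weight beta, and importance weighting of samples drawn from a behavioural policy via experience replay. The following is a minimal sketch of how such an update could look for a toy softmax policy without states. It is not the authors' implementation: the function names, the way `beta` multiplies the update, the learning rate, and the clipping of the importance ratio are all assumptions made for illustration, since the abstract does not define them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_reinforce_step(logits, action, ret, behaviour_prob,
                            beta=1.0, lr=0.1, clip=5.0):
    """One importance-weighted REINFORCE update for a state-free softmax policy.

    logits         -- current policy parameters, one per action
    action         -- index of the replayed action
    ret            -- return observed after taking that action
    behaviour_prob -- probability the behaviour policy assigned to that action
    beta           -- hypothetical score-related weight (assumed form, not the paper's)
    clip           -- cap on the importance ratio to limit variance (assumption)
    """
    probs = softmax(logits)
    # Importance ratio pi(a) / mu(a), clipped from above.
    rho = min(probs[action] / behaviour_prob, clip)
    # Gradient of log pi(action) w.r.t. the logits: one_hot(action) - probs.
    grad_log = [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]
    scale = lr * beta * rho * ret
    # Gradient ascent on the weighted log-likelihood.
    return [w + scale * g for w, g in zip(logits, grad_log)]

# Usage: a positive return raises the probability of the replayed action.
logits = [0.0, 0.0, 0.0]
new_logits = weighted_reinforce_step(logits, action=1, ret=1.0, behaviour_prob=1 / 3)
new_probs = softmax(new_logits)
```

In this sketch a larger `beta` simply scales the step taken for a given sample, which is one plausible reading of "prioritising valuable games"; the paper itself should be consulted for the actual definition.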
Pages: 14
Related papers
50 items in total
  • [31] A Decision Tree Analysis of a Multi-Player Card Game With Imperfect Information
    Konishi, Masato
    Okubo, Seiya
    Nishino, Tetsuro
    Wakatsuki, Mitsuo
    INTERNATIONAL JOURNAL OF SOFTWARE INNOVATION, 2018, 6 (03) : 1 - 17
  • [32] Sequential Instance-Based Learning for planning in the context of an imperfect information game
    Shih, JG
    CASE-BASED REASONING RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2001, 2080 : 483 - 501
  • [33] An Approach to the Development of a Game Agent based on SOM and Reinforcement Learning
    Kamei, Keiji
    Kakizoe, Yuuki
    PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016, 2016, : 669 - 674
  • [34] A Deep Reinforcement Learning-Based Approach in Porker Game
    Kong, Yan
    Rui, Yefeng
    Hsia, Chih-Hsien
    Journal of Computers (Taiwan), 2023, 34 (02) : 41 - 51
  • [35] Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game
    Banos, Pablo
    Tanevska, Ana
    Sciutti, Alessandra
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 2716 - 2723
  • [36] An Obstacle Avoidance Method Using Asynchronous Policy-based Deep Reinforcement Learning with Discrete Action
    Wang, Yuechuan
    Yao, Fenxi
    Cui, Lingguo
    Chai, Senchun
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022, : 6235 - 6241
  • [37] P-CARD: Policy-based Contextual Awareness Realization for Disasters
    Talaei-Khoei, Amir
    Bleistein, Steven
    Ray, Pradeep
    Parameswaran, Nandan
    43RD HAWAII INTERNATIONAL CONFERENCE ON SYSTEMS SCIENCES VOLS 1-5 (HICSS 2010), 2010, : 584 - +
  • [38] Policy-Based Deep Reinforcement Learning for Visual Servoing Control of Mobile Robots With Visibility Constraints
    Jin, Zhehao
    Wu, Jinhui
    Liu, Andong
    Zhang, Wen-An
    Yu, Li
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2022, 69 (02) : 1898 - 1908
  • [39] Model-based reinforcement learning for a multi-player card game with partial observability
    Fujita, H
    Ishii, S
    2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Proceedings, 2005, : 467 - 470
  • [40] Two-step reinforcement learning for multistage strategy card game
    Godlewski, Konrad
    Sawicki, Bartosz
    BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2024, 72 (06)