Learning Diverse Risk Preferences in Population-Based Self-Play

Cited by: 0
Authors
Jiang, Yuhua [1 ]
Liu, Qihan [1 ]
Ma, Xiaoteng [1 ]
Li, Chenghao [1 ]
Yang, Yiqin [1 ]
Yang, Jun [1 ]
Liang, Bin [1 ]
Zhao, Qianchuan [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
REINFORCEMENT; LEVEL;
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Among the remarkable successes of Reinforcement Learning (RL), self-play algorithms have played a crucial role in solving competitive games. However, current self-play RL methods commonly optimize the agent to maximize the expected win rate against its current or historical copies, resulting in a limited strategy style and a tendency to get stuck in local optima. To address this limitation, it is important to improve the diversity of policies, allowing the agent to break stalemates and enhance its robustness when facing different opponents. In this paper, we present a novel perspective to promote diversity by considering that agents could have diverse risk preferences in the face of uncertainty. To achieve this, we introduce a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning, enabling policy learning with desired risk preferences. Furthermore, by seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives using experiences gained from playing against diverse opponents. Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. Code is available at https://github.com/Jackory/RPBT.
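The sketch below is one plausible, hedged reading of the risk-sensitive objective described in the abstract; it is not taken from the paper or the linked repository. It assumes an expectile-style asymmetric weighting of value errors governed by a hypothetical parameter risk_tau in (0, 1): values below 0.5 emphasize downside outcomes (worst-case leaning), values above 0.5 emphasize upside outcomes (best-case leaning), and 0.5 recovers the ordinary mean-squared value loss paired with a standard PPO clipped surrogate. All names here (risk_sensitive_value_loss, risk_tau, clip_eps) are illustrative assumptions.

    # Illustrative sketch only (assumed expectile-style risk weighting; not the
    # authors' implementation). risk_tau < 0.5 down-weights favourable value
    # errors and so leans pessimistic; risk_tau > 0.5 leans optimistic;
    # risk_tau = 0.5 is risk-neutral and reduces to ordinary PPO training.
    import torch

    def risk_sensitive_value_loss(values, returns, risk_tau=0.5):
        # Asymmetric squared error on the critic: risk_tau = 0.5 gives plain MSE.
        diff = returns - values  # positive diff: the critic underestimated the return
        weight = torch.where(diff > 0,
                             torch.full_like(diff, risk_tau),
                             torch.full_like(diff, 1.0 - risk_tau))
        return (weight * diff.pow(2)).mean()

    def ppo_clipped_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
        # Standard PPO clipped surrogate; the advantages come from the
        # risk-weighted critic above, which is where the risk preference enters.
        ratio = torch.exp(logp_new - logp_old)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()

Under the population-based self-play scheme the abstract describes, one could assign each agent in the population its own risk_tau and perturb these values over training, so that experience gathered against diverse opponents is used to optimize differently risk-weighted objectives; again, this is a hedged reading of the abstract rather than a description of the released code.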
Pages: 12910-12918
Number of pages: 9
Related papers
50 records in total
  • [11] Mastering construction heuristics with self-play deep reinforcement learning
    Wang, Qi
    He, Yuqing
    Tang, Chunlei
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (06): : 4723 - 4738
  • [12] Anytime Self-play Learning to Satisfy Functional Optimality Criteria
    Burkov, Andriy
    Chaib-draa, Brahim
    ALGORITHMIC DECISION THEORY, PROCEEDINGS, 2009, 5783 : 446 - 457
  • [13] Reinforcement learning for extended reality: designing self-play scenarios
    Leal, Leonardo A. Espinosa
    Chapman, Anthony
    Westerlund, Magnus
    PROCEEDINGS OF THE 52ND ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, 2019, : 156 - 163
  • [14] Mastering construction heuristics with self-play deep reinforcement learning
    Qi Wang
    Yuqing He
    Chunlei Tang
    Neural Computing and Applications, 2023, 35 : 4723 - 4738
  • [15] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning
    Zha, Daochen
    Xie, Jingru
    Ma, Wenye
    Zhang, Sheng
    Lian, Xiangru
    Hu, Xia
    Liu, Ji
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [16] Self-play reinforcement learning with comprehensive critic in computer games
    Liu, Shanqi
    Cao, Junjie
    Wang, Yujie
    Chen, Wenzhou
    Liu, Yong
    NEUROCOMPUTING, 2021, 449 : 207 - 213
  • [17] Distributed Reinforcement Learning with Self-Play in Parameterized Action Space
    Ma, Jun
    Yao, Shunyi
    Chen, Guangda
    Song, Jiakai
    Ji, Jianmin
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 1178 - 1185
  • [18] Finding Effective Security Strategies through Reinforcement Learning and Self-Play
    Hammar, Kim
    Stadler, Rolf
    2020 16TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), 2020
  • [19] Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration
    Soemers, Dennis J. N. J.
    Piette, Eric
    Stephenson, Matthew
    Browne, Cameron
    2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 245 - 252
  • [20] Learning a game strategy using pattern-weights and self-play
    Shapiro, A
    Fuchs, G
    Levinson, R
    COMPUTERS AND GAMES, 2003, 2883 : 42 - 60