Learning Diverse Risk Preferences in Population-Based Self-Play

Cited: 0
Authors
Jiang, Yuhua [1 ]
Liu, Qihan [1 ]
Ma, Xiaoteng [1 ]
Li, Chenghao [1 ]
Yang, Yiqin [1 ]
Yang, Jun [1 ]
Liang, Bin [1 ]
Zhao, Qianchuan [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
REINFORCEMENT; LEVEL;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Among the remarkable successes of Reinforcement Learning (RL), self-play algorithms have played a crucial role in solving competitive games. However, current self-play RL methods commonly optimize the agent to maximize the expected win rate against its current or historical copies, resulting in a limited strategy style and a tendency to get stuck in local optima. To address this limitation, it is important to improve the diversity of policies, allowing the agent to break stalemates and enhancing its robustness when facing different opponents. In this paper, we present a novel perspective to promote diversity by considering that agents could have diverse risk preferences in the face of uncertainty. To achieve this, we introduce a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning, enabling policy learning with desired risk preferences. Furthermore, by seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives using experiences gained from playing against diverse opponents. Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. Code is available at https://github.com/Jackory/RPBT.
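The abstract describes RPPO as interpolating between worst-case and best-case policy learning via a risk preference. The paper's exact formulation is not given here; below is a minimal illustrative sketch of one common way such an interpolation can be realized, by asymmetrically reweighting advantages inside a standard PPO clipped surrogate with an assumed risk-level parameter `tau`. The function names and the scaling convention are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def risk_weighted_advantages(advantages, tau):
    """Asymmetrically reweight advantages by a risk level tau in (0, 1).

    tau < 0.5 down-weights gains relative to losses (risk-averse,
    worst-case leaning); tau > 0.5 does the opposite (risk-seeking,
    best-case leaning); tau = 0.5 recovers the risk-neutral objective.
    """
    weights = np.where(advantages >= 0.0, tau, 1.0 - tau)
    # Factor of 2 makes tau = 0.5 leave the advantages unchanged.
    return 2.0 * weights * advantages

def ppo_clip_objective(ratio, advantages, eps=0.2):
    """Standard PPO clipped surrogate, averaged over samples."""
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()

# A population-based self-play scheme could then assign each agent its
# own tau and maximize ppo_clip_objective on the reweighted advantages.
adv = np.array([1.0, -1.0])
ratio = np.ones(2)
risk_seeking = ppo_clip_objective(ratio, risk_weighted_advantages(adv, 0.9))
risk_averse = ppo_clip_objective(ratio, risk_weighted_advantages(adv, 0.1))
```

Under this sketch a risk-seeking agent (`tau = 0.9`) scores the same batch higher than a risk-averse one (`tau = 0.1`), so different population members pursue genuinely different objectives from the same experience.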
Pages: 12910-12918
Page count: 9