Learning Diverse Risk Preferences in Population-Based Self-Play

Cited by: 0
Authors
Jiang, Yuhua [1 ]
Liu, Qihan [1 ]
Ma, Xiaoteng [1 ]
Li, Chenghao [1 ]
Yang, Yiqin [1 ]
Yang, Jun [1 ]
Liang, Bin [1 ]
Zhao, Qianchuan [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
REINFORCEMENT; LEVEL;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Among the remarkable successes of Reinforcement Learning (RL), self-play algorithms have played a crucial role in solving competitive games. However, current self-play RL methods commonly optimize the agent to maximize the expected win-rate against its current or historical copies, resulting in a limited strategy style and a tendency to get stuck in local optima. To address this limitation, it is important to improve the diversity of policies, allowing the agent to break stalemates and enhance its robustness when facing different opponents. In this paper, we present a novel perspective on promoting diversity: agents may hold diverse risk preferences in the face of uncertainty. To realize this, we introduce a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning and thereby enables policy learning with a desired risk preference. Furthermore, by seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives using experiences gained from playing against diverse opponents. Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. Code is available at https://github.com/Jackory/RPBT.
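The abstract characterizes RPPO as smoothly interpolating between worst-case and best-case policy learning via a risk preference. The snippet below is only a minimal sketch of that idea, not the authors' actual formulation: it computes a distortion-weighted return estimate in which a single parameter tilts the weighting toward the worst or the best sampled returns. The function name risk_weighted_value and the linear distortion weighting are illustrative assumptions.

    import numpy as np

    def risk_weighted_value(returns, risk_param):
        """Hypothetical risk-sensitive value estimate (illustrative only).

        risk_param in [-1, 1]:
          -1 -> weight the worst returns most heavily (risk-averse, worst-case leaning)
           0 -> ordinary expectation (risk-neutral)
          +1 -> weight the best returns most heavily (risk-seeking, best-case leaning)
        """
        returns = np.sort(np.asarray(returns, dtype=float))   # ascending order
        n = len(returns)
        quantiles = (np.arange(n) + 0.5) / n                   # mid-point quantile levels
        # Linear distortion of probability mass toward low or high quantiles.
        weights = 1.0 + risk_param * (2.0 * quantiles - 1.0)
        weights = np.clip(weights, 0.0, None)
        weights /= weights.sum()
        return float(weights @ returns)

    # Toy usage: the same rollout returns evaluated under three risk preferences.
    rollout_returns = [1.0, 2.0, 8.0, 9.0]
    print(risk_weighted_value(rollout_returns, -1.0))  # pessimistic estimate (~3.1)
    print(risk_weighted_value(rollout_returns,  0.0))  # risk-neutral mean (5.0)
    print(risk_weighted_value(rollout_returns, +1.0))  # optimistic estimate (~6.9)

In the population-based setting the abstract describes, each agent would presumably carry its own risk parameter, possibly adapted over training, so that experience collected against diverse opponents is evaluated under different risk preferences.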
Pages: 12910-12918
Number of pages: 9