Learning Diverse Risk Preferences in Population-Based Self-Play

Cited: 0
Authors
Jiang, Yuhua [1 ]
Liu, Qihan [1 ]
Ma, Xiaoteng [1 ]
Li, Chenghao [1 ]
Yang, Yiqin [1 ]
Yang, Jun [1 ]
Liang, Bin [1 ]
Zhao, Qianchuan [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
REINFORCEMENT; LEVEL;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Among the remarkable successes of Reinforcement Learning (RL), self-play algorithms have played a crucial role in solving competitive games. However, current self-play RL methods commonly optimize the agent to maximize the expected win rate against its current or historical copies, resulting in a limited strategy style and a tendency to get stuck in local optima. To address this limitation, it is important to improve the diversity of policies, allowing the agent to break stalemates and enhancing its robustness when facing different opponents. In this paper, we present a novel perspective to promote diversity by considering that agents could have diverse risk preferences in the face of uncertainty. To achieve this, we introduce a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning, enabling policy learning with desired risk preferences. Furthermore, by seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives using experiences gained from playing against diverse opponents. Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. Code is available at https://github.com/Jackory/RPBT.
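The abstract describes RPPO as interpolating between worst-case and best-case policy learning via a risk preference. The paper's exact formulation is not given here; below is a minimal illustrative sketch of one common way such an interpolation can be realized, by asymmetrically reweighting advantages inside a standard PPO clipped surrogate with an assumed risk-level parameter `tau`. The function names and the scaling convention are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def risk_weighted_advantages(advantages, tau):
    """Asymmetrically reweight advantages by a risk level tau in (0, 1).

    tau < 0.5 down-weights gains relative to losses (risk-averse,
    worst-case leaning); tau > 0.5 does the opposite (risk-seeking,
    best-case leaning); tau = 0.5 recovers the risk-neutral objective.
    """
    weights = np.where(advantages >= 0.0, tau, 1.0 - tau)
    # Factor of 2 makes tau = 0.5 leave the advantages unchanged.
    return 2.0 * weights * advantages

def ppo_clip_objective(ratio, advantages, eps=0.2):
    """Standard PPO clipped surrogate, averaged over samples."""
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()

# A population-based self-play scheme could then assign each agent its
# own tau and maximize ppo_clip_objective on the reweighted advantages.
adv = np.array([1.0, -1.0])
ratio = np.ones(2)
risk_seeking = ppo_clip_objective(ratio, risk_weighted_advantages(adv, 0.9))
risk_averse = ppo_clip_objective(ratio, risk_weighted_advantages(adv, 0.1))
```

Under this sketch a risk-seeking agent (`tau = 0.9`) scores the same batch higher than a risk-averse one (`tau = 0.1`), so different population members pursue genuinely different objectives from the same experience.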
Pages: 12910-12918
Page count: 9