Federated Offline Reinforcement Learning with Proximal Policy Evaluation

Cited by: 0

Authors:

Yue, Sheng [1]

Deng, Yongheng [1]

Wang, Guanbo [1]

Ren, Ju [1,2]

Zhang, Yaoxue [1,2]

Affiliations:

[1] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China

[2] Zhongguancun Lab, Beijing 100194, Peoples R China

Source:

CHINESE JOURNAL OF ELECTRONICS | 2024 / Vol. 33 / No. 06

Funding:

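National Key Research and Development Program of China; China Postdoctoral Science Foundation; National Natural Science Foundation of China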

Keywords:

Federated learning; Reinforcement learning; Offline reinforcement learning; Batch reinforcement learning

DOI:

10.23919/cje.2023.00.288

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Offline reinforcement learning (RL), which seeks to learn policies from static datasets without active online exploration, has attracted increasing attention in recent years. However, existing offline RL approaches often require a large amount of pre-collected data and are therefore hard for a single agent to implement in practice. Inspired by advances in federated learning (FL), this paper studies federated offline reinforcement learning (FORL), in which multiple agents collaboratively carry out offline policy learning without sharing their raw trajectories. A straightforward solution is to retrofit off-the-shelf offline RL methods for FL. However, such an approach easily overfits individual datasets during local updating, leading to instability and subpar performance. To overcome this challenge, we propose a new FORL algorithm, named model-free FORL (MF-FORL). It exploits a novel "proximal local policy evaluation" to judiciously push up action values beyond local data support, enabling agents to capture individual information without forgetting the aggregated knowledge. Further, we introduce a model-based variant, MB-FORL, which improves generalization ability and computational efficiency by utilizing a learned dynamics model. We evaluate the proposed algorithms on a suite of complex, high-dimensional offline RL benchmarks, and the results demonstrate significant performance gains over the baselines.
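To make the federated offline setting concrete, here is a minimal tabular sketch of the "straightforward solution" the abstract contrasts against. In it, each agent runs an off-the-shelf offline learner on its private transitions, and only Q-tables (never raw trajectories) are averaged by a server.

Several assumptions go beyond the record, and all of them are illustrative only:

- Tabular Q-learning is used as the local learner.
- Aggregation is FedAvg, weighted by dataset size.
- An optional FedProx-style proximal penalty with coefficient `mu` is included.
- The function names `local_update`, `fed_avg`, and `federated_offline_q` are hypothetical.

This is not MF-FORL or MB-FORL, and it does not reproduce the paper's proximal local policy evaluation.

```python
from typing import List, Tuple

# Hypothetical types for this sketch: a transition is
# (state, action, reward, next_state, done); a Q-table is indexed q[state][action].
Transition = Tuple[int, int, float, int, bool]
QTable = List[List[float]]


def local_update(q_global: QTable, data: List[Transition], gamma: float = 0.9,
                 lr: float = 0.5, epochs: int = 20, mu: float = 0.0) -> QTable:
    """Tabular offline Q-learning on one agent's private dataset.

    Starts from the global table. Entries for (state, action) pairs absent from
    `data` are never updated here. mu > 0 adds a generic FedProx-style pull toward
    q_global (an assumption, NOT the paper's proximal policy evaluation).
    """
    q = [row[:] for row in q_global]  # copy so the global table is untouched
    for _ in range(epochs):
        for s, a, r, s2, done in data:
            # Greedy bootstrap target; the max may use actions outside local data.
            target = r if done else r + gamma * max(q[s2])
            td_error = target - q[s][a]
            prox = q[s][a] - q_global[s][a]  # drift from the aggregated model
            q[s][a] += lr * (td_error - mu * prox)
    return q


def fed_avg(tables: List[QTable], sizes: List[int]) -> QTable:
    """Average Q-tables entry-wise, weighting each agent by its dataset size."""
    total = sum(sizes)
    n_states, n_actions = len(tables[0]), len(tables[0][0])
    return [[sum(t[s][a] * n for t, n in zip(tables, sizes)) / total
             for a in range(n_actions)]
            for s in range(n_states)]


def federated_offline_q(datasets: List[List[Transition]], n_states: int,
                        n_actions: int, rounds: int = 10, mu: float = 0.0) -> QTable:
    """Run communication rounds; raw transitions never leave local_update."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(rounds):
        local_tables = [local_update(q, d, mu=mu) for d in datasets]
        q = fed_avg(local_tables, [len(d) for d in datasets])
    return q


# Two agents whose data cover different actions in state 0.
agent_a = [(0, 1, 1.0, 1, True)]
agent_b = [(0, 0, 0.0, 1, True)]
q = federated_offline_q([agent_a, agent_b], n_states=2, n_actions=2)
print(q[0])
```

In this toy run, each agent leaves the entries it never observes at their global values during local training. Averaging therefore mixes fresh local estimates with stale global ones, so q[0][1] approaches 1 only over several rounds. Setting `mu > 0` additionally pulls each local table toward the global one. That is one generic way to limit local drift, distinct from the approach proposed in the paper.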

Pages: 1360-1372

Page count: 13
