Federated Offline Reinforcement Learning with Proximal Policy Evaluation

Cited by: 0

Authors:

Yue, Sheng [1]

Deng, Yongheng [1]

Wang, Guanbo [1]

Ren, Ju [1, 2]

Zhang, Yaoxue [1, 2]

Affiliations:

[1] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China

[2] Zhongguancun Lab, Beijing 100194, Peoples R China

Source:

CHINESE JOURNAL OF ELECTRONICS | 2024 / Vol. 33 / No. 06

Funding:

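National Key Research and Development Program of China; China Postdoctoral Science Foundation; National Natural Science Foundation of China;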

Keywords:

Federated learning; Reinforcement learning; Offline reinforcement learning; Batch reinforcement learning;

DOI:

10.23919/cje.2023.00.288

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline code:

0808; 0809;

Abstract:

Offline reinforcement learning (RL), which seeks to learn policies from static datasets without active online exploration, has attracted increasing attention in recent years. However, existing offline RL approaches often require a large amount of pre-collected data and are therefore difficult for a single agent to implement in practice. Inspired by advances in federated learning (FL), this paper studies federated offline reinforcement learning (FORL), in which multiple agents collaboratively carry out offline policy learning without sharing their raw trajectories. A straightforward solution is to retrofit off-the-shelf offline RL methods for FL; however, such an approach easily overfits individual datasets during local updating, leading to instability and subpar performance. To overcome this challenge, we propose a new FORL algorithm, named model-free (MF)-FORL, which exploits a novel "proximal local policy evaluation" to judiciously push up action values beyond local data support, enabling agents to capture individual information without forgetting the aggregated knowledge. Further, we introduce a model-based variant, MB-FORL, which is capable of improving generalization ability and computational efficiency by utilizing a learned dynamics model. We evaluate the proposed algorithms on a suite of complex, high-dimensional offline RL benchmarks, and the results demonstrate significant performance gains over the baselines.
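The abstract describes MF-FORL only at a high level and does not state its update rule. As a purely illustrative sketch of the general pattern it builds on, the Python below combines FedAvg-style weighted averaging of tabular Q-values with a FedProx-style proximal penalty mu * (Q - Q_global) during each client's local policy evaluation, so that local updates fit the client's own offline transitions while staying close to the aggregated model. This is a generic stand-in under stated assumptions, not the authors' algorithm: it omits the mechanism for pushing up action values beyond local data support, and every name (`local_update`, `aggregate`, `federated_rounds`) and hyperparameter (`gamma`, `lr`, `mu`) is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular setting: integer states and actions.
State = int
Action = int
Transition = Tuple[State, Action, float, State]  # (s, a, r, s_next)
QTable = Dict[Tuple[State, Action], float]


def greedy_action(q: QTable, s: State, actions: List[Action]) -> Action:
    # Greedy action under the current Q estimate (unseen pairs default to 0).
    return max(actions, key=lambda a: q.get((s, a), 0.0))


def local_update(global_q: QTable, data: List[Transition], actions: List[Action],
                 gamma: float = 0.9, lr: float = 0.5, mu: float = 0.1,
                 epochs: int = 20) -> QTable:
    """One client's local evaluation on its offline data (hypothetical sketch).

    Each step is a semi-gradient TD update toward r + gamma * Q(s', greedy(s'))
    plus the gradient of (mu / 2) * (Q - Q_global)^2, which pulls the local
    estimate back toward the aggregated model. Not the MF-FORL update.
    """
    q = dict(global_q)  # start from the current global model
    for _ in range(epochs):
        for s, a, r, s_next in data:
            a_next = greedy_action(q, s_next, actions)
            target = r + gamma * q.get((s_next, a_next), 0.0)
            td_error = target - q.get((s, a), 0.0)
            prox = q.get((s, a), 0.0) - global_q.get((s, a), 0.0)
            q[(s, a)] = q.get((s, a), 0.0) + lr * (td_error - mu * prox)
    return q


def aggregate(local_qs: List[QTable], weights: List[float]) -> QTable:
    """FedAvg-style weighted average of the clients' Q-tables."""
    total = sum(weights)
    keys = set().union(*local_qs)  # union of all (state, action) keys
    return {k: sum(w * q.get(k, 0.0) for q, w in zip(local_qs, weights)) / total
            for k in keys}


def federated_rounds(datasets: List[List[Transition]], actions: List[Action],
                     rounds: int = 10) -> QTable:
    global_q: QTable = {}
    for _ in range(rounds):
        # Only Q-values are exchanged; raw transitions never leave a client.
        local_qs = [local_update(global_q, d, actions) for d in datasets]
        global_q = aggregate(local_qs, [float(len(d)) for d in datasets])
    return global_q


# Two hypothetical clients whose data cover different actions in state 0.
clients = [[(0, 1, 1.0, 1)], [(0, 0, 0.0, 1)]]
q = federated_rounds(clients, actions=[0, 1], rounds=10)
print(greedy_action(q, 0, [0, 1]))  # prints 1
```

Only the rewarded action observed by the first client acquires a positive value, so the aggregated policy prefers it even though the second client never saw that action; the proximal term with `mu > 0` keeps each local table from drifting arbitrarily far from the shared one between rounds.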


Pages: 1360-1372

Page count: 13
