Federated Offline Reinforcement Learning with Proximal Policy Evaluation

Authors
Yue, Sheng [1]
Deng, Yongheng [1]
Wang, Guanbo [1]
Ren, Ju [1, 2]
Zhang, Yaoxue [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China
[2] Zhongguancun Lab, Beijing 100194, Peoples R China
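Funding
National Key Research and Development Program of China; China Postdoctoral Science Foundation; National Natural Science Foundation of China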
Keywords
Federated learning; Reinforcement learning; Offline reinforcement learning; Batch reinforcement learning
DOI
10.23919/cje.2023.00.288
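Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809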
Abstract
Offline reinforcement learning (RL), which seeks to learn policies from static datasets without active online exploration, has gathered increasing attention in recent years. However, existing offline RL approaches often require a large amount of pre-collected data and are therefore hard for a single agent to implement in practice. Inspired by the advancement of federated learning (FL), this paper studies federated offline reinforcement learning (FORL), whereby multiple agents collaboratively carry out offline policy learning without sharing their raw trajectories. A straightforward solution is to retrofit off-the-shelf offline RL methods for FL; however, such an approach easily overfits individual datasets during local updating, leading to instability and subpar performance. To overcome this challenge, we propose a new FORL algorithm, named model-free (MF)-FORL, that exploits a novel "proximal local policy evaluation" to judiciously push up action values beyond local data support, enabling agents to capture individual information without forgetting the aggregated knowledge. Further, we introduce a model-based variant, MB-FORL, which improves generalization ability and computational efficiency by utilizing a learned dynamics model. We evaluate the proposed algorithms on a suite of complex and high-dimensional offline RL benchmarks, and the results demonstrate significant performance gains over the baselines.
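The idea of a proximal term during local updating can be illustrated with a toy sketch. This is not the MF-FORL objective from the paper, whose exact form is not given in the abstract. It assumes a tabular Q-function, a fixed policy to evaluate, and a generic FedProx-style quadratic penalty with a hypothetical weight `mu` that pulls each agent's local estimate toward the aggregated table. All function and parameter names are illustrative.

```python
def local_proximal_evaluation(q_global, dataset, policy,
                              gamma=0.9, lr=0.1, mu=0.5, epochs=20):
    """Evaluate `policy` on one agent's offline transitions (s, a, r, s_next).

    Assumed objective (illustrative only): squared TD error plus
    (mu / 2) * (Q[s][a] - Q_global[s][a])^2 on the visited pairs.
    """
    q = [row[:] for row in q_global]  # start from the aggregated table
    for _ in range(epochs):
        for s, a, r, s_next in dataset:
            target = r + gamma * q[s_next][policy[s_next]]
            td_grad = q[s][a] - target                    # semi-gradient of the TD loss
            prox_grad = mu * (q[s][a] - q_global[s][a])   # pull toward aggregated values
            q[s][a] -= lr * (td_grad + prox_grad)
    return q


def aggregate(tables):
    """Server step: element-wise average of the agents' Q-tables."""
    n = len(tables)
    return [[sum(t[s][a] for t in tables) / n
             for a in range(len(tables[0][s]))]
            for s in range(len(tables[0]))]


def federated_round(q_global, datasets, policy, **kwargs):
    """One round: every agent updates locally, then the server averages."""
    local_tables = [local_proximal_evaluation(q_global, d, policy, **kwargs)
                    for d in datasets]
    return aggregate(local_tables)
```

In this sketch only the raw transitions stay local; agents exchange Q-tables. A larger `mu` keeps local estimates closer to the aggregated table, and entries for state-action pairs absent from an agent's dataset are left unchanged.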
Pages: 1360 - 1372
Number of pages: 13
Related papers
50 in total
  • [1] Federated Offline Reinforcement Learning with Proximal Policy Evaluation
    Yue, Sheng
    Deng, Yongheng
    Wang, Guanbo
    Ren, Ju
    Zhang, Yaoxue
    Chinese Journal of Electronics, 2024, 33 (06) : 1360 - 1372
  • [2] Federated Offline Reinforcement Learning
    Zhou, Doudou
    Zhang, Yufeng
    Sonabend-W, Aaron
    Wang, Zhaoran
    Lu, Junwei
    Cai, Tianxi
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (548) : 3152 - 3163
  • [3] Federated Offline Reinforcement Learning With Multimodal Data
    Wen, Jiabao
    Dai, Huiao
    He, Jingyi
    Xi, Meng
    Xiao, Shuai
    Yang, Jiachen
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (01) : 4266 - 4276
  • [4] Mild evaluation policy via dataset constraint for offline reinforcement learning
    Li, Xue
    Ling, Xinghong
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 274
  • [5] Supported Policy Optimization for Offline Reinforcement Learning
    Wu, Jialong
    Wu, Haixu
    Qiu, Zihan
    Wang, Jianmin
    Long, Mingsheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [6] Implicit policy constraint for offline reinforcement learning
    Peng, Zhiyong
    Liu, Yadong
    Han, Changlin
    Zhou, Zongtan
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024, 9 (04) : 973 - 981
  • [7] Weighted Policy Constraints for Offline Reinforcement Learning
    Peng, Zhiyong
    Han, Changlin
    Liu, Yadong
    Zhou, Zongtan
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023 : 9435 - 9443
  • [8] Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
    Zheng, Han
    Luo, Xufang
    Wei, Pengfei
    Song, Xuan
    Li, Dongsheng
    Jiang, Jing
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023 : 11372 - 11380
  • [9] Diversification of Adaptive Policy for Effective Offline Reinforcement Learning
    Choi, Yunseon
    Zhao, Li
    Zhang, Chuheng
    Song, Lei
    Bian, Jiang
    Kim, Kee-Eung
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024 : 3863 - 3871
  • [10] OFFLINE REINFORCEMENT LEARNING WITH POLICY GUIDANCE AND UNCERTAINTY ESTIMATION
    Wu, Lan
    Liu, Quan
    Zhang, Lihua
    Huang, Zhigang
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024 : 5010 - 5014