Federated Offline Reinforcement Learning with Proximal Policy Evaluation

Cited by: 0
Authors
Yue, Sheng [1 ]
Deng, Yongheng [1 ]
Wang, Guanbo [1 ]
Ren, Ju [1 ,2 ]
Zhang, Yaoxue [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China
[2] Zhongguancun Lab, Beijing 100194, Peoples R China
Funding
National Key R&D Program of China; China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Federated learning; Reinforcement learning; Offline reinforcement learning; Batch reinforcement learning;
DOI
10.23919/cje.2023.00.288
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline Code
0808 ; 0809 ;
Abstract
Offline reinforcement learning (RL) has attracted increasing attention in recent years, as it seeks to learn policies from static datasets without active online exploration. However, existing offline RL approaches often require a large amount of pre-collected data and are hence hard for a single agent to implement in practice. Inspired by the advancement of federated learning (FL), this paper studies federated offline reinforcement learning (FORL), whereby multiple agents collaboratively carry out offline policy learning with no need to share their raw trajectories. Clearly, a straightforward solution is to simply retrofit off-the-shelf offline RL methods for FL, but such an approach easily overfits individual datasets during local updating, leading to instability and subpar performance. To overcome this challenge, we propose a new FORL algorithm, named model-free (MF)-FORL, that exploits a novel "proximal local policy evaluation" to judiciously push up action values beyond local data support, enabling agents to capture individual information without forgetting the aggregated knowledge. Further, we introduce a model-based variant, MB-FORL, capable of improving generalization ability and computational efficiency by utilizing a learned dynamics model. We evaluate the proposed algorithms on a suite of complex and high-dimensional offline RL benchmarks, and the results demonstrate significant performance gains over the baselines.
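The record gives no implementation details of the proposed method, but the core idea described in the abstract, local offline policy evaluation regularized toward the last aggregated global model, can be illustrated with a minimal hypothetical sketch. Everything below (the tabular setting, the `mu` proximal coefficient, the FedAvg-style averaging, and all function names) is an assumption for illustration, not the authors' actual algorithm.

```python
import numpy as np

def local_policy_evaluation(q_global, transitions, gamma=0.99, mu=0.1, lr=0.5, steps=50):
    """Illustrative 'proximal local policy evaluation' for one agent.

    Runs offline TD updates on the agent's static transitions while a proximal
    penalty (mu/2)*||q - q_global||^2 pulls the local Q-table toward the last
    aggregated global table, limiting overfitting to the local dataset.
    """
    q = q_global.copy()
    for _ in range(steps):
        for (s, a, r, s_next) in transitions:
            td_target = r + gamma * q[s_next].max()
            td_error = td_target - q[s, a]
            # Combined gradient step: TD error minus proximal pull toward q_global.
            q[s, a] += lr * (td_error - mu * (q[s, a] - q_global[s, a]))
    return q

def federated_round(q_global, agent_datasets):
    """FedAvg-style aggregation: average the locally evaluated Q-tables."""
    local_tables = [local_policy_evaluation(q_global, d) for d in agent_datasets]
    return np.mean(local_tables, axis=0)

# Two agents, each with a small static offline dataset over 2 states x 2 actions;
# transitions are (state, action, reward, next_state) tuples.
q = np.zeros((2, 2))
data_a = [(0, 0, 1.0, 1), (1, 1, 0.0, 0)]
data_b = [(0, 1, 0.5, 1), (1, 0, 0.2, 0)]
for _ in range(5):
    q = federated_round(q, [data_a, data_b])
```

The proximal term is what distinguishes this from naively running each agent's offline updates independently: without it, each local table drifts toward its own small dataset between aggregation rounds, which is the overfitting failure mode the abstract describes.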
Pages: 1360-1372
Page count: 13
Related Papers
50 entries
  • [41] Learning Behavior of Offline Reinforcement Learning Agents
    Shukla, Indu
    Dozier, Haley R.
    Henslee, Althea C.
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS VI, 2024, 13051
  • [42] Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes
    Bennett, Andrew
    Kallus, Nathan
    OPERATIONS RESEARCH, 2024, 72 (03) : 1071 - 1086
  • [43] Error bounds in reinforcement learning policy evaluation
    Lu, FC
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, 3501 : 438 - 449
  • [44] Multigrid methods for policy evaluation and reinforcement learning
    Ziv, O
    Shimkin, N
    2005 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL & 13TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION, VOLS 1 AND 2, 2005, : 1391 - 1396
  • [45] Least Square Policy Evaluation in Reinforcement Learning
    Zhang, Haifei
    Deng, Hailong
    Huang, Liangbin
    Hong, Ying
    INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND INDUSTRIAL AUTOMATION (ICITIA 2015), 2015, : 583 - 590
  • [46] Offline Deep Reinforcement Learning and Off-Policy Evaluation for Personalized Basal Insulin Control in Type 1 Diabetes
    Zhu, Taiyu
    Li, Kezhi
    Georgiou, Pantelis
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2023, 27 (10) : 5087 - 5098
  • [47] Conservative Offline Distributional Reinforcement Learning
    Ma, Yecheng Jason
    Jayaraman, Dinesh
    Bastani, Osbert
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [48] On Efficient Sampling in Offline Reinforcement Learning
    Jia, Qing-Shan
    2024 14TH ASIAN CONTROL CONFERENCE, ASCC 2024, 2024, : 1 - 6
  • [49] Offline Reinforcement Learning with Differential Privacy
    Qiao, Dan
    Wang, Yu-Xiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [50] Bootstrapped Transformer for Offline Reinforcement Learning
    Wang, Kerong
    Zhao, Hanye
    Luo, Xufang
    Ren, Kan
    Zhang, Weinan
    Li, Dongsheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,