PAC-Bayesian offline Meta-reinforcement learning

Cited by: 1
Authors
Sun, Zheng [1 ]
Jing, Chenheng [1 ]
Guo, Shangqi [2 ,3 ]
An, Lingling [4 ]
Affiliations
[1] Xidian Univ, Guangzhou Inst Technol, Guangzhou 510555, Peoples R China
[2] Tsinghua Univ, Dept Precis Instrument, Beijing 100086, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing 100086, Peoples R China
[4] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Meta-reinforcement learning; PAC-Bayesian theory; Dependency graph; Generalization bounds;
DOI
10.1007/s10489-023-04911-y
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Meta-reinforcement learning (Meta-RL) exploits structure shared across tasks to enable rapid adaptation to new tasks from only a small amount of experience. However, most existing Meta-RL algorithms either lack theoretical generalization guarantees or offer such guarantees only under restrictive assumptions (e.g., strong assumptions on the data distribution). This paper presents the first theoretical analysis that estimates the generalization performance of the Meta-RL learner using PAC-Bayesian theory. Applying PAC-Bayesian theory to Meta-RL is challenging because of dependencies in the training data, which invalidate the independent and identically distributed (i.i.d.) assumption. To address this challenge, we propose a dependency graph-based offline decomposition (DGOD) approach, which decomposes non-i.i.d. Meta-RL data into multiple offline i.i.d. datasets using the techniques of offline sampling and graph decomposition. With DGOD, we derive practical PAC-Bayesian offline Meta-RL generalization bounds and design an algorithm with generalization guarantees, called PAC-Bayesian Offline Meta-Actor-Critic (PBOMAC), to optimize them. Experiments on several challenging Meta-RL benchmarks demonstrate that our algorithm avoids meta-overfitting and outperforms recent state-of-the-art Meta-RL algorithms that lack generalization bounds.
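The abstract's core device is splitting dependent data along a dependency graph into independent subsets. The paper's exact DGOD construction is not reproduced in this record; the following is only a minimal, hypothetical sketch of the generic idea (as in chromatic-number approaches to non-i.i.d. PAC bounds): greedily color the dependency graph so that each color class is an independent set, i.e., no two samples in the same class share a dependency edge, and each class can then be treated as an i.i.d. dataset.

```python
from collections import defaultdict

def decompose_by_dependency_graph(samples, edges):
    """Greedy proper coloring of a dependency graph.

    `samples` is a list of hashable sample ids; `edges` lists pairs of
    samples that are statistically dependent. Returns color classes: within
    each returned group, no two samples are connected by a dependency edge,
    so each group can be treated as an (approximately) i.i.d. dataset.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    color = {}
    for s in samples:
        # smallest color not used by an already-colored neighbor
        taken = {color[n] for n in adj[s] if n in color}
        c = 0
        while c in taken:
            c += 1
        color[s] = c

    groups = defaultdict(list)
    for s in samples:
        groups[color[s]].append(s)
    return list(groups.values())

# Example: five consecutive trajectory steps form a dependency chain,
# so consecutive steps must land in different classes.
chain_edges = [(i, i + 1) for i in range(4)]
parts = decompose_by_dependency_graph(list(range(5)), chain_edges)
```

For a chain the greedy coloring needs only two classes (even-indexed and odd-indexed steps), illustrating why such decompositions can keep the number of resulting i.i.d. datasets small when the dependency graph is sparse.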
Pages: 27128-27147
Page count: 20
Related papers
50 records
  • [1] PAC-Bayesian offline Meta-reinforcement learning
    Zheng Sun
    Chenheng Jing
    Shangqi Guo
    Lingling An
    Applied Intelligence, 2023, 53 : 27128 - 27147
  • [2] Offline Meta-Reinforcement Learning for Industrial Insertion
    Zhao, Tony Z.
    Luo, Jianlan
    Sushkov, Oleg
    Pevceviciute, Rugile
    Heess, Nicolas
    Scholz, Jon
    Schaal, Stefan
    Levine, Sergey
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 6386 - 6393
  • [3] Offline Meta-Reinforcement Learning with Advantage Weighting
    Mitchell, Eric
    Rafailov, Rafael
    Peng, Xue Bin
    Levine, Sergey
    Finn, Chelsea
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] Context Shift Reduction for Offline Meta-Reinforcement Learning
    Gao, Yunkai
    Zhang, Rui
    Guo, Jiaming
    Wu, Fan
    Yi, Qi
    Peng, Shaohui
    Lan, Siming
    Chen, Ruizhi
    Du, Zidong
    Hu, Xing
    Guo, Qi
    Li, Ling
    Chen, Yunji
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] A PAC-Bayesian Bound for Lifelong Learning
    Pentina, Anastasia
    Lampert, Christoph H.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 991 - 999
  • [6] PAC-Bayesian Theory for Transductive Learning
    Begin, Luc
    Germain, Pascal
    Laviolette, Francois
    Roy, Jean-Francis
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 33, 2014, 33 : 105 - 113
  • [7] Unified PAC-Bayesian Study of Pessimism for Offline Policy Learning with Regularized Importance Sampling
    Aouali, Imad
    Brunel, Victor-Emmanuel
    Rohde, David
    Korba, Anna
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244 : 88 - 109
  • [8] Meta-reinforcement learning for the tuning of PI controllers: An offline approach
    McClement, Daniel G.
    Lawrence, Nathan P.
    Backstroem, Johan U.
    Loewen, Philip D.
    Forbes, Michael G.
    Gopaluni, R. Bhushan
    JOURNAL OF PROCESS CONTROL, 2022, 118 : 139 - 152
  • [9] Offline Meta-Reinforcement Learning with Online Self-Supervision
    Pong, Vitchyr H.
    Nair, Ashvin
    Smith, Laura
    Huang, Catherine
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [10] Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations
    Zhou, Renzhe
    Gao, Chen-Xiao
    Zhang, Zongzhang
    Yu, Yang
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 15, 2024, : 17132 - 17140