Multi-Step Generalized Policy Improvement by Leveraging Approximate Models

Citations: 0
Authors
Alegre, Lucas N. [1, 2]
Bazzan, Ana L. C. [1]
Nowe, Ann [2]
da Silva, Bruno C. [3]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[2] Vrije Univ Brussel, Artificial Intelligence Lab, Brussels, Belgium
[3] Univ Massachusetts, Amherst, MA 01003 USA
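Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract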
We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has been investigated by leveraging methods rooted in generalized policy improvement (GPI) and successor features (SFs). Although computationally efficient, these methods are model-free: they analyze a library of policies, each solving a particular task, and identify which action the agent should take. We investigate the more general setting where, in addition to a library of policies, the agent has access to an approximate environment model. Even though model-based RL algorithms can identify near-optimal policies, they are typically computationally intensive. We introduce h-GPI, a multi-step extension of GPI that interpolates between these extremes (standard model-free GPI and fully model-based planning) as a function of a parameter, h, regulating the amount of time the agent has to reason. We prove that h-GPI's performance lower bound is strictly better than GPI's, and show that h-GPI generally outperforms GPI as h increases. Furthermore, we prove that as h increases, h-GPI's performance becomes arbitrarily less susceptible to sub-optimality in the agent's policy library. Finally, we introduce novel bounds characterizing the gains achievable by h-GPI as a function of approximation errors in both the agent's policy library and its (possibly learned) model. These bounds strictly generalize those known in the literature. We evaluate h-GPI on challenging tabular and continuous-state problems under value function approximation and show that it consistently outperforms GPI and state-of-the-art competing methods under various levels of approximation errors.
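The interpolation between model-free GPI and model-based planning described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation; it is a minimal illustration under stated assumptions: the model is given as exact tables `P` and `R`, the policy library is a list of action-value tables `q_library`, and `h = 0` is defined here to fall back to standard GPI. The function names `h_gpi_q` and `h_gpi_action` and the toy three-state problem are hypothetical, introduced only for this example.

```python
# Hypothetical tabular sketch of an h-step lookahead that bootstraps with GPI.
# Assumptions (not taken from the source): P[s][a] is a dict {next_state: prob},
# R[s][a] is the immediate reward, and each q in q_library is a table q[s][a]
# holding the action values of one policy in the library.

def h_gpi_q(P, R, q_library, s, a, h, gamma):
    """Value of taking action a in state s with an h-step lookahead."""
    if h == 0:
        # No lookahead: standard GPI, the best library value for (s, a).
        return max(q[s][a] for q in q_library)
    # One model step, then the best (h-1)-step lookahead value at each successor.
    expected_next = sum(
        prob * max(
            h_gpi_q(P, R, q_library, s_next, a_next, h - 1, gamma)
            for a_next in range(len(R[s_next]))
        )
        for s_next, prob in P[s][a].items()
    )
    return R[s][a] + gamma * expected_next


def h_gpi_action(P, R, q_library, s, h, gamma):
    """Greedy action with respect to the h-step lookahead values."""
    return max(
        range(len(R[s])),
        key=lambda a: h_gpi_q(P, R, q_library, s, a, h, gamma),
    )


# Toy problem: state 2 is absorbing with zero reward.
# In state 0, action 0 pays 1 immediately; action 1 pays 0 but leads to
# state 1, where action 1 pays 10.
P = [[{2: 1.0}, {1: 1.0}], [{2: 1.0}, {2: 1.0}], [{2: 1.0}, {2: 1.0}]]
R = [[1.0, 0.0], [0.0, 10.0], [0.0, 0.0]]
# Library with a single sub-optimal policy that always picks action 0.
q_always_zero = [[1.0, 0.0], [0.0, 10.0], [0.0, 0.0]]
q_library = [q_always_zero]

print(h_gpi_action(P, R, q_library, s=0, h=0, gamma=0.9))  # 0 (plain GPI)
print(h_gpi_action(P, R, q_library, s=0, h=1, gamma=0.9))  # 1 (one-step lookahead)
```

In this toy case the single library policy undervalues action 1 in state 0, so plain GPI follows it, while a one-step lookahead through the model recovers the better action. The recursion enumerates every action sequence, so its cost grows exponentially in `h`; it is meant only to make the role of `h` concrete.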
Pages: 25