Meta-Reinforcement Learning by Tracking Task Non-stationarity

Cited by: 0
Authors:
Poiani, Riccardo [1 ]
Tirinzoni, Andrea [2 ]
Restelli, Marcello [1 ]
Affiliations:
[1] Politecnico di Milano, Milan, Italy
[2] Inria Lille, Lille, France
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
Many real-world domains are subject to a structured non-stationarity that affects both the agent's goals and the environment dynamics. Meta-reinforcement learning (RL) has proven successful at training agents that quickly adapt to related tasks. However, most existing meta-RL algorithms for non-stationary domains either make strong assumptions about the task-generation process or require sampling from it at training time. In this paper, we propose a novel algorithm (TRIO) that optimizes for the future by explicitly tracking the task evolution through time. At training time, TRIO learns a variational module to quickly identify latent parameters from experience samples. This module is learned jointly with an optimal exploration policy that takes task uncertainty into account. At test time, TRIO tracks the evolution of the latent parameters online, hence reducing the uncertainty over future tasks and obtaining fast adaptation through the meta-learned policy. Unlike most existing methods, TRIO does not assume a Markovian task-evolution process, does not require information about the non-stationarity at training time, and captures complex changes occurring in the environment. We evaluate our algorithm on different simulated problems and show that it outperforms competitive baselines.
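
The abstract describes TRIO's two-stage recipe: a meta-learned variational module infers latent task parameters from experience, and at test time the agent tracks how those latents evolve across tasks to anticipate the next one. The sketch below illustrates only the tracking-and-extrapolation step under simplifying assumptions; `LatentTracker`, the polynomial trend model, and the stubbed encoder output are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code): keep the latent
# parameters inferred for each past task and extrapolate them one task
# ahead with a simple per-dimension polynomial fit.
import numpy as np

class LatentTracker:
    def __init__(self, latent_dim, window=10, degree=2):
        self.latent_dim = latent_dim
        self.window = window    # number of recent tasks used for the fit
        self.degree = degree    # polynomial degree of the assumed trend
        self.history = []       # one inferred latent mean per past task

    def update(self, latent_mean):
        # Store the latent parameters inferred (e.g., by a variational
        # encoder) for the task that just ended.
        self.history.append(np.asarray(latent_mean, dtype=float))

    def predict_next(self):
        # Extrapolate each latent dimension one step into the future.
        if not self.history:
            return np.zeros(self.latent_dim)
        if len(self.history) == 1:
            return self.history[0]
        recent = np.stack(self.history[-self.window:])
        t = np.arange(len(recent))
        deg = min(self.degree, len(recent) - 1)
        pred = np.empty(self.latent_dim)
        for d in range(self.latent_dim):
            coeffs = np.polyfit(t, recent[:, d], deg)
            pred[d] = np.polyval(coeffs, len(recent))  # one step ahead
        return pred

# Usage: after each task, a (meta-learned) encoder -- stubbed here with a
# synthetic drift -- produces a latent estimate; the tracker extrapolates
# it so a latent-conditioned policy can prepare for the next task.
tracker = LatentTracker(latent_dim=2)
for task_idx in range(5):
    inferred = np.array([0.1 * task_idx, np.sin(0.3 * task_idx)])
    tracker.update(inferred)
print("predicted next-task latent:", tracker.predict_next())
```

A polynomial trend is only one possible choice here; the abstract's point is that the tracker need not assume a Markovian task-evolution process, so any curve-fitting model over the latent history could fill this role.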
Pages: 2899-2905 (7 pages)
Related papers (50 in total):
  • [41] Prefrontal cortex as a meta-reinforcement learning system
    Wang, Jane X.
    Kurth-Nelson, Zeb
    Kumaran, Dharshan
    Tirumala, Dhruva
    Soyer, Hubert
    Leibo, Joel Z.
    Hassabis, Demis
    Botvinick, Matthew
NATURE NEUROSCIENCE, 2018, 21 (06) : 860+
  • [43] Some Considerations on Learning to Explore via Meta-Reinforcement Learning
    Stadie, Bradly C.
    Yang, Ge
    Houthooft, Rein
    Chen, Xi
    Duan, Yan
    Wu, Yuhuai
    Abbeel, Pieter
    Sutskever, Ilya
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [44] Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
    Likmeta, Amarildo
    Metelli, Alberto Maria
    Ramponi, Giorgia
    Tirinzoni, Andrea
    Giuliani, Matteo
    Restelli, Marcello
    MACHINE LEARNING, 2021, 110 (09) : 2541 - 2576
  • [45] NON-STATIONARITY AND PORTFOLIO CHOICE
    BARRY, CB
    WINKLER, RL
    JOURNAL OF FINANCIAL AND QUANTITATIVE ANALYSIS, 1976, 11 (02) : 217 - 235
  • [46] Towards Efficient Task Offloading at the Edge based on Meta-Reinforcement Learning with Hybrid Action Space
    Yang, Zhao
    Deng, Yuxiang
    Wang, Ting
    Cai, Haibin
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 4039 - 4044
  • [47] Meta-Reinforcement Learning with Self-Modifying Networks
    Chalvidal, Mathieu
    Serre, Thomas
    VanRullen, Rufin
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [48] Model-based Adversarial Meta-Reinforcement Learning
    Lin, Zichuan
    Thomas, Garrett
    Yang, Guangwen
    Ma, Tengyu
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [49] Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks
    Schoettler, Gerrit
    Nair, Ashvin
    Ojea, Juan Aparicio
    Levine, Sergey
    Solowjow, Eugen
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 9728 - 9735
  • [50] Reasoning from non-stationarity
    Struzik, ZR
    van Wijngaarden, WJ
    Castelo, R
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2002, 314 (1-4) : 246 - 255