Prediction and Control in Continual Reinforcement Learning

Cited: 0
Authors
Anand, Nishanth [1 ,2 ]
Precup, Doina [1 ,3 ]
Affiliations
[1] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
[2] Mila Quebec AI Inst, Montreal, PQ, Canada
[3] Deepmind, London, England
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
GAME; GO;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Temporal difference (TD) learning is often used to update the estimate of the value function, which RL agents use to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components that update at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning, and we draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach significantly improves performance on both prediction and control problems.
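The decomposition described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact algorithm: the class name `PermanentTransientValue`, the learning rates, and the consolidate-and-reset step at task boundaries are all assumptions made for illustration.

```python
import numpy as np

class PermanentTransientValue:
    """Illustrative sketch (not the paper's exact method): the agent's value
    estimate is the sum of a slow "permanent" component and a fast "transient"
    component, updated at different timescales."""

    def __init__(self, n_states, alpha_fast=0.5, alpha_slow=0.05, gamma=0.9):
        self.v_perm = np.zeros(n_states)   # general knowledge, slow timescale
        self.v_trans = np.zeros(n_states)  # quick adaptation, fast timescale
        self.alpha_fast = alpha_fast       # hypothetical learning rates
        self.alpha_slow = alpha_slow
        self.gamma = gamma

    def value(self, s):
        # The agent acts on the combined estimate.
        return self.v_perm[s] + self.v_trans[s]

    def td_update(self, s, r, s_next, done):
        # Standard TD(0) error on the combined value; the fast transient
        # component absorbs new information quickly.
        target = r + (0.0 if done else self.gamma * self.value(s_next))
        delta = target - self.value(s)
        self.v_trans[s] += self.alpha_fast * delta

    def consolidate(self):
        # One plausible way to persist knowledge: slowly fold the combined
        # estimate into the permanent component, then reset the transient one
        # so it is free to adapt to the next situation.
        self.v_perm += self.alpha_slow * self.v_trans
        self.v_trans[:] = 0.0
```

After repeated TD updates the transient component tracks the current task's values; `consolidate()` then moves a fraction of that knowledge into the permanent store, mirroring the fast-hippocampal / slow-neocortical division that CLS theory describes.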
Pages: 39
Related Papers (50 total)
  • [31] Continual deep reinforcement learning with task-agnostic policy distillation
    Hafez, Muhammad Burhan
    Erekmen, Kerim
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [32] Online Continual Learning for Control of Mobile Robots
    Sarabakha, Andriy
    Qiao, Zhongzheng
    Ramasamy, Savitha
    Suganthan, Ponnuthurai Nagaratnam
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [33] CORA: BENCHMARKS, BASELINES, AND METRICS AS A PLATFORM FOR CONTINUAL REINFORCEMENT LEARNING AGENTS
    Powers, Sam
    Xing, Eliot
    Kolve, Eric
    Mottaghi, Roozbeh
    Gupta, Abhinav
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199, 2022, 199
  • [34] Continual portfolio selection in dynamic environments via incremental reinforcement learning
    Liu, Shu
    Wang, Bo
    Li, Huaxiong
    Chen, Chunlin
    Wang, Zhi
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 : 269 - 279
  • [35] Structured prediction with reinforcement learning
    Maes, Francis
    Denoyer, Ludovic
    Gallinari, Patrick
    MACHINE LEARNING, 2009, 77 (2-3) : 271 - 301
  • [36] Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy
    Achbany, Youssef
    Fouss, Francois
    Yen, Luh
    Pirotte, Alain
    Saerens, Marco
    NEUROCOMPUTING, 2008, 71 (13-15) : 2507 - 2520
  • [37] Deep Reinforcement Learning amidst Continual Structured Non-Stationarity
    Xie, Annie
    Harrison, James
    Finn, Chelsea
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [39] Same State, Different Task: Continual Reinforcement Learning without Interference
    Kessler, Samuel
    Parker-Holder, Jack
    Ball, Philip
    Zohren, Stefan
    Roberts, Stephen J.
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7143 - 7151
  • [40] Nonlinear prediction by reinforcement learning
    Kuremoto, T
    Obayashi, M
    Kobayashi, K
    ADVANCES IN INTELLIGENT COMPUTING, PT 1, PROCEEDINGS, 2005, 3644 : 1085 - 1094