Loss Dynamics of Temporal Difference Reinforcement Learning

Cited: 0
Authors
Bordelon, Blake [1]
Masset, Paul [1]
Kuo, Henry [1]
Pehlevan, Cengiz [1]
Affiliation
[1] Harvard Univ, John Paulson Sch Engn & Appl Sci, Ctr Brain Sci, Kempner Inst Study Nat & Artificial Intelligence, Cambridge, MA 02138 USA
Keywords
Statistical mechanics; Convergence; Hippocampus
DOI
N/A
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning has been successful across several applications in which agents must learn to act in environments with sparse feedback. Despite this empirical success, however, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, in which averages over random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov decision processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies such as learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. Our work thus introduces new tools toward developing a theory of learning dynamics in reinforcement learning.
Pages: 28
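
To make the setting of the abstract concrete, below is a minimal sketch of semi-gradient TD(0) with linear function approximation on a small random Markov chain, including a simple learning-rate annealing schedule of the kind the abstract discusses. All specifics here (state and feature counts, the annealing schedule, the random seed) are illustrative assumptions, not the authors' implementation.

import numpy as np

# Minimal sketch (assumed setup, not the paper's code): semi-gradient TD(0)
# with linear features on a small random MDP.
rng = np.random.default_rng(0)

n_states, n_features, gamma = 20, 5, 0.9   # illustrative sizes; gamma = discount

# Random ergodic Markov chain with a state-dependent reward.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
r = rng.standard_normal(n_states)

# Ground-truth value function for measuring value error, from the Bellman
# equation in matrix form: v* = (I - gamma P)^{-1} r.
v_star = np.linalg.solve(np.eye(n_states) - gamma * P, r)

# Random Gaussian features; the value estimate is v_hat(s) = phi(s) @ w.
Phi = rng.standard_normal((n_states, n_features)) / np.sqrt(n_features)
w = np.zeros(n_features)

lr0, tau = 0.1, 5_000                      # base rate and annealing timescale (assumed)
s = 0
errors = []
for t in range(20_000):
    s_next = rng.choice(n_states, p=P[s])
    # Semi-gradient TD(0): the bootstrap target is treated as a constant,
    # so only phi(s) enters the update, not phi(s').
    delta = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    lr = lr0 / (1.0 + t / tau)             # annealing shrinks the noise plateau
    w += lr * delta * Phi[s]
    errors.append(np.mean((Phi @ w - v_star) ** 2))
    s = s_next

print(f"final value error: {errors[-1]:.4f}")

With a constant learning rate (tau set very large), the recorded value error flattens onto a noise floor rather than decaying to zero, which is the plateau behavior the abstract attributes to stochastic semi-gradient noise; annealing the rate lets the error continue to decrease.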