Loss Dynamics of Temporal Difference Reinforcement Learning

Cited by: 0
Authors
Bordelon, Blake [1]
Masset, Paul [1]
Kuo, Henry [1]
Pehlevan, Cengiz [1]
Affiliations
[1] Harvard Univ, John Paulson Sch Engn & Appl Sci, Ctr Brain Sci, Kempner Inst Study Nat & Artificial Intelligence, Cambridge, MA 02138 USA
Keywords
STATISTICAL-MECHANICS; CONVERGENCE; HIPPOCAMPUS
DOI
Not available
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning has been successful in several applications where agents must learn to act in environments with sparse feedback. Despite this empirical success, however, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, in which averages over random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov decision processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how the learning dynamics and plateaus depend on the feature structure, learning rate, discount factor, and reward function. We then analyze how strategies such as learning rate annealing and reward shaping can favorably alter the learning dynamics and plateaus. To conclude, our work introduces new tools that open a direction toward a theory of learning dynamics in reinforcement learning.
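The object the abstract analyzes, semi-gradient TD(0) with a linear value function V(s) = w · φ(s), can be sketched as follows. The 5-state random-walk MDP, random Gaussian features, and hyperparameters below are illustrative assumptions for demonstration, not the paper's experimental setup.

```python
import numpy as np

# Semi-gradient TD(0) with linear function approximation, V(s) = w . phi(s).
# The MDP, features, and hyperparameters are illustrative choices only.
rng = np.random.default_rng(0)
n_states, n_features = 5, 3
gamma, alpha = 0.9, 0.01

Phi = rng.standard_normal((n_states, n_features))  # random linear features
R = np.zeros(n_states)
R[-1] = 1.0                                        # sparse reward: last state only

# Symmetric random walk on a ring of states.
P = np.zeros((n_states, n_states))
for s in range(n_states):
    P[s, (s - 1) % n_states] = 0.5
    P[s, (s + 1) % n_states] = 0.5

# Exact value function for comparison: V = (I - gamma P)^{-1} R.
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, R)

w = np.zeros(n_features)
s = 0
for step in range(50000):
    s_next = rng.choice(n_states, p=P[s])
    # Semi-gradient TD error: the bootstrap target r + gamma * V(s') is
    # treated as a constant, so only grad V(s) = phi(s) enters the update.
    delta = R[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += alpha * delta * Phi[s]
    s = s_next

value_error = np.mean((Phi @ w - V_true) ** 2)
print(f"mean-squared value error: {value_error:.4f}")
```

Because there are fewer features than states, the value error converges not to zero but to a floor near the TD fixed point, and the sampled transitions inject the stochastic semi-gradient noise whose effect on plateaus the paper characterizes.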
Pages: 28
Related Papers (50 total)
  • [41] Primate Motor Cortical Activity Displays Hallmarks of a Temporal Difference Reinforcement Learning Process
    Tarigoppula, Venkata S. Aditya
    Choi, John S.
    Hessburg, John P.
    McNiel, David B.
    Marsh, Brandi T.
    Francis, Joseph Thachil
    2023 11TH INTERNATIONAL IEEE/EMBS CONFERENCE ON NEURAL ENGINEERING, NER, 2023,
  • [42] Improving Reinforcement Learning Using Temporal-Difference Network
    Karbasian, Habib
    Ahmadabadi, Majid N.
    Araabi, Babak N.
    EUROCON 2009: INTERNATIONAL IEEE CONFERENCE DEVOTED TO THE 150TH ANNIVERSARY OF ALEXANDER S. POPOV, VOLS 1-4, PROCEEDINGS, 2009, : 1716 - 1722
  • [43] Optimization of music education strategy guided by the temporal-difference reinforcement learning algorithm
    Su, Yingwei
    Wang, Yuan
    Soft Computing, 2024, 28 (13-14) : 8279 - 8291
  • [44] Temporal Shift Reinforcement Learning
    Thomas, Deepak George
    Wongpiromsarn, Tichakorn
    Jannesari, Ali
    PROCEEDINGS OF THE 2022 2ND EUROPEAN WORKSHOP ON MACHINE LEARNING AND SYSTEMS (EUROMLSYS '22), 2022, : 95 - 100
  • [45] Contracts for Difference: A Reinforcement Learning Approach
    Zengeler, Nico
    Handmann, Uwe
    JOURNAL OF RISK AND FINANCIAL MANAGEMENT, 2020, 13 (04)
  • [46] TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning
    Zheng, Ruijie
    Wang, Xiyao
    Sun, Yanchao
    Ma, Shuang
    Zhao, Jieyu
    Xu, Huazhe
    Daume, Hal, III
    Huang, Furong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [47] Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation
    Salimibeni, Mohammad
    Mohammadi, Arash
    Malekzadeh, Parvin
    Plataniotis, Konstantinos N.
    SENSORS, 2022, 22 (04)
  • [48] A sparse kernel-based least-squares temporal difference algorithm for reinforcement learning
    Xu, Xin
    ADVANCES IN NATURAL COMPUTATION, PT 1, 2006, 4221 : 47 - 56
  • [49] A Temporal Difference GNG-based Approach for the State Space Quantization in Reinforcement Learning Environments
    Vieira, Davi C. L.
    Adeodato, Paulo J. L.
    Goncalves, Paulo M., Jr.
    2013 IEEE 25TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2013, : 561 - 568
  • [50] A Temporal Difference GNG-based Algorithm that can learn to Control in Reinforcement Learning Environments
    Vieira, Davi C. L.
    Adeodato, Paulo J. L.
    Goncalves, Paulo M., Jr.
    2013 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2013), VOL 1, 2013, : 329 - 332