Loss Dynamics of Temporal Difference Reinforcement Learning

Cited by: 0
Authors
Bordelon, Blake [1 ]
Masset, Paul [1 ]
Kuo, Henry [1 ]
Pehlevan, Cengiz [1 ]
Affiliations
[1] Harvard Univ, John Paulson Sch Engn & Appl Sci, Ctr Brain Sci, Kempner Inst Study Nat & Artificial Intelligence, Cambridge, MA 02138 USA
Keywords
STATISTICAL-MECHANICS; CONVERGENCE; HIPPOCAMPUS
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. Despite this empirical success, however, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, in which averages over random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov decision processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how the learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies such as learning rate annealing and reward shaping can favorably alter the learning dynamics and plateaus. In conclusion, our work introduces new tools and opens a new direction towards developing a theory of learning dynamics in reinforcement learning.
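The setting the abstract describes, semi-gradient TD(0) with linear function approximation on a small Markov decision process, can be made concrete with a short sketch. The following is a minimal illustration, not the authors' code: the MDP size, feature dimension, step sizes, and annealing schedule are assumptions chosen for demonstration. It shows the semi-gradient update and contrasts a constant learning rate, where the value error settles at a noise floor, with an annealed one.

```python
# Minimal sketch (assumed parameters, not the paper's code) of semi-gradient
# TD(0) with linear function approximation on a small random MDP.
import numpy as np

rng = np.random.default_rng(0)

S, D = 20, 10          # number of states, feature dimension (illustrative)
gamma = 0.9            # discount factor

# Random ergodic transition matrix P and reward vector r.
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)
r = rng.standard_normal(S)

# Fixed random features phi(s) for each state (rows of Phi).
Phi = rng.standard_normal((S, D)) / np.sqrt(D)

# Ground-truth value function V* = (I - gamma * P)^{-1} r, used only to
# measure the value error along the trajectory.
V_star = np.linalg.solve(np.eye(S) - gamma * P, r)

def run_td(alpha_fn, n_steps=100_000):
    """Semi-gradient TD(0): w += alpha * (r + gamma*phi(s')@w - phi(s)@w) * phi(s)."""
    w = np.zeros(D)
    s = rng.integers(S)
    errors = []
    for t in range(n_steps):
        s_next = rng.choice(S, p=P[s])
        delta = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w  # TD error
        w += alpha_fn(t) * delta * Phi[s]                    # semi-gradient step
        if t % 1000 == 0:
            errors.append(np.mean((Phi @ w - V_star) ** 2))  # value error
        s = s_next
    return np.array(errors)

# Constant step size: the error curve flattens at a floor set by
# semi-gradient noise (the plateau the abstract describes).
err_const = run_td(lambda t: 0.05)
# Annealed step size: the floor is traded for slower convergence.
err_anneal = run_td(lambda t: 0.05 / (1.0 + t / 5000.0))
print(f"final value error, constant alpha: {err_const[-1]:.4f}")
print(f"final value error, annealed alpha: {err_anneal[-1]:.4f}")
```

With D < S the linear model cannot represent V* exactly, so part of the residual error is approximation error; the gap between the constant-rate and annealed-rate curves isolates the plateau contribution from stochastic semi-gradient noise.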
Pages: 28