Loss Dynamics of Temporal Difference Reinforcement Learning

Cited by: 0
Authors
Bordelon, Blake [1 ]
Masset, Paul [1 ]
Kuo, Henry [1 ]
Pehlevan, Cengiz [1 ]
Affiliations
[1] Harvard Univ, John Paulson Sch Engn & Appl Sci, Ctr Brain Sci, Kempner Inst Study Nat & Artificial Intelligence, Cambridge, MA 02138 USA
Keywords
STATISTICAL-MECHANICS; CONVERGENCE; HIPPOCAMPUS
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. Despite this empirical success, however, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, in which averages over random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov decision processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how the learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies such as learning rate annealing and reward shaping can favorably alter the learning dynamics and plateaus. In conclusion, our work introduces new tools and opens a new direction towards developing a theory of learning dynamics in reinforcement learning.
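The setting the abstract describes, semi-gradient TD(0) with linear function approximation on a small Markov decision process, can be made concrete with a short sketch. The following is a minimal illustration, not the authors' code: the MDP size, feature dimension, step sizes, and annealing schedule are assumptions chosen for demonstration. It shows the semi-gradient update and contrasts a constant learning rate, where the value error settles at a noise floor, with an annealed one.

```python
# Minimal sketch (assumed parameters, not the paper's code) of semi-gradient
# TD(0) with linear function approximation on a small random MDP.
import numpy as np

rng = np.random.default_rng(0)

S, D = 20, 10          # number of states, feature dimension (illustrative)
gamma = 0.9            # discount factor

# Random ergodic transition matrix P and reward vector r.
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)
r = rng.standard_normal(S)

# Fixed random features phi(s) for each state (rows of Phi).
Phi = rng.standard_normal((S, D)) / np.sqrt(D)

# Ground-truth value function V* = (I - gamma * P)^{-1} r, used only to
# measure the value error along the trajectory.
V_star = np.linalg.solve(np.eye(S) - gamma * P, r)

def run_td(alpha_fn, n_steps=100_000):
    """Semi-gradient TD(0): w += alpha * (r + gamma*phi(s')@w - phi(s)@w) * phi(s)."""
    w = np.zeros(D)
    s = rng.integers(S)
    errors = []
    for t in range(n_steps):
        s_next = rng.choice(S, p=P[s])
        delta = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w  # TD error
        w += alpha_fn(t) * delta * Phi[s]                    # semi-gradient step
        if t % 1000 == 0:
            errors.append(np.mean((Phi @ w - V_star) ** 2))  # value error
        s = s_next
    return np.array(errors)

# Constant step size: the error curve flattens at a floor set by
# semi-gradient noise (the plateau the abstract describes).
err_const = run_td(lambda t: 0.05)
# Annealed step size: the floor is traded for slower convergence.
err_anneal = run_td(lambda t: 0.05 / (1.0 + t / 5000.0))
print(f"final value error, constant alpha: {err_const[-1]:.4f}")
print(f"final value error, annealed alpha: {err_anneal[-1]:.4f}")
```

With D < S the linear model cannot represent V* exactly, so part of the residual error is approximation error; the gap between the constant-rate and annealed-rate curves isolates the plateau contribution from stochastic semi-gradient noise.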
Pages: 28