Loss Dynamics of Temporal Difference Reinforcement Learning

Cited: 0
|
Authors
Bordelon, Blake [1 ]
Masset, Paul [1 ]
Kuo, Henry [1 ]
Pehlevan, Cengiz [1 ]
Affiliations
[1] Harvard Univ, John Paulson Sch Engn & Appl Sci, Ctr Brain Sci, Kempner Inst Study Nat & Artificial Intelligence, Cambridge, MA 02138 USA
Keywords
STATISTICAL-MECHANICS; CONVERGENCE; HIPPOCAMPUS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning has been successful across several applications in which agents must learn to act in environments with sparse feedback. Despite this empirical success, however, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, in which averages over the random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies such as learning rate annealing and reward shaping can favorably alter the learning dynamics and plateaus. Our work introduces new tools that open a direction toward developing a theory of learning dynamics in reinforcement learning.
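The setting the abstract describes (semi-gradient TD(0) learning a value function with fixed linear features on a small MDP) can be sketched as follows. This is an illustrative example only, not the authors' code: the MDP, the features, and all hyperparameter values are arbitrary assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

S, D = 8, 4                              # number of states, feature dimension
P = rng.dirichlet(np.ones(S), size=S)    # random transition matrix (rows sum to 1)
r = rng.normal(size=S)                   # reward for leaving each state
Phi = rng.normal(size=(S, D))            # fixed linear features, one row per state
gamma = 0.9                              # discount factor

# Ground-truth value function: V = (I - gamma * P)^{-1} r
V_true = np.linalg.solve(np.eye(S) - gamma * P, r)

def td0(lr=0.05, steps=20_000):
    """Semi-gradient TD(0) with linear function approximation.

    Returns the learned weights and the mean-squared value error.
    With a constant learning rate, the error does not go to the best
    achievable value but fluctuates around a noise floor, the kind of
    plateau induced by stochastic semi-gradient noise that the
    abstract discusses.
    """
    w = np.zeros(D)
    s = 0
    for _ in range(steps):
        s_next = rng.choice(S, p=P[s])
        # Semi-gradient TD error: the bootstrap target Phi[s_next] @ w
        # is treated as a constant, so only Phi[s] enters the update.
        delta = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
        w += lr * delta * Phi[s]
        s = s_next
    return w, np.mean((Phi @ w - V_true) ** 2)

w, err = td0()
```

Annealing the learning rate (e.g. passing a schedule instead of a constant `lr`) shrinks the semi-gradient noise floor, which is one of the interventions the abstract analyzes.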
Pages: 28
Related Papers
50 records in total
  • [1] Temporal difference coding in reinforcement learning
    Iwata, K
    Ikeda, K
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, 2003, 2690 : 218 - 227
  • [2] Dynamics of Temporal Difference Learning
    Wendemuth, Andreas
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1107 - 1112
  • [3] Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error
    Park, Bumgeun
    Kim, Taeyoung
    Moon, Woohyeon
    Nengroo, Sarvar Hussain
    Har, Dongsoo
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT V, 2023, 14090 : 600 - 613
  • [4] STOCHASTIC KERNEL TEMPORAL DIFFERENCE FOR REINFORCEMENT LEARNING
    Bae, Jihye
    Giraldo, Luis Sanchez
    Chhatbar, Pratik
    Francis, Joseph
    Sanchez, Justin
    Principe, Jose
    2011 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2011,
  • [5] Reinforcement Learning via Kernel Temporal Difference
    Bae, Jihye
    Chhatbar, Pratik
    Francis, Joseph T.
    Sanchez, Justin C.
    Principe, Jose C.
    2011 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2011, : 5662 - 5665
  • [6] Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning
    He, Qiang
    Zhou, Tianyi
    Fang, Meng
    Maghsudi, Setareh
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT IV, 2023, 14172 : 573 - 589
  • [7] Monte Carlo and Temporal Difference Methods in Reinforcement Learning
    Han, Isaac
    Oh, Seungwon
    Jung, Hoyoun
    Chung, Insik
    Kim, Kyung-Joong
    IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, 2023, 18 (04) : 64 - 65
  • [8] The role of dopamine in the temporal difference model of reinforcement learning
    Montague, R
    NEUROPSYCHOPHARMACOLOGY, 2005, 30 : S27 - S27
  • [9] A Reinforcement Learning Model Based on Temporal Difference Algorithm
    Li, Xiali
    Lv, Zhengyu
    Wang, Song
    Wei, Zhi
    Wu, Licheng
    IEEE ACCESS, 2019, 7 : 121922 - 121930
  • [10] Basis function adaptation in temporal difference reinforcement learning
    Menache, I
    Mannor, S
    Shimkin, N
    ANNALS OF OPERATIONS RESEARCH, 2005, 134 (01) : 215 - 238