Reinforcement learning and dopamine in the striatum: A modeling perspective

Cited by: 1
Authors
Wanjerkhede, Shesharao M. [1]
Bapi, Raju S. [2]
Mytri, Vithal D. [3]
Affiliations
[1] Guru Nanak Dev Engn Coll, Dept Comp Sci & Engn, Bidar, Karnataka, India
[2] Cent Univ Hyderabad, Dept Comp & Informat Sci, Hyderabad, Andhra Pradesh, India
[3] Guru Nanak Dev Engn Coll, Bidar, Karnataka, India
Keywords
Actor-critic; Basal ganglia; Dopamine; LTP; LTD; PROTEIN-KINASE-II; FRONTAL-CORTEX; DARPP-32; PHOSPHORYLATION; COINCIDENT ACTIVATION; COMPUTATIONAL MODELS; SYNAPTIC PLASTICITY; PREDICTION ERROR; NMDA RECEPTORS; WORKING-MEMORY
DOI
10.1016/j.neucom.2013.02.061
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent evidence indicates that the dopamine (DA) system in the brain is involved in functions such as reward-related learning, exploration, preparation, and execution in goal-directed behavior. It has been suggested that dopaminergic neurons provide a prediction error akin to the error computed in temporal difference learning (TDL) models of reinforcement learning (RL). Houk et al. (1995) [26] proposed a biochemical model, located in the spine heads of striatal neurons in the basal ganglia, that generates and uses neural signals to predict reinforcement. The model explains how DA neurons are able to predict reinforcement and how the output from these neurons might then be used to reinforce the behaviors that lead to primary reinforcement. They proposed a scheme that draws parallels between the actor-critic architecture and dopamine activity in the basal ganglia. Houk et al. (1995) [26] also proposed a biochemical model of interactions between protein molecules in the spine heads of striatal medium spiny neurons, which supports learning earlier predictions of reinforcement. However, Houk's cellular model fails to account for the time delay between dopaminergic and glutamatergic activity required for reward-related learning. It also fails to explain the 'eligibility trace' condition needed in delayed associative conditioning tasks, in which a memory trace of the antecedent signal must be available at the time of a succeeding reward. In this article, we review various models of RL, with an emphasis on cellular and, in particular, biochemical models, and point out future directions. (C) 2014 Elsevier B.V. All rights reserved.
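To make the temporal difference prediction error and the 'eligibility trace' mentioned in the abstract concrete, the following is a minimal sketch of a tabular TD(lambda) critic on a cue, delay, reward chain. It is an illustrative assumption and not the biochemical model of Houk et al. or any model from the reviewed article: the function name `run_td_lambda`, the five-step chain, the single reward of 1.0 at the last step, and all parameter values are hypothetical choices made for this example.

```python
def run_td_lambda(n_states=5, n_trials=200, alpha=0.1, gamma=1.0, lam=0.9):
    """Tabular TD(lambda) critic on a chain of time steps ending in a reward.

    Hypothetical setup: state 0 is the cue, the last state is followed by a
    reward of 1.0, and a terminal state with value 0 closes each trial.
    Returns the learned state values and the TD errors of every trial.
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal state, never updated
    deltas_per_trial = []
    for _ in range(n_trials):
        traces = [0.0] * (n_states + 1)  # eligibility traces are reset each trial
        deltas = []
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            # TD prediction error: delta = r + gamma * V(s') - V(s)
            delta = reward + gamma * values[s + 1] - values[s]
            # Decay all traces, then mark the current state as eligible
            traces = [gamma * lam * e for e in traces]
            traces[s] += 1.0
            # Every recently visited state shares in the error via its trace
            for i in range(n_states):
                values[i] += alpha * delta * traces[i]
            deltas.append(delta)
        deltas_per_trial.append(deltas)
    return values[:n_states], deltas_per_trial


if __name__ == "__main__":
    values, deltas = run_td_lambda(n_trials=500)
    print("TD errors, first trial:", deltas[0])
    print("TD errors, last trial: ", [round(d, 6) for d in deltas[-1]])
    print("Learned values:        ", [round(v, 6) for v in values])
```

In this sketch the error on the first trial appears only at the time of reward, and it shrinks toward zero once the reward is predicted. With `lam=0.0` the cue state receives no credit on the first rewarded trial, whereas a non-zero `lam` lets the decaying trace bridge the delay between the antecedent cue and the later reward.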
Pages: 27-40
Page count: 14