The asymptotic equipartition property in reinforcement learning and its relation to return maximization

Cited: 6
Authors
Iwata, K
Ikeda, K
Sakai, H
Affiliations
[1] Hiroshima City Univ, Fac Informat Sci, Asaminami Ku, Hiroshima 7313194, Japan
[2] Kyoto Univ, Grad Sch Informat, Dept Syst Sci, Sakyo Ku, Kyoto 6068501, Japan
Keywords
reinforcement learning; Markov decision process; information theory; asymptotic equipartition property; stochastic complexity; return maximization;
DOI
10.1016/j.neunet.2005.02.008
CLC classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We discuss an important property, called the asymptotic equipartition property, of empirical sequences in reinforcement learning. It states that, when the number of time steps is sufficiently large, the typical set of empirical sequences has probability nearly one, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set is an exponential function of the sum of conditional entropies; this sum is referred to as the stochastic complexity. Using this property, we show that return maximization depends on two factors: the stochastic complexity and a quantity that depends on the parameters of the environment. Here, return maximization means that the best sequences in terms of expected return have probability one. We also examine the sensitivity of the stochastic complexity, which serves as a qualitative guide for tuning the parameters of the action-selection strategy, and give a sufficient condition for return maximization in probability. (c) 2005 Elsevier Ltd. All rights reserved.
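The concentration behavior the abstract describes can be illustrated with a minimal numerical sketch. The example below is an assumption of this record, not the paper's MDP setting: it uses a plain i.i.d. Bernoulli source, where the AEP reduces to the empirical per-symbol log-probability -(1/n) log2 P(x^n) concentrating near the source entropy H(p) as n grows. The function name `aep_demo` and all parameters are illustrative.

```python
import math
import random

def aep_demo(p=0.3, n=2000, trials=200, seed=0):
    """Empirically check that -(1/n) log2 P(x^n) concentrates near the
    entropy H(p) for an i.i.d. Bernoulli(p) source (a toy stand-in for
    the paper's empirical sequences in reinforcement learning)."""
    rng = random.Random(seed)
    # Source entropy H(p) = -p log2 p - (1-p) log2 (1-p)
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    rates = []
    for _ in range(trials):
        ones = sum(rng.random() < p for _ in range(n))
        # Exact log2-probability of the sampled length-n sequence
        logp = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
        rates.append(-logp / n)  # empirical per-symbol rate
    mean_rate = sum(rates) / len(rates)
    return h, mean_rate

h, mean_rate = aep_demo()
print(h, mean_rate)  # the empirical rate is close to H(0.3) ≈ 0.881
```

For large n, almost every sampled sequence falls in the typical set, whose size is roughly 2^(nH(p)); in the paper's setting the exponent is instead the sum of conditional entropies (the stochastic complexity).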
Pages: 62-75
Number of pages: 14
Related papers (50 in total)
  • [31] Immediate return preference emerged from a synaptic learning rule for return maximization
    Yamaguchi, Yoshiya
    Aihara, Takeshi
    Sakai, Yutaka
    NEURAL NETWORKS, 2015, 62 : 83 - 90
  • [32] Strong Laws of Large Numbers and the Asymptotic Equipartition Property for the Asymptotic N-Branch Markov Chains Indexed by a Cayley Tree
    Gao, R.
    Yang, W. G.
    UKRAINIAN MATHEMATICAL JOURNAL, 2017, 69 (07) : 1060 - 1074
  • [34] DGN: influence maximization based on deep reinforcement learning
    Wang, Jingwen
    Cao, Zhoulin
    Xie, Chunzhi
    Li, Yanli
    Liu, Jia
    Gao, Zhisheng
JOURNAL OF SUPERCOMPUTING, 2025, 81 (01)
  • [35] Multiagent Reinforcement Learning With Graphical Mutual Information Maximization
    Ding, Shifei
    Du, Wei
    Ding, Ling
    Zhang, Jian
    Guo, Lili
    An, Bo
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 10
  • [36] A Reinforcement Learning Model for Influence Maximization in Social Networks
    Wang, Chao
    Liu, Yiming
    Gao, Xiaofeng
    Chen, Guihai
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 701 - 709
  • [37] PIANO: Influence Maximization Meets Deep Reinforcement Learning
    Li, Hui
    Xu, Mengting
    Bhowmick, Sourav S.
    Rayhan, Joty Shafiq
    Sun, Changsheng
    Cui, Jiangtao
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2023, 10 (03) : 1288 - 1300
  • [38] On adaptation, maximization, and reinforcement learning among cognitive strategies
    Erev, I
    Barron, G
    PSYCHOLOGICAL REVIEW, 2005, 112 (04) : 912 - 931
  • [39] Influence Maximization in Dynamic Networks Using Reinforcement Learning
    Dizaji S.H.S.
    Patil K.
    Avrachenkov K.
    SN Computer Science, 5 (1)
  • [40] Complex Contagion Influence Maximization: A Reinforcement Learning Approach
    Chen, Haipeng
    Wilder, Bryan
    Qiu, Wei
    An, Bo
    Rice, Eric
    Tambe, Milind
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 5531 - 5540