The asymptotic equipartition property in reinforcement learning and its relation to return maximization

Cited by: 6
Authors
Iwata, K
Ikeda, K
Sakai, H
Affiliations
[1] Hiroshima City Univ, Fac Informat Sci, Asaminami Ku, Hiroshima 7313194, Japan
[2] Kyoto Univ, Grad Sch Informat, Dept Syst Sci, Sakyo Ku, Kyoto 6068501, Japan
Keywords
reinforcement learning; Markov decision process; information theory; asymptotic equipartition property; stochastic complexity; return maximization;
DOI
10.1016/j.neunet.2005.02.008
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We discuss an important property of empirical sequences in reinforcement learning, the asymptotic equipartition property. It states that, when the number of time steps is sufficiently large, the typical set of empirical sequences has probability nearly one, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set is an exponential function of the sum of conditional entropies; this sum is referred to as the stochastic complexity. Using this property, we show that return maximization depends on two factors: the stochastic complexity and a quantity determined by the parameters of the environment. Here, return maximization means that the sequences that are best in terms of expected return have probability one. We also examine the sensitivity of the stochastic complexity, which serves as a qualitative guide for tuning the parameters of the action-selection strategy, and give a sufficient condition for return maximization in probability. (c) 2005 Elsevier Ltd. All rights reserved.
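The equipartition statement in the abstract can be checked numerically. Below is a minimal sketch (Python); the two-state, two-action MDP, its transition tensor P, and the stochastic policy pi are illustrative assumptions, not taken from the paper. For a long trajectory omega of T state-action steps, -(1/T) log2 P(omega) concentrates on the per-step sum of conditional entropies H(A|S) + H(S'|S,A), so typical sequences are nearly equiprobable with probability about 2^(-T * rate) and the typical set holds roughly 2^(T * rate) elements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the paper)
P = np.array([[[0.9, 0.1],    # P(s' | s=0, a=0)
               [0.2, 0.8]],   # P(s' | s=0, a=1)
              [[0.5, 0.5],    # P(s' | s=1, a=0)
               [0.1, 0.9]]])  # P(s' | s=1, a=1)
pi = np.array([[0.7, 0.3],    # pi(a | s=0)
               [0.4, 0.6]])   # pi(a | s=1)

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Stationary state distribution of the Markov chain induced by pi
M = np.einsum('sa,sab->sb', pi, P)  # state-to-state transition matrix under pi
evals, evecs = np.linalg.eig(M.T)
mu = np.real(evecs[:, np.argmax(np.real(evals))])
mu /= mu.sum()

# Per-step entropy rate: H(A|S) + H(S'|S,A) under the stationary distribution
rate = sum(mu[s] * (entropy(pi[s]) +
                    sum(pi[s, a] * entropy(P[s, a]) for a in range(2)))
           for s in range(2))

# Sample long trajectories; -log2 P(trajectory) / T should concentrate on `rate`
T, n_seq = 5000, 20
for _ in range(n_seq):
    s, logp = 0, 0.0
    for _ in range(T):
        a = rng.choice(2, p=pi[s])       # draw action from the policy
        s2 = rng.choice(2, p=P[s, a])    # draw next state from the dynamics
        logp += np.log2(pi[s, a]) + np.log2(P[s, a, s2])
        s = s2
    print(round(-logp / T, 4), "vs entropy rate", round(rate, 4))
```

Running the sketch, the sampled per-step log-probabilities cluster tightly around the computed entropy rate, which plays the role of the (per-step) stochastic complexity in the abstract's statement; the initial-state distribution is ignored here since its contribution vanishes as T grows.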
Pages: 62-75
Number of pages: 14