The asymptotic equipartition property in reinforcement learning and its relation to return maximization

Cited by: 6
Authors
Iwata, K
Ikeda, K
Sakai, H
Affiliations
[1] Hiroshima City Univ, Fac Informat Sci, Asaminami Ku, Hiroshima 7313194, Japan
[2] Kyoto Univ, Grad Sch Informat, Dept Syst Sci, Sakyo Ku, Kyoto 6068501, Japan
Keywords
reinforcement learning; Markov decision process; information theory; asymptotic equipartition property; stochastic complexity; return maximization;
DOI
10.1016/j.neunet.2005.02.008
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We discuss an important property of empirical sequences in reinforcement learning, the asymptotic equipartition property. It states that, when the number of time steps is sufficiently large, the typical set of empirical sequences has probability nearly one, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set grows exponentially with the sum of conditional entropies. This sum is referred to as the stochastic complexity. Using this property, we show that return maximization depends on two factors: the stochastic complexity and a quantity determined by the parameters of the environment. Here, return maximization means that the best sequences in terms of expected return have probability one. We also examine the sensitivity of the stochastic complexity, which provides a qualitative guide for tuning the parameters of the action-selection strategy, and give a sufficient condition for return maximization in probability. (c) 2005 Elsevier Ltd. All rights reserved.
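For orientation, the standard information-theoretic statement of the asymptotic equipartition property, which the abstract adapts to empirical sequences in reinforcement learning, can be sketched as follows. The symbols T, H, \epsilon, and the typical set A_\epsilon^{(T)} are generic illustrative notation, not taken from the paper: for any \epsilon > 0 and all sufficiently large T,
\[
\Pr\bigl\{A_\epsilon^{(T)}\bigr\} > 1 - \epsilon, \qquad
2^{-T(H+\epsilon)} \le p(x_1,\dots,x_T) \le 2^{-T(H-\epsilon)} \ \text{ for } (x_1,\dots,x_T) \in A_\epsilon^{(T)}, \qquad
\bigl|A_\epsilon^{(T)}\bigr| \le 2^{T(H+\epsilon)},
\]
where H denotes the entropy rate of the sequence. In the paper's setting, the exponent is instead governed by the sum of conditional entropies of the empirical sequence, the quantity the authors call the stochastic complexity.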
Pages: 62-75
Number of pages: 14