Variable Resolution Discretization in Optimal Control

Cited by: 2
Authors
Rémi Munos
Andrew Moore
Affiliations
[1] Ecole Polytechnique, Centre de Mathématiques Appliquées
[2] Carnegie Mellon University, Robotics Institute
Source
Machine Learning | 2002 / Volume 49
Keywords
optimal control; reinforcement learning; variable resolution discretization; adaptive mesh refinement
Abstract
The problem of state abstraction is of central importance in optimal control, reinforcement learning, and Markov decision processes. This paper studies the case of variable resolution state abstraction for continuous time and space, deterministic dynamic control problems in which near-optimal policies are required. We begin by defining a class of variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-trie. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. The core of this paper is the introduction and evaluation of a wide variety of possible splitting criteria. We begin with local approaches based on value function and policy properties that use only features of individual cells in making split choices. Later, by introducing two new non-local measures, influence and variance, we derive splitting criteria that allow one cell to efficiently take into account its impact on other cells when deciding whether to split. Influence is an efficiently calculable measure of the extent to which changes in some state affect the value function of other states. Variance is an efficiently calculable measure of how risky some state is in a Markov chain: a low-variance state is one in which we would be very surprised if, during any one execution, the long-term reward attained from that state differed substantially from its expected value, given by the value function.
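The variance measure described in the abstract admits a short numerical illustration. The Python sketch below is illustrative only: the 3-state chain, rewards, and discount factor are assumptions, not taken from the paper. It computes the value function V of a discounted Markov chain and then the variance of the discounted return about V, via the standard recursion sigma2(s) = sum_{s'} P(s,s') [ (r(s) + gamma*V(s') - V(s))^2 + gamma^2 * sigma2(s') ], which is the quantity the abstract refers to: a low-variance state is one whose realized long-term reward rarely strays far from its value V(s).

import numpy as np

# Illustrative 3-state Markov chain (assumed, not from the paper):
# transition matrix P, per-state reward r, discount factor gamma.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.1, 0.0, 0.9]])
r = np.array([0.0, 1.0, 2.0])
gamma = 0.9

# Value function: fixed point of V = r + gamma * P V.
V = np.linalg.solve(np.eye(3) - gamma * P, r)

# Variance of the discounted return, by fixed-point iteration
# (the map contracts with modulus gamma^2).
sigma2 = np.zeros(3)
for _ in range(1000):
    # td[s, s'] = r(s) + gamma*V(s') - V(s): one-step deviation from V(s)
    td = r[:, None] + gamma * V[None, :] - V[:, None]
    sigma2 = (P * (td ** 2 + gamma ** 2 * sigma2[None, :])).sum(axis=1)

print("V      =", V)
print("sigma2 =", sigma2)  # low values mark "unsurprising" states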
Pages: 291-323
Page count: 32