Approximate dynamic programming via direct search in the space of value function approximations

Cited by: 10
Authors
Arruda, E. F. [1 ]
Fragoso, M. D. [2 ]
do Val, J. B. R. [3 ]
Affiliations
[1] FENG PUCRS, BR-90619900 Porto Alegre, RS, Brazil
[2] CSC LNCC, BR-25651075 Petropolis, RJ, Brazil
[3] DT FEEC UNICAMP, BR-13083852 Campinas, SP, Brazil
Keywords
Dynamic programming; Markov decision processes; Convex optimization; Direct search methods; Convergence
DOI
10.1016/j.ejor.2010.11.019
Chinese Library Classification
C93 [Management science]
Subject classification codes
12; 1201; 1202; 120202
Abstract
This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span semi-norm of the Bellman residual in a convex value function approximation space. The novelty here is that the optimality of a point in the approximation architecture is characterized by means of convex optimization concepts, and necessary and sufficient conditions for local optimality are derived. The procedure employs the classical AVI direction (the Bellman residual) combined with a set of independent search directions to improve the convergence rate. It has guaranteed convergence and satisfies at least the necessary optimality conditions over a prescribed set of directions. To illustrate the method, examples are presented covering a class of problems from the literature and a large-state-space queueing problem setting. (C) 2010 Elsevier B.V. All rights reserved.
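A minimal numerical sketch of the span-based stopping idea described in the abstract, on a hypothetical toy MDP. This is plain value iteration with the Bellman residual as the update direction and the span semi-norm of the residual as the stopping criterion; it does not reproduce the paper's local search over an approximation architecture, and the MDP data below are invented for illustration.

```python
import numpy as np

# Hypothetical 3-state, 2-action discounted MDP (illustrative data only).
P = np.array([  # P[a, s, s'] : transition probabilities under action a
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
    [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4], [0.1, 0.1, 0.8]],
])
R = np.array([[1.0, 0.5, 0.0],   # R[a, s] : expected one-step reward
              [0.8, 0.9, 0.2]])
gamma = 0.9                      # discount factor

def bellman(v):
    # Bellman operator: (Tv)(s) = max_a [ R[a,s] + gamma * sum_s' P[a,s,s'] v(s') ]
    return (R + gamma * (P @ v)).max(axis=0)

def span(x):
    # Span semi-norm sp(x) = max(x) - min(x); invariant to constant shifts,
    # which is why it is a semi-norm rather than a norm.
    return x.max() - x.min()

v = np.zeros(3)
for _ in range(1000):
    residual = bellman(v) - v        # Bellman residual Tv - v
    if span(residual) < 1e-8:        # span-based stopping rule
        break
    v = v + residual                 # step along the classical AVI direction
```

Because the span semi-norm ignores constant shifts, iterating until `span(residual)` is small identifies the optimal value function only up to an additive constant, which is exactly the equivalence class relevant for extracting an optimal policy.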
Pages: 343-351
Number of pages: 9
Related papers
50 records in total
  • [1] Approximate dynamic programming via linear programming
    de Farias, DP
    Van Roy, B
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 689 - 695
  • [2] Neural network and regression spline value function approximations for stochastic dynamic programming
    Cervellera, Cristiano
    Wen, Aihong
    Chen, Victoria C. P.
    COMPUTERS & OPERATIONS RESEARCH, 2007, 34 (01) : 70 - 90
  • [3] Approximate Dynamic Programming via Sum of Squares Programming
    Summers, Tyler H.
    Kunz, Konstantin
    Kariotoglou, Nikolaos
    Kamgarpour, Maryam
    Summers, Sean
    Lygeros, John
    2013 EUROPEAN CONTROL CONFERENCE (ECC), 2013, : 191 - 197
  • [4] Empirical Value Iteration for Approximate Dynamic Programming
    Haskell, William B.
    Jain, Rahul
    Kalathil, Dileep
    2014 AMERICAN CONTROL CONFERENCE (ACC), 2014, : 495 - 500
  • [5] ε-Value Function and Dynamic Programming
    A. Nowakowski
    Journal of Optimization Theory and Applications, 2008, 138 : 85 - 93
  • [7] Accelerating critic learning in approximate dynamic programming via value templates and perceptual learning
    Shannon, TT
    Santiago, RA
    Lendaris, GG
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 2922 - 2927
  • [8] Markdown Optimization via Approximate Dynamic Programming
    Cosgun, Ozlem
    Kula, Ufuk
    Kahraman, Cengiz
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2013, 6 (01) : 64 - 78
  • [9] Approximate Dynamic Programming via Penalty Functions
    Beuchat, Paul N.
    Lygeros, John
    IFAC PAPERSONLINE, 2017, 50 (01): : 11814 - 11821