Approximate dynamic programming via direct search in the space of value function approximations

Cited by: 10
Authors
Arruda, E. F. [1 ]
Fragoso, M. D. [2 ]
do Val, J. B. R. [3 ]
Affiliations
[1] FENG PUCRS, BR-90619900 Porto Alegre, RS, Brazil
[2] CSC LNCC, BR-25651075 Petropolis, RJ, Brazil
[3] DT FEEC UNICAMP, BR-13083852 Campinas, SP, Brazil
Keywords
Dynamic programming; Markov decision processes; Convex optimization; Direct search methods; Convergence
DOI
10.1016/j.ejor.2010.11.019
Chinese Library Classification
C93 [Management science]
Subject classification codes
12; 1201; 1202; 120202
Abstract
This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span semi-norm of the Bellman residual in a convex value function approximation space. The novelty here is that the optimality of a point in the approximation architecture is characterized by means of convex optimization concepts, and necessary and sufficient conditions for local optimality are derived. The procedure employs the classical AVI direction (the Bellman residual) combined with a set of independent search directions to improve the convergence rate. It has guaranteed convergence and satisfies at least the necessary optimality conditions over a prescribed set of directions. To illustrate the method, examples are presented covering a class of problems from the literature and a large-state-space queueing problem setting. (C) 2010 Elsevier B.V. All rights reserved.
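A minimal numerical sketch of the span-based stopping idea described in the abstract, on a hypothetical toy MDP. This is plain value iteration with the Bellman residual as the update direction and the span semi-norm of the residual as the stopping criterion; it does not reproduce the paper's local search over an approximation architecture, and the MDP data below are invented for illustration.

```python
import numpy as np

# Hypothetical 3-state, 2-action discounted MDP (illustrative data only).
P = np.array([  # P[a, s, s'] : transition probabilities under action a
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
    [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4], [0.1, 0.1, 0.8]],
])
R = np.array([[1.0, 0.5, 0.0],   # R[a, s] : expected one-step reward
              [0.8, 0.9, 0.2]])
gamma = 0.9                      # discount factor

def bellman(v):
    # Bellman operator: (Tv)(s) = max_a [ R[a,s] + gamma * sum_s' P[a,s,s'] v(s') ]
    return (R + gamma * (P @ v)).max(axis=0)

def span(x):
    # Span semi-norm sp(x) = max(x) - min(x); invariant to constant shifts,
    # which is why it is a semi-norm rather than a norm.
    return x.max() - x.min()

v = np.zeros(3)
for _ in range(1000):
    residual = bellman(v) - v        # Bellman residual Tv - v
    if span(residual) < 1e-8:        # span-based stopping rule
        break
    v = v + residual                 # step along the classical AVI direction
```

Because the span semi-norm ignores constant shifts, iterating until `span(residual)` is small identifies the optimal value function only up to an additive constant, which is exactly the equivalence class relevant for extracting an optimal policy.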
Pages: 343-351
Number of pages: 9
Related papers
50 records in total
  • [1] Approximate dynamic programming via linear programming
    de Farias, DP
    Van Roy, B
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 689 - 695
  • [2] Neural network and regression spline value function approximations for stochastic dynamic programming
    Cervellera, Cristiano
    Wen, Aihong
    Chen, Victoria C. P.
    COMPUTERS & OPERATIONS RESEARCH, 2007, 34 (01) : 70 - 90
  • [3] Approximate Dynamic Programming via Sum of Squares Programming
    Summers, Tyler H.
    Kunz, Konstantin
    Kariotoglou, Nikolaos
    Kamgarpour, Maryam
    Summers, Sean
    Lygeros, John
    2013 EUROPEAN CONTROL CONFERENCE (ECC), 2013, : 191 - 197
  • [4] Empirical Value Iteration for Approximate Dynamic Programming
    Haskell, William B.
    Jain, Rahul
    Kalathil, Dileep
    2014 AMERICAN CONTROL CONFERENCE (ACC), 2014, : 495 - 500
  • [5] ε-Value Function and Dynamic Programming
    A. Nowakowski
    Journal of Optimization Theory and Applications, 2008, 138 : 85 - 93
  • [7] Accelerating critic learning in approximate dynamic programming via value templates and perceptual learning
    Shannon, TT
    Santiago, RA
    Lendaris, GG
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 2922 - 2927
  • [8] Markdown Optimization via Approximate Dynamic Programming
    Cosgun, Ozlem
    Kula, Ufuk
    Kahraman, Cengiz
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2013, 6 (01) : 64 - 78
  • [9] Approximate Dynamic Programming via Penalty Functions
    Beuchat, Paul N.
    Lygeros, John
    IFAC PAPERSONLINE, 2017, 50 (01): : 11814 - 11821