Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

Cited by: 4

Authors:

Bertsekas, Dimitri P. [1]

Yu, Huizhen [2]

Affiliations:

[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA

[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland

Source:

49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC) | 2010

Funding:

Academy of Finland;

Keywords:

STOCHASTIC-APPROXIMATION; ALGORITHMS;

DOI:

10.1109/CDC.2010.5717930

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal Q-factors. Instead of policy evaluation by solving a linear system of equations, our algorithm involves (possibly inexact) solution of an optimal stopping problem. This problem can be solved with simple Q-learning iterations, in the case where a lookup table representation is used; it can also be solved with the Q-learning algorithm of Tsitsiklis and Van Roy [TsV99], in the case where feature-based Q-factor approximations are used. In exact/lookup table representation form, our algorithm admits asynchronous and stochastic iterative implementations, in the spirit of asynchronous/modified policy iteration, with lower overhead advantages over existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm resolves effectively the inherent difficulties of existing schemes due to inadequate exploration.
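The policy iteration-like scheme described in the abstract can be illustrated with a minimal lookup-table sketch. The abstract does not spell out the update equations, so the form of the stopping-problem sweep used here (continuing with Q under the current policy or stopping at cost J, whichever is smaller) is an assumption, and the toy MDP, the names `stopping_sweep`, `enhanced_policy_iteration`, and `q_value_iteration`, and all numeric values are hypothetical.

```python
# Hypothetical toy MDP (not from the paper): 3 states, 2 controls.
ALPHA = 0.9  # discount factor
N_STATES, N_CONTROLS = 3, 2
# P[u][i][j]: probability of moving from state i to state j under control u
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.3, 0.7]],
    [[0.2, 0.5, 0.3], [0.5, 0.0, 0.5], [0.6, 0.2, 0.2]],
]
# G[i][u]: expected one-stage cost of applying control u at state i
G = [[2.0, 0.5], [1.0, 3.0], [0.0, 1.5]]


def stopping_sweep(Q, J, mu):
    """One sweep of the ASSUMED stopping-problem mapping for fixed J and mu.

    At the next state j the process either stops at cost J[j] or continues
    with the current policy at cost Q[j][mu[j]], whichever is smaller.
    """
    return [
        [
            G[i][u] + ALPHA * sum(
                P[u][i][j] * min(J[j], Q[j][mu[j]]) for j in range(N_STATES)
            )
            for u in range(N_CONTROLS)
        ]
        for i in range(N_STATES)
    ]


def enhanced_policy_iteration(outer_iters=300, inner_sweeps=5):
    """Alternate inexact stopping-problem evaluation with greedy improvement."""
    Q = [[0.0] * N_CONTROLS for _ in range(N_STATES)]
    J = [0.0] * N_STATES
    mu = [0] * N_STATES
    for _outer in range(outer_iters):
        # Inexact policy evaluation: a few sweeps instead of a linear solve.
        for _inner in range(inner_sweeps):
            Q = stopping_sweep(Q, J, mu)
        # Policy improvement: J is the minimum of Q over controls, mu is greedy.
        J = [min(Q[i]) for i in range(N_STATES)]
        mu = [Q[i].index(J[i]) for i in range(N_STATES)]
    return Q, J, mu


def q_value_iteration(iters=2000):
    """Reference solution: plain value iteration on the Q-factors."""
    Q = [[0.0] * N_CONTROLS for _ in range(N_STATES)]
    for _ in range(iters):
        Q = [
            [
                G[i][u] + ALPHA * sum(
                    P[u][i][j] * min(Q[j]) for j in range(N_STATES)
                )
                for u in range(N_CONTROLS)
            ]
            for i in range(N_STATES)
        ]
    return Q


if __name__ == "__main__":
    Q, J, mu = enhanced_policy_iteration()
    print("Q-factors:", Q)
    print("greedy policy:", mu)
```

On this toy problem the sketch can be checked against `q_value_iteration`, which computes the optimal Q-factors directly. The stochastic, asynchronous, and feature-based variants mentioned in the abstract are not covered here.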


Pages: 1409-1416

Number of pages: 8

