Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

Cited by: 4
Authors
Bertsekas, Dimitri P. [1]
Yu, Huizhen [2]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
Funding
Academy of Finland;
Keywords
STOCHASTIC-APPROXIMATION; ALGORITHMS;
DOI
10.1109/CDC.2010.5717930
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal Q-factors. Instead of policy evaluation by solving a linear system of equations, our algorithm involves (possibly inexact) solution of an optimal stopping problem. This problem can be solved with simple Q-learning iterations, in the case where a lookup table representation is used; it can also be solved with the Q-learning algorithm of Tsitsiklis and Van Roy [TsV99], in the case where feature-based Q-factor approximations are used. In exact/lookup table representation form, our algorithm admits asynchronous and stochastic iterative implementations, in the spirit of asynchronous/modified policy iteration, with lower overhead advantages over existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm resolves effectively the inherent difficulties of existing schemes due to inadequate exploration.
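The scheme described in the abstract can be illustrated with a minimal lookup-table sketch. It is not taken from this record: the update rule (each sweep replaces the next-state cost by the smaller of a stopping cost `J(j)` and the continuation cost `Q(j, mu(j))`), the array layout `P[u, i, j]` and `g[u, i, j]`, and the names `enhanced_policy_iteration`, `n_outer`, and `n_inner` are assumptions made for illustration only.

```python
import numpy as np

def enhanced_policy_iteration(P, g, alpha, n_outer=300, n_inner=3):
    """Hypothetical sketch of a policy-iteration-like Q-factor method.

    P[u, i, j]: probability of moving from state i to j under action u.
    g[u, i, j]: cost of that transition.  alpha: discount factor in (0, 1).
    """
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    J = Q.min(axis=1)        # stopping costs
    mu = Q.argmin(axis=1)    # current policy
    states = np.arange(n_states)
    for _ in range(n_outer):
        # Inexact "policy evaluation": a few sweeps on an optimal
        # stopping problem with J and mu held fixed.
        for _ in range(n_inner):
            # At next state j, either stop at cost J[j] or continue
            # with the current policy at cost Q[j, mu[j]].
            cont = np.minimum(J, Q[states, mu])
            target = g + alpha * cont[None, None, :]
            # Q[i, u] = sum_j P[u, i, j] * target[u, i, j]
            Q = np.einsum("uij,uij->iu", P, target)
        # "Policy improvement": refresh stopping costs and policy.
        J = Q.min(axis=1)
        mu = Q.argmin(axis=1)
    return Q, mu
```

In this sketch no linear system is solved during evaluation; `n_inner` controls how inexactly the stopping problem is solved before the policy is refreshed.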
Pages: 1409-1416
Number of pages: 8
Related papers
50 in total
  • [11] Non-delusional Q-learning and Value Iteration
    Lu, Tyler
    Schuurmans, Dale
    Boutilier, Craig
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [12] A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems
    Wei QingLai
    Liu DeRong
    SCIENCE CHINA-INFORMATION SCIENCES, 2015, 58 (12) : 1 - 15
  • [13] Policy iteration based Q-learning for linear nonzero-sum quadratic differential games
    Xinxing Li
    Zhihong Peng
    Li Liang
    Wenzhong Zha
    Science China Information Sciences, 2019, 62
  • [14] A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems
    WEI QingLai
    LIU DeRong
    Science China (Information Sciences), 2015, 58 (12) : 147 - 161
  • [15] Policy iteration based Q-learning for linear nonzero-sum quadratic differential games
    Li, Xinxing
    Peng, Zhihong
    Liang, Li
    Zha, Wenzhong
    SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (05)
  • [16] Multiresolution State-Space Discretization Method for Q-Learning with Function Approximation and Policy Iteration
    Lampton, Amanda
    Valasek, John
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 2677 - 2682
  • [17] Policy iteration based Q-learning for linear nonzero-sum quadratic differential games
    Xinxing LI
    Zhihong PENG
    Li LIANG
    Wenzhong ZHA
    Science China (Information Sciences), 2019, 62 (05) : 195 - 213
  • [18] Enhanced Q-Learning Algorithm for Dynamic Power Management with Performance Constraint
    Liu, Wei
    Tan, Ying
    Qiu, Qinru
    2010 DESIGN, AUTOMATION & TEST IN EUROPE (DATE 2010), 2010, : 602 - 605
  • [19] Stochastic Primal-Dual Q-Learning Algorithm For Discounted MDPs
    Lee, Donghwan
    He, Niao
    2019 AMERICAN CONTROL CONFERENCE (ACC), 2019, : 4897 - 4902
  • [20] Empirical Policy Iteration for Approximate Dynamic Programming
    Haskell, William B.
    Jain, Rahul
    Kalathil, Dileep
    2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 6573 - 6578