A generalization error for Q-learning

Cited by: 0
Author
Murphy, Susan A. [1]
Affiliation
[1] Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107, United States
Keywords
Algorithms - Approximation theory - Data reduction - Dynamic programming - Error analysis - Problem solving
DOI
Not available
Abstract
Planning problems that involve learning a policy from a single training set of finite-horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space, and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
Related papers
50 in total
  • [41] Double Gumbel Q-Learning
    Hui, David Yu-Tung
    Courville, Aaron
    Bacon, Pierre-Luc
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [42] Interactive Q-Learning for Quantiles
    Linn, Kristin A.
    Laber, Eric B.
    Stefanski, Leonard A.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (518) : 638 - 649
  • [43] Q-Learning: Theory and Applications
    Clifton, Jesse
    Laber, Eric
    ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 7, 2020, 2020, 7 : 279 - 301
  • [44] Two mode Q-learning
    Park, KH
    Kim, JH
    CEC: 2003 CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-4, PROCEEDINGS, 2003, : 2449 - 2454
  • [45] Q-learning with Nearest Neighbors
    Shah, Devavrat
    Xie, Qiaomin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [46] Underestimation estimators to Q-learning
    Abliz, Patigul
    Ying, Shi
    INFORMATION SCIENCES, 2022, 607 : 173 - 185
  • [47] Glide and Zap Q-Learning
    He, Xiaofan
    Jin, Richeng
    Dai, Huaiyu
    IEEE INFOCOM 2020 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2020, : 1147 - 1152
  • [48] Q-LEARNING WITH CENSORED DATA
    Goldberg, Yair
    Kosorok, Michael R.
    ANNALS OF STATISTICS, 2012, 40 (01) : 529 - 560
  • [49] q-Learning in Continuous Time
    Jia, Yanwei
    Zhou, Xun Yu
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [50] Distributionally Robust Q-Learning
    Liu, Zijian
    Bai, Qinxun
    Blanchet, Jose
    Dong, Perry
    Xu, Wei
    Zhou, Zhengqing
    Zhou, Zhengyuan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,