A generalization error for Q-learning

Cited by: 0
Author
Murphy, Susan A. [1 ]
Affiliation
[1] Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107, United States
Keywords
Algorithms; Approximation theory; Data reduction; Dynamic programming; Error analysis; Problem solving
DOI
None available
Abstract
Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
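The setting the abstract describes, fitting Q-functions with function approximation from a single training set of finite-horizon trajectories, can be sketched as batch Q-learning by backward induction. This is a minimal illustration, not Murphy's exact estimator; the feature map `phi`, the horizon `T`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

# Sketch: batch Q-learning with linear function approximation on a single
# training set of n finite-horizon trajectories, fit by backward induction
# over decision times t = T-1, ..., 0.
rng = np.random.default_rng(0)
T = 3            # finite horizon (number of decision times)
n = 200          # number of trajectories in the training set
d_s, n_a = 2, 2  # state dimension, number of actions

def phi(s, a):
    """Feature map for the linear approximation space: one block per action."""
    x = np.zeros(d_s * n_a)
    x[a * d_s:(a + 1) * d_s] = s
    return x

# Synthetic training set: states, actions, rewards per trajectory and time.
S = rng.normal(size=(n, T + 1, d_s))
A = rng.integers(0, n_a, size=(n, T))
R = rng.normal(size=(n, T))

# Backward induction: at each t, regress the target r_t + max_a' Q_{t+1}(s_{t+1}, a')
# on phi(s_t, a_t).  These least-squares criteria are the "quantities minimized
# by a Q-learning algorithm" that the generalization bound is stated in terms of.
W = np.zeros((T, d_s * n_a))            # one weight vector per decision time
for t in reversed(range(T)):
    X = np.stack([phi(S[i, t], A[i, t]) for i in range(n)])
    if t == T - 1:
        y = R[:, t]                     # no continuation value at the horizon
    else:
        next_q = np.stack([[phi(S[i, t + 1], a) @ W[t + 1]
                            for a in range(n_a)] for i in range(n)])
        y = R[:, t] + next_q.max(axis=1)
    W[t], *_ = np.linalg.lstsq(X, y, rcond=None)

# The estimated policy acts greedily with respect to the fitted Q-functions.
def policy(t, s):
    return int(np.argmax([phi(s, a) @ W[t] for a in range(n_a)]))

print(policy(0, S[0, 0]))  # an action in {0, 1}
```

The greedy policy above maximizes the fitted Q-function rather than the value function directly, which is the mismatch the abstract's approximation term accounts for.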
Related papers (50 total)
  • [31] Logistic Q-Learning
    Bas-Serrano, Joan
    Curi, Sebastian
    Krause, Andreas
    Neu, Gergely
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [32] Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
    Ghazanfari, Behzad
    Mozayani, Nasser
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 26 (06) : 2771 - 2783
  • [33] Selective generalization of CMAC for Q-learning and its application to layout planning of chemical plants
    Hirashima, Yoichi
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 2071 - 2076
  • [34] Comparison of Deep Q-Learning, Q-Learning and SARSA Reinforced Learning for Robot Local Navigation
    Anas, Hafiq
    Ong, Wee Hong
    Malik, Owais Ahmed
    ROBOT INTELLIGENCE TECHNOLOGY AND APPLICATIONS 6, 2022, 429 : 443 - 454
  • [35] Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants
    Schilperoort, Jits
    Mak, Ivar
    Drugan, Madalina M.
    Wiering, Marco A.
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 1151 - 1158
  • [36] Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
    Kumar, Aviral
    Fu, Justin
    Tucker, George
    Levine, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [37] Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback
    Wang, Hang
    Lin, Sen
    Zhang, Junshan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [38] An Online Home Energy Management System using Q-Learning and Deep Q-Learning
    Izmitligil, Hasan
    Karamancioglu, Abdurrahman
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2024, 43
  • [39] Q-learning with Logarithmic Regret
    Yang, Kunhe
    Yang, Lin F.
    Du, Simon S.
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [40] Adaptive Bases for Q-learning
    Di Castro, Dotan
    Mannor, Shie
    49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 4587 - 4593