A generalization error for Q-learning

Cited by: 0
Authors
Murphy, Susan A. [1]
Affiliations
[1] Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107, United States
Keywords
Algorithms - Approximation theory - Data reduction - Dynamic programming - Error analysis - Problem solving
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
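As an illustration of the setting the abstract describes, the sketch below fits Q-functions by backward least-squares regression over a single batch of finite-horizon trajectories. It is a minimal sketch under stated assumptions, not the paper's own algorithm or notation: the linear feature map `features`, the binary action set, and the names `fit_batch_q` and `greedy_policy` are all hypothetical choices made for this example.

```python
import numpy as np


def features(state, action):
    # Hypothetical feature map (an assumption for this sketch):
    # intercept, state, action, and their interaction.
    return np.array([1.0, state, action, state * action])


def fit_batch_q(trajectories, actions=(0, 1)):
    """Fit one linear Q-function per stage by backward least squares.

    trajectories: list of equal-length trajectories, each a list of
    (state, action, reward) tuples, one tuple per stage.
    Returns a list of coefficient vectors, one per stage.
    """
    horizon = len(trajectories[0])
    thetas = [None] * horizon
    for t in reversed(range(horizon)):
        rows, targets = [], []
        for traj in trajectories:
            state, action, reward = traj[t]
            target = reward
            if t + 1 < horizon:
                # Add the estimated value of acting greedily at the next stage.
                next_state = traj[t + 1][0]
                target += max(
                    features(next_state, b) @ thetas[t + 1] for b in actions
                )
            rows.append(features(state, action))
            targets.append(target)
        thetas[t], *_ = np.linalg.lstsq(
            np.array(rows), np.array(targets), rcond=None
        )
    return thetas


def greedy_policy(thetas, t, state, actions=(0, 1)):
    # Choose the action with the largest fitted Q-value at stage t.
    return max(actions, key=lambda a: features(state, a) @ thetas[t])
```

The learned policy is the greedy one with respect to the fitted Q-functions. The squared-error quantities minimized by the regressions are of the kind the abstract refers to as "quantities minimized by a Q-learning algorithm", although the exact form used in the paper may differ.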
Related papers
50 in total
  • [21] Zap Q-Learning
    Devraj, Adithya M.
    Meyn, Sean P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [22] Convex Q-Learning
    Lu, Fan
    Mehta, Prashant G.
    Meyn, Sean P.
    Neu, Gergely
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 4749 - 4756
  • [23] Fuzzy Q-learning
    Glorennec, PY
    Jouffe, L
    PROCEEDINGS OF THE SIXTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS I - III, 1997, : 659 - 662
  • [24] Q-learning and robotics
    Touzet, CF
    Santos, JM
    SIMULATION IN INDUSTRY 2001, 2001, : 685 - 689
  • [25] Q-learning automaton
    Qian, F
    Hirata, H
    IEEE/WIC INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS, 2003, : 432 - 437
  • [26] Periodic Q-Learning
    Lee, Donghwan
    He, Niao
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 582 - 598
  • [27] Mutual Q-learning
    Reid, Cameron
    Mukhopadhyay, Snehasis
    2020 3RD INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTS (ICCR 2020), 2020, : 128 - 133
  • [28] Robust Q-Learning
    Ertefaie, Ashkan
    McKay, James R.
    Oslin, David
    Strawderman, Robert L.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2021, 116 (533) : 368 - 381
  • [29] Neural Q-learning
    ten Hagen, Stephan
    Kröse, Ben
    Neural Computing & Applications, 2003, 12 : 81 - 88
  • [30] Neural Q-learning
    ten Hagen, S
    Kröse, B
    NEURAL COMPUTING & APPLICATIONS, 2003, 12 (02) : 81 - 88