Randomized Exploration for Reinforcement Learning with General Value Function Approximation

Cited by: 0
Authors
Ishfaq, Haque [1 ,2 ]
Cui, Qiwen [3 ]
Viet Nguyen [1 ,2 ]
Ayoub, Alex [4 ,5 ]
Yang, Zhuoran [6 ]
Wang, Zhaoran [7 ]
Precup, Doina [1 ,2 ,8 ]
Yang, Lin F. [9 ]
Affiliations
[1] Mila, Montreal, PQ, Canada
[2] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
[3] Peking Univ, Sch Math Sci, Beijing, Peoples R China
[4] Univ Alberta, Amii, Edmonton, AB, Canada
[5] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[6] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[7] Northwestern Univ, Ind Engn Management Sci, Evanston, IL 60208 USA
[8] DeepMind, Montreal, PQ, Canada
[9] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA USA
Keywords
DOI: Not available
CLC Classification Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class F, our algorithm achieves a worst-case regret bound of Õ(poly(d_E·H)·√T), where T is the time elapsed, H is the planning horizon, and d_E is the eluder dimension of F. In the linear setting, our algorithm reduces to LSVI-PHE, a variant of RLSVI, that enjoys an Õ(√(d³H³T)) regret. We complement the theory with an empirical evaluation across known difficult exploration tasks.
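The perturbation idea described in the abstract can be illustrated, for the linear setting, with a minimal NumPy sketch: each value-iteration backup fits several ridge regressions on rewards perturbed with i.i.d. Gaussian noise (plus a perturbed regularizer), and the optimistic Q-estimate is the pointwise maximum over the resulting ensemble. The function names, the noise scale sigma, the ridge parameter lam, and the ensemble size M below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def perturbed_regression(phi, targets, lam=1.0, sigma=1.0, rng=None):
    """Ridge regression on targets perturbed with i.i.d. Gaussian noise,
    plus a Gaussian perturbation of the regularizer (the "prior").
    phi: (n, d) feature matrix; targets: (n,) regression targets.
    lam, sigma, and the prior-perturbation scale are illustrative assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = phi.shape
    noisy_targets = targets + sigma * rng.standard_normal(n)      # perturb the data
    prior_noise = np.sqrt(lam) * sigma * rng.standard_normal(d)   # perturb the prior
    A = phi.T @ phi + lam * np.eye(d)
    b = phi.T @ noisy_targets + prior_noise
    return np.linalg.solve(A, b)                                  # weights w, shape (d,)

def optimistic_q(phi_query, ensemble):
    """Optimistic Q-values: pointwise maximum over the ensemble of perturbed fits."""
    return np.max(phi_query @ np.stack(ensemble, axis=1), axis=1)

# One backward-induction (LSVI) step on synthetic data, at the last horizon step
# where Q_{H+1} = 0; for h < H the targets would be r + max_a' Q_{h+1}(s', a').
rng = np.random.default_rng(0)
n, d, M = 200, 5, 10                       # M = number of perturbed regressions
phi = rng.standard_normal((n, d))          # features of visited (s, a) pairs
rewards = rng.standard_normal(n)
targets = rewards                          # plus next-step optimistic values for h < H
ensemble = [perturbed_regression(phi, targets, rng=rng) for _ in range(M)]
q_values = optimistic_q(phi, ensemble)     # optimistic estimates that drive exploration
```

In the full backward-induction loop, the optimistic ensemble at step h would supply the regression targets for step h-1, so exploration comes entirely from the injected noise rather than from an explicit UCB bonus.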
Pages: 10
Related Papers (50 records in total)
  • [31] Parallel reinforcement learning with linear function approximation
    Grounds, Matthew
    Kudenko, Daniel
    ADAPTIVE AGENTS AND MULTI-AGENT SYSTEMS, 2008, 4865 : 60 - 74
  • [32] Safe Reinforcement Learning with Linear Function Approximation
    Amani, Sanae
    Thrampoulidis, Christos
    Yang, Lin F.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [33] The Value Function Polytope in Reinforcement Learning
    Dadashi, Robert
    Taiga, Adrien Ali
    Le Roux, Nicolas
    Schuurmans, Dale
Bellemare, Marc G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [34] Integrating Symmetry of Environment by Designing Special Basis functions for Value Function Approximation in Reinforcement Learning
    Wang, Guo-fang
    Fang, Zhou
    Li, Bo
    Li, Ping
    2016 14TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV), 2016,
  • [36] Local and soft feature selection for value function approximation in batch reinforcement learning for robot navigation
    Fathinezhad, Fatemeh
    Adibi, Peyman
    Shoushtarian, Bijan
    Chanussot, Jocelyn
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (08): : 10720 - 10745
  • [37] Rethinking Value Function Learning for Generalization in Reinforcement Learning
    Moon, Seungyong
    Lee, JunYeong
    Song, Hyun Oh
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [38] Reinforcement learning with function approximation for cooperative navigation tasks
    Melo, Francisco S.
    Ribeiro, M. Isabel
    2008 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-9, 2008, : 3321 - +
  • [39] Online Model Selection for Reinforcement Learning with Function Approximation
    Lee, Jonathan N.
    Pacchiano, Aldo
    Muthukumar, Vidya
    Kong, Weihao
    Brunskill, Emma
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [40] Reinforcement Learning With Function Approximation for Traffic Signal Control
    Prashanth, L. A.
    Bhatnagar, Shalabh
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2011, 12 (02) : 412 - 421