Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning

Cited by: 7
Authors
Ramprasad, Pratik [1 ]
Li, Yuantong [2 ]
Yang, Zhuoran [3 ]
Wang, Zhaoran [4 ]
Sun, Will Wei [5 ]
Cheng, Guang [2 ]
Affiliations
[1] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
[2] UCLA, Dept Stat, Los Angeles, CA USA
[3] Yale Univ, Dept Stat & Data Sci, New Haven, CT USA
[4] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL 60208 USA
[5] Purdue Univ, Krannert Sch Management, W Lafayette, IN 47907 USA
Funding
National Science Foundation (USA);
Keywords
Asymptotic normality; Multiplier bootstrap; Reinforcement learning; Statistical inference; Stochastic approximation; STOCHASTIC-APPROXIMATION;
DOI
10.1080/01621459.2022.2096620
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Codes
020208; 070103; 0714;
Abstract
The recent emergence of reinforcement learning (RL) has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for inference in online learning are restricted to settings involving independently sampled observations, while inference methods in RL have so far been limited to the batch setting. The bootstrap is a flexible and efficient approach for statistical inference in online learning algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this article, we study the use of the online bootstrap method for inference in RL policy evaluation. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic approximation under Markov noise. The method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm across a range of real RL environments. Supplementary materials for this article are available online.
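The online multiplier-bootstrap idea summarized in the abstract can be sketched for the simplest case, tabular TD(0): alongside the point estimate, a batch of perturbed TD iterates is updated with i.i.d. random multiplier weights of mean one and variance one, and the spread of the perturbed iterates around the point estimate yields confidence intervals. Below is a minimal illustrative sketch on a toy three-state Markov reward process; the chain, rewards, step-size schedule, and Gaussian multipliers are assumptions chosen for the example, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov reward process (fixed policy baked into the transitions).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
R = np.array([1.0, 0.0, -1.0])   # expected reward when leaving each state
gamma = 0.9                      # discount factor

B, T = 200, 20000                # bootstrap replicates, trajectory length
theta = np.zeros(3)              # TD(0) point estimate of the value function
boots = np.zeros((B, 3))         # B perturbed bootstrap iterates

s = 0
for t in range(1, T + 1):
    s_next = rng.choice(3, p=P[s])
    r = R[s] + 0.1 * rng.standard_normal()   # noisy observed reward
    eta = 1.0 / t**0.7                       # Robbins-Monro step size

    # Standard TD(0) update for the point estimate.
    delta = r + gamma * theta[s_next] - theta[s]
    theta[s] += eta * delta

    # Perturbed updates: i.i.d. multipliers with mean 1, variance 1,
    # applied to each replicate's own TD error.
    w = 1.0 + rng.standard_normal(B)
    delta_b = r + gamma * boots[:, s_next] - boots[:, s]
    boots[:, s] += eta * w * delta_b

    s = s_next

# 95% bootstrap confidence interval for V(0): quantiles of the
# deviations of the perturbed iterates from the point estimate.
dev = boots[:, 0] - theta[0]
lo = theta[0] - np.quantile(dev, 0.975)
hi = theta[0] - np.quantile(dev, 0.025)
```

The mean-one, variance-one weights keep each perturbed iterate unbiased while injecting enough randomness that the replicates' dispersion tracks the sampling variability of the point estimate; everything is updated in a single online pass, with no need to store or resample the trajectory.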
Pages: 2901-2914
Number of pages: 14