Robust Q-Learning

Cited by: 18
Authors
Ertefaie, Ashkan [1]
McKay, James R. [2]
Oslin, David [3,4,5]
Strawderman, Robert L. [1]
Affiliations
[1] Univ Rochester, Dept Biostat & Computat Biol, 265 Crittenden Blvd, CU 420630, Rochester, NY 14642 USA
[2] Univ Penn, Dept Psychiat, Ctr Continuum Care Addict, Philadelphia, PA 19104 USA
[3] Univ Penn, Philadelphia Vet Adm Med Ctr, Philadelphia, PA 19104 USA
[4] Univ Penn, Treatment Res Ctr, Philadelphia, PA 19104 USA
[5] Univ Penn, Ctr Studies Addict, Dept Psychiat, Philadelphia, PA 19104 USA
Keywords
Cross-fitting; Data-adaptive techniques; Dynamic treatment strategies; Residual confounding; Dynamic treatment regimes; Design; Inference; Strategies; Selection
DOI
10.1080/01621459.2020.1753522
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Q-learning is a regression-based approach that is widely used to formalize the development of an optimal dynamic treatment strategy. Finite dimensional working models are typically used to estimate certain nuisance parameters, and misspecification of these working models can result in residual confounding and/or efficiency loss. We propose a robust Q-learning approach which allows estimating such nuisance parameters using data-adaptive techniques. We study the asymptotic behavior of our estimators and provide simulation studies that highlight the need for and usefulness of the proposed method in practice. We use the data from the "Extending Treatment Effectiveness of Naltrexone" multistage randomized trial to illustrate our proposed methods. Supplementary materials for this article are available online.
Pages: 368-381
Page count: 14
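The core idea in the abstract, replacing finite-dimensional working models for the nuisance conditional means with cross-fitted, data-adaptive estimates before fitting the treatment-interaction model, can be sketched for a single decision point with a randomized binary treatment. The sketch below is a minimal illustration, not the authors' implementation: the function names (cross_fit_mean, robust_blip_estimate), the random-forest nuisance learner, and the linear blip model in (1, H) are all assumptions made for this example.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def cross_fit_mean(H, target, n_splits=5, seed=0):
    # Cross-fitted, data-adaptive estimate of E[target | H]: each
    # observation's fitted value comes from a forest trained on the
    # other folds, avoiding own-observation overfitting bias.
    preds = np.empty(len(target))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(H):
        forest = RandomForestRegressor(n_estimators=200, random_state=seed)
        forest.fit(H[train_idx], target[train_idx])
        preds[test_idx] = forest.predict(H[test_idx])
    return preds

def robust_blip_estimate(H, A, Y):
    # Residualize the outcome and the treatment on the history H using
    # the cross-fitted nuisance estimates, then fit the linear
    # treatment-interaction ("blip") model by ordinary least squares.
    Y_res = Y - cross_fit_mean(H, Y)                # Y - Ehat[Y | H]
    A_res = A - cross_fit_mean(H, A.astype(float))  # A - Ehat[A | H]
    X = np.column_stack([np.ones(len(A)), H])       # blip design: (1, H)
    W = A_res[:, None] * X                          # residualized interactions
    psi, *_ = np.linalg.lstsq(W, Y_res, rcond=None)
    return psi                                      # recommend A=1 when X @ psi > 0

A quick check on simulated data with blip 1 - H2, where the estimate should land near (1, 0, -1, 0):

rng = np.random.default_rng(1)
n = 2000
H = rng.normal(size=(n, 3))            # observed history/covariates
A = rng.binomial(1, 0.5, size=n)       # randomized binary treatment
Y = np.sin(H[:, 0]) + A * (1.0 - H[:, 1]) + rng.normal(size=n)
psi_hat = robust_blip_estimate(H, A, Y)

The paper's multistage version applies the same residualization within Q-learning's backward recursion, one decision point at a time.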
Related Papers
50 records in total (items [31]-[40] shown)
  • [31] Path Planning Using Wasserstein Distributionally Robust Deep Q-learning
    Alpturk, Cem
    Renganathan, Venkatraman
2023 European Control Conference (ECC), 2023
  • [32] Sample Complexity of Variance-Reduced Distributionally Robust Q-Learning
    Wang, Shengbo
    Si, Nian
    Blanchet, Jose
    Zhou, Zhengyuan
Journal of Machine Learning Research, 2024, 25
  • [33] Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
    Ghazanfari, Behzad
    Mozayani, Nasser
Journal of Intelligent & Fuzzy Systems, 2014, 26(6): 2771-2783
  • [34] Comparison of Deep Q-Learning, Q-Learning and SARSA Reinforced Learning for Robot Local Navigation
    Anas, Hafiq
    Ong, Wee Hong
    Malik, Owais Ahmed
Robot Intelligence Technology and Applications 6, 2022, 429: 443-454
  • [35] Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants
    Schilperoort, Jits
    Mak, Ivar
    Drugan, Madalina M.
    Wiering, Marco A.
2018 IEEE Symposium Series on Computational Intelligence (IEEE SSCI), 2018: 1151-1158
  • [36] An Online Home Energy Management System using Q-Learning and Deep Q-Learning
    Izmitligil, Hasan
    Karamancioglu, Abdurrahman
Sustainable Computing: Informatics and Systems, 2024, 43
  • [37] Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty
    Neufeld, Ariel
    Sester, Julian
Automatica, 2024, 168
  • [38] A Q-learning based robust MPC method for DFIG to suppress the rotor overcurrent
    Song, Yuyan
    Wang, Yuhong
    Zeng, Qi
    Zheng, Zongsheng
    Liao, Jianquan
    Liao, Yiben
International Journal of Electrical Power & Energy Systems, 2022, 141
  • [39] Q-learning with Logarithmic Regret
    Yang, Kunhe
    Yang, Lin F.
    Du, Simon S.
24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021, 130
  • [40] Adaptive Bases for Q-learning
    Di Castro, Dotan
    Mannor, Shie
49th IEEE Conference on Decision and Control (CDC), 2010: 4587-4593