Robust Q-Learning

Cited by: 18
Authors
Ertefaie, Ashkan [1 ]
McKay, James R. [2 ]
Oslin, David [3 ,4 ,5 ]
Strawderman, Robert L. [1 ]
Affiliations
[1] Univ Rochester, Dept Biostat & Computat Biol, 265 Crittenden Blvd,CU 420630, Rochester, NY 14642 USA
[2] Univ Penn, Dept Psychiat, Ctr Continuum Care Addict, Philadelphia, PA 19104 USA
[3] Univ Penn, Philadelphia Vet Adm Med Ctr, Philadelphia, PA 19104 USA
[4] Univ Penn, Treatment Res Ctr, Philadelphia, PA 19104 USA
[5] Univ Penn, Ctr Studies Addict, Dept Psychiat, Philadelphia, PA 19104 USA
Keywords
Cross-fitting; Data-adaptive techniques; Dynamic treatment strategies; Residual confounding; Dynamic treatment regimes; Design; Inference; Strategies; Selection
DOI
10.1080/01621459.2020.1753522
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Q-learning is a regression-based approach that is widely used to formalize the development of an optimal dynamic treatment strategy. Finite dimensional working models are typically used to estimate certain nuisance parameters, and misspecification of these working models can result in residual confounding and/or efficiency loss. We propose a robust Q-learning approach which allows estimating such nuisance parameters using data-adaptive techniques. We study the asymptotic behavior of our estimators and provide simulation studies that highlight the need for and usefulness of the proposed method in practice. We use the data from the "Extending Treatment Effectiveness of Naltrexone" multistage randomized trial to illustrate our proposed methods. Supplementary materials for this article are available online.
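The abstract describes regression-based Q-learning for building an optimal dynamic treatment strategy, with working models for nuisance parameters that the paper proposes to replace by data-adaptive, cross-fitted estimators. The sketch below is a minimal illustration of the standard two-stage Q-learning recursion only, run on a simulated two-stage randomized trial; the data-generating process, variable names, and linear working models are assumptions for exposition and do not implement the paper's robust estimator.

```python
# Minimal sketch of standard two-stage regression-based Q-learning for a
# dynamic treatment regime. This is NOT the robust estimator proposed in the
# paper; the simulated trial and linear working models below are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000

# Hypothetical two-stage randomized trial: X1, A1 -> X2, A2 -> Y
X1 = rng.normal(size=n)
A1 = rng.integers(0, 2, size=n)
X2 = 0.5 * X1 + 0.3 * A1 + rng.normal(size=n)
A2 = rng.integers(0, 2, size=n)
Y = X2 + A2 * (0.8 - X2) + 0.4 * A1 * X1 + rng.normal(size=n)

# Stage 2: regress Y on the full history and A2 (working model with interaction)
H2 = np.column_stack([X1, A1, X2, A2, A2 * X2])
q2 = LinearRegression().fit(H2, Y)

def q2_pred(a2):
    """Predicted outcome if everyone received stage-2 treatment a2."""
    H = np.column_stack([X1, A1, X2, np.full(n, a2), a2 * X2])
    return q2.predict(H)

# Pseudo-outcome: predicted outcome under the best stage-2 decision
Y_tilde = np.maximum(q2_pred(0), q2_pred(1))

# Stage 1: regress the pseudo-outcome on stage-1 history and A1
H1 = np.column_stack([X1, A1, A1 * X1])
q1 = LinearRegression().fit(H1, Y_tilde)

# Estimated optimal stage-2 rule: pick the treatment maximizing the fitted Q
opt_a2 = (q2_pred(1) > q2_pred(0)).astype(int)
print("Share assigned A2=1 by the estimated stage-2 rule:", opt_a2.mean().round(3))
```

In the robust variant studied in the paper, the nuisance components would instead be estimated with flexible, data-adaptive methods combined with cross-fitting, so that misspecified finite-dimensional working models do not induce residual confounding or efficiency loss.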
Pages: 368-381
Number of pages: 14
Related Articles
50 in total (entries [21]-[30] shown)
  • [21] Convex Q-Learning
    Lu, Fan
    Mehta, Prashant G.
    Meyn, Sean P.
    Neu, Gergely
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 4749 - 4756
  • [22] Fuzzy Q-learning
    Glorennec, PY
    Jouffe, L
    PROCEEDINGS OF THE SIXTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS I - III, 1997, : 659 - 662
  • [23] Q-learning and robotics
    Touzet, CF
    Santos, JM
    SIMULATION IN INDUSTRY 2001, 2001, : 685 - 689
  • [24] Q-learning automaton
    Qian, F
    Hirata, H
    IEEE/WIC INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS, 2003, : 432 - 437
  • [25] Periodic Q-Learning
    Lee, Donghwan
    He, Niao
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 582 - 598
  • [26] Mutual Q-learning
    Reid, Cameron
    Mukhopadhyay, Snehasis
    2020 3RD INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTS (ICCR 2020), 2020, : 128 - 133
  • [27] Neural Q-learning
    ten Hagen, Stephan
    Kröse, Ben
    NEURAL COMPUTING & APPLICATIONS, 2003, 12 (02): 81 - 88
  • [29] Logistic Q-Learning
    Bas-Serrano, Joan
    Curi, Sebastian
    Krause, Andreas
    Neu, Gergely
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [30] Robust Attitude Control of an Agile Aircraft Using Improved Q-Learning
    Zahmatkesh, Mohsen
    Emami, Seyyed Ali
    Banazadeh, Afshin
    Castaldi, Paolo
    ACTUATORS, 2022, 11 (12)