A smoothed Q-learning algorithm for estimating optimal dynamic treatment regimes

Cited by: 2
Authors
Fan, Yanqin [1]
He, Ming [2]
Su, Liangjun [3]
Zhou, Xiao-Hua [4, 5]
Affiliations
[1] Univ Washington, Dept Econ, Seattle, WA 98195 USA
[2] Univ Technol Sydney, Econ Discipline Grp, Ultimo, Australia
[3] Singapore Management Univ, Sch Econ, Singapore, Singapore
[4] Peking Univ, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China
[5] Peking Univ, Sch Publ Hlth, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
asymptotic normality; exceptional law; optimal smoothing parameter; sequential randomization; Wald-type inference; TECHNICAL CHALLENGES; INFERENCE;
DOI
10.1111/sjos.12359
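Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract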
In this paper, we propose a smoothed Q-learning algorithm for estimating optimal dynamic treatment regimes. In contrast to the Q-learning algorithm in which nonregular inference is involved, we show that, under assumptions adopted in this paper, the proposed smoothed Q-learning estimator is asymptotically normally distributed even when the Q-learning estimator is not, and that its asymptotic variance can be consistently estimated. As a result, inference based on the smoothed Q-learning estimator is standard. We derive the optimal smoothing parameter and propose a data-driven method for estimating it. The finite sample properties of the smoothed Q-learning estimator are studied and compared with several existing estimators including the Q-learning estimator via an extensive simulation study. We illustrate the new method by analyzing data from the Clinical Antipsychotic Trials of Intervention Effectiveness-Alzheimer's Disease (CATIE-AD) study.
Pages: 446-469
Page count: 24
Related papers
50 in total
  • [21] A New Algorithm to Track Dynamic Goal Position in Q-learning
    Mitra, Soumishila
    Banerjee, Dhrubojyoti
    Konar, Amit
    Janarthanan, R.
    2012 12TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2012, : 69 - 74
  • [22] Dynamic feature selection algorithm based on Q-learning mechanism
    Xu, Ruohao
    Li, Mengmeng
    Yang, Zhongliang
    Yang, Lifang
    Qiao, Kangjia
    Shang, Zhigang
    APPLIED INTELLIGENCE, 2021, 51 (10) : 7233 - 7244
  • [23] Stochastic Tree Search for Estimating Optimal Dynamic Treatment Regimes
    Sun, Yilun
    Wang, Lu
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2020, 116 (533) : 421 - 432
  • [24] Optimal scheduling in cloud healthcare system using Q-learning algorithm
    Yafei Li
    Hongfeng Wang
    Na Wang
    Tianhong Zhang
    Complex & Intelligent Systems, 2022, 8 : 4603 - 4618
  • [25] Optimal Management of Office Energy Consumption via Q-learning Algorithm
    Shi, Guang
    Liu, Derong
    Wei, Qinglai
    2017 AMERICAN CONTROL CONFERENCE (ACC), 2017, : 3318 - 3322
  • [26] Optimal scheduling in cloud healthcare system using Q-learning algorithm
    Li, Yafei
    Wang, Hongfeng
    Wang, Na
    Zhang, Tianhong
    COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (06) : 4603 - 4618
  • [27] Personalized Optimal Bicycle Trip Planning Based on Q-learning Algorithm
    Chen, Yun
    Yan, Wen
    Li, Chunguo
    Huang, Yongming
    Yang, Luxi
    2018 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2018,
  • [28] Multicategory Angle-Based Learning for Estimating Optimal Dynamic Treatment Regimes With Censored Data
    Xue, Fei
    Zhang, Yanqing
    Zhou, Wenzhuo
    Fu, Haoda
    Qu, Annie
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (539) : 1438 - 1451
  • [29] Imputation-based Q-learning for optimizing dynamic treatment regimes with right-censored survival outcome
    Lyu, Lingyun
    Cheng, Yu
    Wahed, Abdus S. S.
    BIOMETRICS, 2023, 79 (04) : 3676 - 3689
  • [30] Q-learning Based Dynamic Optimal Relax Automatic Generation Control
    Yu, Tao
    Yuan, Ye
    Liang, Haihua
    POWER AND ENERGY ENGINEERING CONFERENCE 2010, 2010, : 797 - 800