Computing the Hazard Ratios Associated With Explanatory Variables Using Machine Learning Models of Survival Data

Cited by: 21
Authors
Sundrani, Sameer [1 ,2 ]
Lu, James [1 ]
Affiliations
[1] Genentech Inc, Modeling & Simulat Clin Pharmacol, San Francisco, CA USA
[2] Stanford Univ, Biomed Computat, Sch Engn & Med, Stanford, CA USA
DOI
10.1200/CCI.20.00172
Chinese Library Classification
R73 [Oncology];
Discipline classification code
100214;
Abstract
PURPOSE The application of Cox proportional hazards (CoxPH) models to survival data and the derivation of hazard ratios (HRs) are well established. Although nonlinear, tree-based machine learning (ML) models have been developed and applied to survival analysis, no methodology exists for computing HRs associated with explanatory variables from such models. We describe a novel way to compute HRs from tree-based ML models using SHapley Additive exPlanation (SHAP) values, a locally accurate and consistent methodology for quantifying explanatory variables' contributions to predictions.

METHODS We used three sets of publicly available survival data consisting of patients with colon, breast, or pan cancer and compared the performance of CoxPH with the state-of-the-art ML model, XGBoost. To compute the HR for explanatory variables from the XGBoost model, the SHAP values were exponentiated and the ratio of the means over the two subgroups was calculated. The CI was computed by bootstrapping the training data and generating the ML model 1,000 times. Across the three data sets, we systematically compared HRs for all explanatory variables. Open-source libraries in Python and R were used in the analyses.

RESULTS For the colon and breast cancer data sets, the performance of CoxPH and XGBoost was comparable, and we showed good consistency in the computed HRs. In the pan-cancer data set, we showed agreement in most variables but also an opposite finding in two of the explanatory variables between the CoxPH and XGBoost results. Subsequent Kaplan-Meier plots supported the finding of the XGBoost model.

CONCLUSION Enabling the derivation of HRs from ML models can help to improve the identification of risk factors from complex survival data sets and to enhance the prediction of clinical trial outcomes. © 2021 by American Society of Clinical Oncology
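The HR computation described in METHODS can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the per-patient SHAP values for one explanatory variable have already been obtained from a fitted survival model, and that a boolean flag marks which patients belong to the subgroup of interest. The function names `hazard_ratio_from_shap` and `bootstrap_ci`, the hypothetical `fit_and_explain` callable, and the use of a percentile interval are all assumptions made for this example; the abstract states only that the training data were bootstrapped and the model regenerated 1,000 times.

```python
import math
import random
from statistics import mean


def hazard_ratio_from_shap(shap_values, in_group):
    """Exponentiate SHAP values, then take the ratio of subgroup means.

    shap_values: per-patient SHAP values for ONE explanatory variable
                 (assumed to be on the log-hazard scale).
    in_group:    per-patient booleans; True = subgroup of interest,
                 False = reference subgroup.
    """
    exposed = [math.exp(s) for s, g in zip(shap_values, in_group) if g]
    reference = [math.exp(s) for s, g in zip(shap_values, in_group) if not g]
    if not exposed or not reference:
        raise ValueError("both subgroups must be non-empty")
    return mean(exposed) / mean(reference)


def bootstrap_ci(rows, fit_and_explain, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the HR (an assumed interval type).

    rows:            the training records, in any form the callable accepts.
    fit_and_explain: HYPOTHETICAL callable that refits the model on the
                     resampled rows and returns (shap_values, in_group).
    A resample with an empty subgroup raises ValueError; this sketch does
    not try to handle that case.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample the training rows with replacement, then refit and explain.
        sample = [rng.choice(rows) for _ in rows]
        shap_values, in_group = fit_and_explain(sample)
        estimates.append(hazard_ratio_from_shap(shap_values, in_group))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy input: the subgroup has SHAP value log(2), the reference has 0,
# so the resulting HR is approximately 2.0.
shap_values = [math.log(2.0), math.log(2.0), 0.0, 0.0]
in_group = [True, True, False, False]
hr = hazard_ratio_from_shap(shap_values, in_group)
```

In practice `fit_and_explain` would wrap whatever model fitting and SHAP computation is used; it is left as a parameter here so that the sketch makes no claims about any particular library's API.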
Pages: 364-378
Page count: 15