Computing the Hazard Ratios Associated With Explanatory Variables Using Machine Learning Models of Survival Data

Cited by: 21
Authors
Sundrani, Sameer [1, 2]
Lu, James [1 ]
Affiliations
[1] Genentech Inc, Modeling & Simulat Clin Pharmacol, San Francisco, CA USA
[2] Stanford Univ, Biomed Computat, Sch Engn & Med, Stanford, CA USA
DOI
10.1200/CCI.20.00172
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214;
Abstract
PURPOSE The application of Cox proportional hazards (CoxPH) models to survival data and the derivation of hazard ratios (HRs) are well established. Although nonlinear, tree-based machine learning (ML) models have been developed and applied to survival analysis, no methodology exists for computing HRs associated with explanatory variables from such models. We describe a novel way to compute HRs from tree-based ML models using SHapley Additive exPlanation (SHAP) values, a locally accurate and consistent methodology for quantifying the contribution of explanatory variables to predictions.
METHODS We used three sets of publicly available survival data consisting of patients with colon, breast, or pan-cancer and compared the performance of CoxPH with the state-of-the-art ML model, XGBoost. To compute the HR for explanatory variables from the XGBoost model, the SHAP values were exponentiated and the ratio of the means over the two subgroups was calculated. The CI was computed by bootstrapping the training data and generating the ML model 1,000 times. Across the three data sets, we systematically compared HRs for all explanatory variables. Open-source libraries in Python and R were used in the analyses.
RESULTS For the colon and breast cancer data sets, the performance of CoxPH and XGBoost was comparable, and we showed good consistency in the computed HRs. In the pan-cancer data set, we showed agreement in most variables but also an opposite finding in two of the explanatory variables between the CoxPH and XGBoost results. Subsequent Kaplan-Meier plots supported the finding of the XGBoost model.
CONCLUSION Enabling the derivation of HRs from ML models can help to improve the identification of risk factors from complex survival data sets and to enhance the prediction of clinical trial outcomes. © 2021 by American Society of Clinical Oncology
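
The HR calculation described under METHODS can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the SHAP values for one explanatory variable have already been computed elsewhere and are on the log-hazard scale, so that exponentiating them is meaningful. The names `hazard_ratio_from_shap`, `bootstrap_hr_ci`, and the callback `fit_and_explain` are hypothetical. The percentile interval is also an assumption, since the abstract does not say how the CI was read off the bootstrap replicates.

```python
import numpy as np


def hazard_ratio_from_shap(shap_col, in_group):
    """HR of a subgroup versus its complement for one explanatory variable.

    shap_col: SHAP values of that variable, one per patient
              (assumed to be on the log-hazard scale).
    in_group: boolean mask marking the subgroup used as the numerator.
    """
    shap_col = np.asarray(shap_col, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    if in_group.all() or not in_group.any():
        raise ValueError("both subgroups must be non-empty")
    risk = np.exp(shap_col)  # exponentiate the SHAP values
    # Ratio of the subgroup means of the exponentiated values.
    return risk[in_group].mean() / risk[~in_group].mean()


def bootstrap_hr_ci(n_rows, fit_and_explain, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for the HR (assumed interval type).

    fit_and_explain(idx) is a hypothetical user-supplied callback: it should
    refit the model on the training rows selected by idx and return
    (shap_col, in_group) for the variable of interest.
    """
    rng = np.random.default_rng(seed)
    hrs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_rows, size=n_rows)  # rows drawn with replacement
        shap_col, in_group = fit_and_explain(idx)
        hrs[b] = hazard_ratio_from_shap(shap_col, in_group)
    lower, upper = np.quantile(hrs, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)
```

In practice `fit_and_explain` would wrap the model fit and the SHAP computation for each resample. The function raises an error if a resample happens to leave one subgroup empty, which the caller would need to handle for very small subgroups.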
Pages: 364-378
Page count: 15