Computing the Hazard Ratios Associated With Explanatory Variables Using Machine Learning Models of Survival Data

Cited by: 21

Authors:

Sundrani, Sameer [1,2]

Lu, James [1]

Affiliations:

[1] Genentech Inc, Modeling & Simulat Clin Pharmacol, San Francisco, CA USA

[2] Stanford Univ, Biomed Computat, Sch Engn & Med, Stanford, CA USA

Source:

JCO CLINICAL CANCER INFORMATICS | 2021 / Volume 5

DOI:

10.1200/CCI.20.00172

Chinese Library Classification (CLC) number:

R73 [Oncology]

Discipline classification code:

100214

Abstract:

PURPOSE The application of Cox proportional hazards (CoxPH) models to survival data and the derivation of hazard ratios (HRs) are well established. Although nonlinear, tree-based machine learning (ML) models have been developed and applied to survival analysis, no methodology exists for computing HRs associated with explanatory variables from such models. We describe a novel way to compute HRs from tree-based ML models using SHapley Additive exPlanation (SHAP) values, a locally accurate and consistent methodology for quantifying explanatory variables' contributions to predictions.

METHODS We used three sets of publicly available survival data consisting of patients with colon, breast, or pan cancer and compared the performance of CoxPH with the state-of-the-art ML model, XGBoost. To compute the HR for explanatory variables from the XGBoost model, the SHAP values were exponentiated and the ratio of the means over the two subgroups was calculated. The CI was computed via bootstrapping the training data and generating the ML model 1,000 times. Across the three data sets, we systematically compared HRs for all explanatory variables. Open-source libraries in Python and R were used in the analyses.

RESULTS For the colon and breast cancer data sets, the performance of CoxPH and XGBoost was comparable, and we showed good consistency in the computed HRs. In the pan-cancer data set, we showed agreement in most variables but also opposite findings for two of the explanatory variables between the CoxPH and XGBoost results. Subsequent Kaplan-Meier plots supported the findings of the XGBoost model.

CONCLUSION Enabling the derivation of HRs from ML models can help to improve the identification of risk factors from complex survival data sets and to enhance the prediction of clinical trial outcomes. © 2021 by American Society of Clinical Oncology
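The HR calculation described in METHODS (exponentiate the SHAP values, then take the ratio of the subgroup means) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `hazard_ratio_from_shap` and `percentile_ci` are hypothetical, the SHAP values are assumed to be on a log-hazard scale, and the percentile interval is only one assumed way of summarizing bootstrap replicates; the abstract does not specify the interval type.

```python
import math
from statistics import mean


def hazard_ratio_from_shap(shap_values, in_group):
    """Ratio of mean exp(SHAP) in a subgroup to mean exp(SHAP) in the reference.

    shap_values: per-patient SHAP values for ONE explanatory variable
                 (assumed here to be on a log-hazard scale).
    in_group:    per-patient booleans, True for the subgroup of interest.
    """
    group = [math.exp(s) for s, g in zip(shap_values, in_group) if g]
    reference = [math.exp(s) for s, g in zip(shap_values, in_group) if not g]
    if not group or not reference:
        raise ValueError("both subgroups must be non-empty")
    return mean(group) / mean(reference)


def percentile_ci(replicates, level=0.95):
    """Percentile interval over HR replicates (assumed: one per bootstrap refit)."""
    ordered = sorted(replicates)
    alpha = (1.0 - level) / 2.0
    lower = ordered[math.floor(alpha * (len(ordered) - 1))]
    upper = ordered[math.ceil((1.0 - alpha) * (len(ordered) - 1))]
    return lower, upper


# Toy illustration with made-up SHAP values, not data from the paper.
toy_shap = [0.5, 0.5, -0.5, -0.5]
toy_group = [True, True, False, False]
toy_hr = hazard_ratio_from_shap(toy_shap, toy_group)  # exp(0.5) / exp(-0.5)
```

In a full pipeline, each bootstrap replicate would come from resampling the training data, refitting the model, recomputing SHAP values, and calling `hazard_ratio_from_shap` again; the resulting list of HRs would then be passed to `percentile_ci`.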

Pages: 364-378

Number of pages: 15
