Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

Cited by: 18

Authors:

Laimighofer, Michael [1, 2]

Krumsiek, Jan [1, 3]

Buettner, Florian [1, 4]

Theis, Fabian J. [1, 2]

Affiliations:

[1] Helmholtz Zentrum Munchen, Inst Computat Biol, Ingolstadter Landstr 1, D-85764 Neuherberg, Germany

[2] Tech Univ Munich, Dept Math, Garching, Germany

[3] German Ctr Diabet Res DZD, Munich, Germany

[4] European Mol Biol Lab Hinxton, European Bioinformat Inst, Cambridge, England

Source:

JOURNAL OF COMPUTATIONAL BIOLOGY | 2016 / Vol. 23 / Issue 04

Funding:

UK Medical Research Council;

Keywords:

high-dimensional survival regression; feature selection; repeated nested cross validation; PENALIZED COX REGRESSION; BREAST-CANCER PATIENTS; EXPRESSION; MODEL; RISK;

DOI:

10.1089/cmb.2015.0192

Chinese Library Classification number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN.
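
The nested cross-validation idea in the abstract can be illustrated with a minimal Python sketch. This is not the SurvRank implementation: the univariate concordance ranking, the signed-sum risk score, the candidate model sizes, the weighting by held-out concordance above 0.5, and all function names are assumptions chosen to keep the example self-contained. The inner loop tunes the number of features on training rows only, the outer loop scores the tuned model on rows it never saw, and features are ranked by a performance-weighted vote across outer runs.

```python
import math
import random

def c_index(risk, time, event):
    """Harrell's concordance: share of comparable pairs ranked correctly."""
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored case cannot be the earlier failure of a pair
        for j in range(len(time)):
            if time[j] > time[i]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den if den else 0.5

def pick(values, rows):
    return [values[r] for r in rows]

def rank_features(X, time, event, rows):
    """Order features by univariate |C - 0.5|; keep the effect direction."""
    t, e = pick(time, rows), pick(event, rows)
    scored = []
    for f in range(len(X[0])):
        c = c_index([X[r][f] for r in rows], t, e)
        scored.append((abs(c - 0.5), f, 1 if c >= 0.5 else -1))
    scored.sort(reverse=True)
    return [(f, sign) for _, f, sign in scored]

def evaluate(X, time, event, rows, feats):
    """Concordance of a signed-sum risk score on the given rows."""
    risk = [sum(sign * X[r][f] for f, sign in feats) for r in rows]
    return c_index(risk, pick(time, rows), pick(event, rows))

def make_folds(rows, k, rng):
    rows = rows[:]
    rng.shuffle(rows)
    return [rows[i::k] for i in range(k)]

def nested_cv(X, time, event, k_outer=5, k_inner=3,
              sizes=(1, 2, 5), repeats=2, seed=0):
    rng = random.Random(seed)
    all_rows = list(range(len(X)))
    outer_scores, weights = [], {}
    for _ in range(repeats):
        for test in make_folds(all_rows, k_outer, rng):
            held_out = set(test)
            train = [r for r in all_rows if r not in held_out]
            # Inner loop: choose the model size from training rows only.
            inner = {m: 0.0 for m in sizes}
            for val in make_folds(train, k_inner, rng):
                val_set = set(val)
                fit = [r for r in train if r not in val_set]
                ranking = rank_features(X, time, event, fit)
                for m in sizes:
                    inner[m] += evaluate(X, time, event, val, ranking[:m])
            best_m = max(sizes, key=lambda m: inner[m])
            # Outer loop: refit on all training rows, score on the held-out fold.
            chosen = rank_features(X, time, event, train)[:best_m]
            c = evaluate(X, time, event, test, chosen)
            outer_scores.append(c)
            for f, _ in chosen:
                # Assumed weighting: vote with the held-out gain over chance.
                weights[f] = weights.get(f, 0.0) + max(c - 0.5, 0.0)
    estimate = sum(outer_scores) / len(outer_scores)
    return estimate, sorted(weights, key=weights.get, reverse=True)

# Toy data: 100 cases, 10 features, only feature 0 drives the hazard.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]
true_time = [rng.expovariate(math.exp(2.0 * row[0])) for row in X]
cens_time = [rng.expovariate(0.3) for _ in X]
time = [min(t, c) for t, c in zip(true_time, cens_time)]
event = [t <= c for t, c in zip(true_time, cens_time)]

estimate, ranked = nested_cv(X, time, event)
print("outer-loop concordance estimate:", round(estimate, 3))
print("features ranked by weighted votes:", ranked)
```

Because tuning never touches the held-out fold, the averaged outer-loop concordance is not inflated by the tuning step, which is the property the abstract describes as unbiased evaluation.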


Pages: 279-290

Page count: 12
