Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

Cited by: 18
Authors
Laimighofer, Michael [1, 2]
Krumsiek, Jan [1, 3]
Buettner, Florian [1, 4]
Theis, Fabian J. [1, 2]
Affiliations
[1] Helmholtz Zentrum Munchen, Inst Computat Biol, Ingolstadter Landstr 1, D-85764 Neuherberg, Germany
[2] Tech Univ Munich, Dept Math, Garching, Germany
[3] German Ctr Diabet Res DZD, Munich, Germany
[4] European Mol Biol Lab Hinxton, European Bioinformat Inst, Cambridge, England
Funding
UK Medical Research Council
Keywords
high-dimensional survival regression; feature selection; repeated nested cross validation; penalized Cox regression; breast-cancer patients; expression; model; risk
DOI
10.1089/cmb.2015.0192
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN.
Pages: 279-290
Number of pages: 12