Predicting Survival Outcomes in the Presence of Unlabeled Data

Cited by: 3
Authors
Haredasht, Fateme Nateghi [1 ,2 ,3 ]
Vens, Celine [1 ,2 ,3 ]
Affiliations
[1] Katholieke Univ Leuven, Dept Publ Hlth & Primary Care, Campus KULAK, Etienne Sabbelaan 53, B-8500 Kortrijk, Belgium
[2] IMEC, ITEC, Etienne Sabbelaan 51, B-8500 Kortrijk, Belgium
[3] Katholieke Univ Leuven, Etienne Sabbelaan 51, B-8500 Kortrijk, Belgium
Keywords
Survival analysis; Semi-supervised learning; Random survival forest; Self-training; REGRESSION; MODEL;
DOI
10.1007/s10994-022-06257-x
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Many clinical studies require the follow-up of patients over time. This is challenging: apart from frequently observed drop-out, there are often also organizational and financial challenges, which can lead to reduced data collection and, in turn, complicate subsequent analyses. In contrast, there is often plenty of baseline data available for patients with similar characteristics and background information, e.g., from patients who fall outside the study time window. In this article, we investigate whether we can benefit from the inclusion of such unlabeled data instances to predict accurate survival times. In other words, we introduce a third level of supervision in the context of survival analysis: apart from fully observed and censored instances, we also include unlabeled instances. We propose three approaches to deal with this novel setting and provide an empirical comparison over fifteen real-life clinical and gene expression survival datasets. Our results demonstrate that all approaches are able to increase the predictive performance over independent test data. We also show that integrating the partial supervision provided by censored data in a semi-supervised wrapper approach generally provides the best results, often achieving substantial improvements compared to not using unlabeled data.
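The abstract mentions a self-training wrapper built around a random survival forest. The snippet below is a minimal, purely illustrative sketch of that idea using scikit-survival, not the authors' algorithm: the median-survival-time confidence rule, the fixed number of rounds, and the choice to add pseudo-labelled rows as observed events are assumptions introduced here, as are the helper names median_survival_time and self_train_rsf.

```python
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv


def median_survival_time(surv_fn):
    # Earliest time at which the predicted survival curve drops to 0.5 or below;
    # None means the curve never gets there, so that row is skipped this round.
    below = np.where(surv_fn.y <= 0.5)[0]
    return surv_fn.x[below[0]] if below.size else None


def self_train_rsf(X_lab, time_lab, event_lab, X_unlab, n_rounds=3, **rsf_kwargs):
    # Self-training wrapper: fit a random survival forest on the labelled rows,
    # pseudo-label "confident" unlabeled rows with their predicted median survival
    # time, add them to the training set, and refit for a few rounds.
    X_train = np.asarray(X_lab, dtype=float)
    times = np.asarray(time_lab, dtype=float)
    events = np.asarray(event_lab, dtype=bool)
    X_pool = np.asarray(X_unlab, dtype=float)

    model = RandomSurvivalForest(n_estimators=200, random_state=0, **rsf_kwargs)
    for _ in range(n_rounds):
        model.fit(X_train, Surv.from_arrays(event=events, time=times))
        if len(X_pool) == 0:
            break
        surv_fns = model.predict_survival_function(X_pool)
        med = [median_survival_time(fn) for fn in surv_fns]
        keep = np.array([m is not None for m in med])
        if not keep.any():
            break
        # Illustrative choice: treat pseudo-labelled rows as observed events
        # occurring at their predicted median survival time.
        X_train = np.vstack([X_train, X_pool[keep]])
        times = np.concatenate(
            [times, np.array([m for m in med if m is not None], dtype=float)]
        )
        events = np.concatenate([events, np.ones(keep.sum(), dtype=bool)])
        X_pool = X_pool[~keep]
    return model
```

Note that the best-performing variant in the paper also integrates the partial supervision carried by censored instances inside the wrapper; the sketch above ignores that refinement for brevity.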
Pages: 4139-4157
Number of pages: 19
Related Papers
50 records in total
  • [41] Learning from labeled and unlabeled data
    Kothari, R
    Jain, V
    PROCEEDING OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002, : 2803 - 2808
  • [42] Combining supervised classifiers with unlabeled data
    Xue-yan Liu
    Xue-ying Zhang
    Feng-lian Li
    Li-xia Huang
    Journal of Central South University, 2016, 23 : 1176 - 1182
  • [43] Exploiting Unlabeled Data for Question Classification
    Tomas, David
    Giuliano, Claudio
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2011, 6716 : 137 - 144
  • [44] Positive and unlabeled learning in categorical data
    Ienco, Dino
    Pensa, Ruggero G.
    NEUROCOMPUTING, 2016, 196 : 113 - 124
  • [45] Enhancement of breast CADx with unlabeled data
    Jamieson, Andrew R.
    Giger, Maryellen L.
    Drukker, Karen
    Pesce, Lorenzo L.
    MEDICAL PHYSICS, 2010, 37 (08) : 4155 - 4172
  • [46] Identifying mislabeled training data with the aid of unlabeled data
    Donghai Guan
    Weiwei Yuan
    Young-Koo Lee
    Sungyoung Lee
    Applied Intelligence, 2011, 35 : 345 - 358
  • [48] Unlabeled Data Improves Word Prediction
    Loeff, Nicolas
    Farhadi, Ali
    Endres, Ian
    Forsyth, David A.
    2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009, : 956 - 962
  • [49] Unlabeled Data Improves Adversarial Robustness
    Carmon, Yair
    Raghunathan, Aditi
    Schmidt, Ludwig
    Liang, Percy
    Duchi, John C.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [50] Partial Label Learning with Unlabeled Data
    Wang, Qian-Wei
    Li, Yu-Feng
    Zhou, Zhi-Hua
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 3755 - 3761