Predicting Survival Outcomes in the Presence of Unlabeled Data

Cited by: 3
Authors
Haredasht, Fateme Nateghi [1, 2, 3]
Vens, Celine [1, 2, 3]
Affiliations
[1] Katholieke Univ Leuven, Dept Publ Hlth & Primary Care, Campus KULAK,Etienne Sabbelaan 53, B-8500 Kortrijk, Belgium
[2] IMEC, ITEC, Etienne Sabbelaan 51, B-8500 Kortrijk, Belgium
[3] Katholieke Univ Leuven, Etienne Sabbelaan 51, B-8500 Kortrijk, Belgium
Keywords
Survival analysis; Semi-supervised learning; Random survival forest; Self-training; REGRESSION; MODEL;
DOI
10.1007/s10994-022-06257-x
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many clinical studies require the follow-up of patients over time. This is challenging: apart from frequently observed drop-out, there are often also organizational and financial challenges, which can lead to reduced data collection and, in turn, complicate subsequent analyses. In contrast, there is often plenty of baseline data available for patients with similar characteristics and background information, e.g., from patients who fall outside the study time window. In this article, we investigate whether we can benefit from the inclusion of such unlabeled data instances to predict accurate survival times. In other words, we introduce a third level of supervision in the context of survival analysis: apart from fully observed and censored instances, we also include unlabeled instances. We propose three approaches to deal with this novel setting and provide an empirical comparison over fifteen real-life clinical and gene expression survival datasets. Our results demonstrate that all approaches are able to increase the predictive performance over independent test data. We also show that integrating the partial supervision provided by censored data in a semi-supervised wrapper approach generally provides the best results, often achieving large improvements compared to not using unlabeled data.
Pages: 4139-4157
Number of pages: 19