Improving an Electronic Health Record-Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach

Times cited: 1
Authors
Li, Runze [1]
Tian, Yu [1]
Shen, Zhuyi [1]
Li, Jin [2]
Li, Jun [3]
Ding, Kefeng [3]
Li, Jingsong [1,4]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Hangzhou, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Inst Artificial Intelligence Med, Sch Artificial Intelligence, Nanjing, Peoples R China
[3] Zhejiang Univ, Sch Med, Affiliated Hosp 2, Dept Surg Oncol, Hangzhou, Peoples R China
[4] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Zhou Yiqing Sci & Technol Bldg, 2nd Floor, 38 Zheda, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
semisupervised learning; generative adversarial network; network analysis; label deficiency; clinical prediction; electronic health record; EHR; adversarial network; data set
DOI
10.2196/47862
Chinese Library Classification number
R-058
Abstract
Background: Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, the inaccessibility of data labels is an increasingly important issue in clinical prediction, despite the use of data synthesis and semisupervised learning. Little research has aimed to uncover the underlying graph structure of EHRs.
Objective: This study proposes a network-based generative adversarial semisupervised method. The objective is to train clinical prediction models on label-deficient EHRs that achieve learning performance comparable to that of supervised methods.
Methods: Three public data sets and one colorectal cancer data set collected at the Second Affiliated Hospital of Zhejiang University were selected as benchmarks. The proposed models were trained on 5% to 25% labeled data and compared with conventional semisupervised and supervised methods on classification metrics. Data quality, model security, and memory scalability were also evaluated.
Results: Under the same setup, the proposed method outperformed related semisupervised methods in semisupervised classification, with average areas under the receiver operating characteristic curve (AUCs) of 0.945, 0.673, 0.611, and 0.588 on the four data sets, respectively, followed by graph-based semisupervised learning (0.450, 0.454, 0.425, and 0.5676, respectively) and label propagation (0.475, 0.344, 0.440, and 0.477, respectively). With 10% labeled data, the average classification AUCs were 0.929, 0.719, 0.652, and 0.650, respectively, comparable to those of the supervised methods logistic regression (0.601, 0.670, 0.731, and 0.710, respectively), support vector machines (0.733, 0.720, 0.720, and 0.721, respectively), and random forests (0.982, 0.750, 0.758, and 0.740, respectively). Concerns regarding the secondary use of data and data security are alleviated by realistic data synthesis and robust privacy preservation.
Conclusions: Training clinical prediction models on label-deficient EHRs is indispensable in data-driven research. The proposed method has great potential to exploit the intrinsic structure of EHRs and to achieve learning performance comparable to that of supervised methods.
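The evaluation protocol summarized in the abstract (train with only a small fraction of labels, then compare AUC against label propagation and supervised baselines) can be illustrated with a minimal sketch. It assumes scikit-learn and a synthetic binary data set from `make_classification` standing in for an EHR table; the helper names `mask_labels` and `evaluate` and all parameter values are hypothetical. The sketch covers only two of the baselines named above, not the proposed network-based generative adversarial method.

```python
# Hypothetical sketch of a label-deficient evaluation protocol.
# Synthetic data replace the EHR data sets; this is NOT the authors' method,
# only the label propagation and logistic regression baselines.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import LabelPropagation


def mask_labels(y, labeled_fraction, seed=0):
    """Keep a stratified fraction of labels and mark the rest -1 (unlabeled)."""
    rng = np.random.default_rng(seed)
    y_masked = np.full_like(y, -1)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        # Keep at least one labeled example per class so both models can fit.
        n_keep = max(1, int(round(labeled_fraction * idx.size)))
        keep = rng.choice(idx, size=n_keep, replace=False)
        y_masked[keep] = y[keep]
    return y_masked


def evaluate(labeled_fraction, seed=0):
    """Return test AUCs of two baselines trained with few labels (assumed setup)."""
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                               weights=[0.7, 0.3], random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=seed)
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

    y_semi = mask_labels(y_tr, labeled_fraction, seed)
    labeled = y_semi != -1

    # Semisupervised baseline: propagate labels over a dense RBF similarity
    # graph built from labeled and unlabeled training rows together.
    # gamma = 1 / n_features is an assumed, untuned choice.
    lp = LabelPropagation(kernel="rbf", gamma=0.05, max_iter=2000)
    lp.fit(X_tr, y_semi)

    # Supervised baseline restricted to the same few labeled rows.
    lr = LogisticRegression(max_iter=1000).fit(X_tr[labeled], y_tr[labeled])

    return {
        "label_propagation": roc_auc_score(y_te, lp.predict_proba(X_te)[:, 1]),
        "logistic_regression": roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1]),
    }
```

For example, `evaluate(0.10)` returns the two baseline AUCs at 10% labeled data, and looping over fractions from 0.05 to 0.25 mirrors the labeled-data range described under Methods. Stratified masking is used so that a rare positive class is not left entirely unlabeled at small fractions.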
Pages: 14