Improving an Electronic Health Record-Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach

Cited by: 1
Authors
Li, Runze [1]
Tian, Yu [1]
Shen, Zhuyi [1]
Li, Jin [2]
Li, Jun [3]
Ding, Kefeng [3]
Li, Jingsong [1, 4]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Hangzhou, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Inst Artificial Intelligence Med, Sch Artificial Intelligence, Nanjing, Peoples R China
[3] Zhejiang Univ, Sch Med, Affiliated Hosp 2, Dept Surg Oncol, Hangzhou, Peoples R China
[4] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Zhou Yiqing Sci & Technol Bldg, 2nd Floor, 38 Zheda, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
semisupervised learning; generative adversarial network; network analysis; label deficiency; clinical prediction; electronic health record; EHR; adversarial network; data set
DOI
10.2196/47862
Chinese Library Classification number
R-058
Abstract
Background: Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, data label inaccessibility is an increasingly important issue in clinical prediction, despite the use of synthetic and semisupervised learning from data. Little research has aimed to uncover the underlying graphical structure of EHRs.
Objective: A network-based generative adversarial semisupervised method is proposed. The objective is to train clinical prediction models on label-deficient EHRs to achieve learning performance comparable to that of supervised methods.
Methods: Three public data sets and one colorectal cancer data set gathered from the Second Affiliated Hospital of Zhejiang University were selected as benchmarks. The proposed models were trained on 5% to 25% labeled data and evaluated on classification metrics against conventional semisupervised and supervised methods. Data quality, model security, and memory scalability were also evaluated.
Results: The proposed method for semisupervised classification outperforms related semisupervised methods under the same setup, with the average area under the receiver operating characteristic curve (AUC) reaching 0.945, 0.673, 0.611, and 0.588 for the four data sets, respectively, followed by graph-based semisupervised learning (0.450, 0.454, 0.425, and 0.5676, respectively) and label propagation (0.475, 0.344, 0.440, and 0.477, respectively). The average classification AUCs with 10% labeled data were 0.929, 0.719, 0.652, and 0.650, respectively, comparable to those of the supervised learning methods logistic regression (0.601, 0.670, 0.731, and 0.710, respectively), support vector machines (0.733, 0.720, 0.720, and 0.721, respectively), and random forests (0.982, 0.750, 0.758, and 0.740, respectively). The concerns regarding the secondary use of data and data security are alleviated by realistic data synthesis and robust privacy preservation.
Conclusions: Training clinical prediction models on label-deficient EHRs is indispensable in data-driven research. The proposed method has great potential to exploit the intrinsic structure of EHRs and achieve learning performance comparable to that of supervised methods.
Pages: 14