Improving an Electronic Health Record-Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach

Cited by: 1
Authors
Li, Runze [1]
Tian, Yu [1]
Shen, Zhuyi [1]
Li, Jin [2]
Li, Jun [3]
Ding, Kefeng [3]
Li, Jingsong [1, 4]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Hangzhou, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Inst Artificial Intelligence Med, Sch Artificial Intelligence, Nanjing, Peoples R China
[3] Zhejiang Univ, Sch Med, Affiliated Hosp 2, Dept Surg Oncol, Hangzhou, Peoples R China
[4] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Zhou Yiqing Sci & Technol Bldg, 2nd Floor, 38 Zheda, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
semisupervised learning; generative adversarial network; network analysis; label deficiency; clinical prediction; electronic health record; EHR; adversarial network; data set
DOI
10.2196/47862
Chinese Library Classification number
R-058
Discipline classification code
Abstract
Background: Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, data label inaccessibility is an increasingly important issue in clinical prediction, despite the use of synthetic and semisupervised learning from data. Little research has aimed to uncover the underlying graphical structure of EHRs.
Objective: A network-based generative adversarial semisupervised method is proposed. The objective is to train clinical prediction models on label-deficient EHRs and achieve learning performance comparable to that of supervised methods.
Methods: Three public data sets and one colorectal cancer data set gathered from the Second Affiliated Hospital of Zhejiang University were selected as benchmarks. The proposed models were trained on 5% to 25% labeled data and evaluated on classification metrics against conventional semisupervised and supervised methods. The data quality, model security, and memory scalability were also evaluated.
Results: The proposed method for semisupervised classification outperforms related semisupervised methods under the same setup, with the average area under the receiver operating characteristic curve (AUC) reaching 0.945, 0.673, 0.611, and 0.588 for the four data sets, respectively, followed by graph-based semisupervised learning (0.450, 0.454, 0.425, and 0.5676, respectively) and label propagation (0.475, 0.344, 0.440, and 0.477, respectively). The average classification AUCs with 10% labeled data were 0.929, 0.719, 0.652, and 0.650, respectively, comparable to those of the supervised learning methods logistic regression (0.601, 0.670, 0.731, and 0.710, respectively), support vector machines (0.733, 0.720, 0.720, and 0.721, respectively), and random forests (0.982, 0.750, 0.758, and 0.740, respectively). The concerns regarding the secondary use of data and data security are alleviated by realistic data synthesis and robust privacy preservation.
Conclusions: Training clinical prediction models on label-deficient EHRs is indispensable in data-driven research. The proposed method has great potential to exploit the intrinsic structure of EHRs and achieve learning performance comparable to that of supervised methods.
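The evaluation setup summarized in the abstract (training with only a small labeled fraction and scoring by AUC) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `mask_labels` and `roc_auc` are hypothetical, the data are toy values, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library.

```python
import random


def mask_labels(labels, labeled_fraction, seed=0):
    """Keep a random fraction of labels; mark the rest as unlabeled (None).

    Hypothetical helper: mimics a label-deficient setting such as the
    5% to 25% labeled fractions mentioned in the abstract.
    """
    rng = random.Random(seed)
    n_keep = max(1, round(len(labels) * labeled_fraction))
    keep = set(rng.sample(range(len(labels)), n_keep))
    return [y if i in keep else None for i, y in enumerate(labels)]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels: 20 records, of which 10% stay labeled for training.
    toy_labels = [0, 1] * 10
    masked = mask_labels(toy_labels, labeled_fraction=0.10)
    print(sum(y is not None for y in masked))  # 2 labeled records remain

    # Toy scores from some classifier, evaluated against the true labels.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In a real experiment the masked labels would feed a semisupervised learner, and the AUC would be computed on a held-out test set with the full labels.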
Pages: 14
Related papers
50 in total
  • [11] Generative adversarial network-based atmospheric scattering model for image dehazing
    Zhu, Jinxiu
    Meng, Leilei
    Wu, Wenxia
    Choi, Dongmin
    Ni, Jianjun
    DIGITAL COMMUNICATIONS AND NETWORKS, 2021, 7 (02) : 178 - 186
  • [12] A prediction model of vessel trajectory based on generative adversarial network
    Wang, Senjie
    He, Zhengwei
    JOURNAL OF NAVIGATION, 2021, 74 (05) : 1161 - 1171
  • [13] Pharmacist Hypertension Management Using an Electronic Health Record-Based Approach
    Soreide, Kristin K.
    Solomon, Octavia
    Farhat, Nada M.
    Kolander, Sarah
    Gottschall, Terry
    George, Diane L.
    Szandzik, Edward G.
    Kalus, James S.
    Thomas, Emily
    AMERICAN JOURNAL OF MANAGED CARE, 2022, 28 (04) : E121 - E125
  • [14] Test collections for electronic health record-based clinical information retrieval
    Wang, Yanshan
    Wen, Andrew
    Liu, Sijia
    Hersh, William
    Bedrick, Steven
    Liu, Hongfang
    JAMIA OPEN, 2019, 2 (03) : 360 - 368
  • [15] Improving hydraulic conductivity prediction of bentonite using machine learning with generative adversarial network-based data augmentation
    Shi, Xiaoqiong
    Zhang, Pengfei
    Feng, Jiaxing
    Xu, Ke
    Fang, Ziluo
    Tian, Junlei
    Wu, Tao
    CONSTRUCTION AND BUILDING MATERIALS, 2025, 462
  • [16] An Imbalanced Generative Adversarial Network-Based Approach for Network Intrusion Detection in an Imbalanced Dataset
    Rao, Yamarthi Narasimha
    Babu, Kunda Suresh
    SENSORS, 2023, 23 (01)
  • [17] Improving Generative Adversarial Network-based Vocoding through Multi-scale Convolution
    Li, Wanting
    Chen, Yiting
    Tang, Buzhou
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (09)
  • [18] Generative adversarial network-based approach to signal reconstruction from magnitude spectrogram
    Oyamada, Keisuke
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    Hojo, Nobukatsu
    Ando, Hiroyasu
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018 : 2514 - 2518
  • [19] A Novel Generative Adversarial Network-Based Approach for Automated Brain Tumour Segmentation
    Sille, Roohi
    Choudhury, Tanupriya
    Sharma, Ashutosh
    Chauhan, Piyush
    Tomar, Ravi
    Sharma, Durgansh
    MEDICINA-LITHUANIA, 2023, 59 (01)
  • [20] A generative adversarial network-based framework for network-wide travel time reliability prediction
    Shao, Feng
    Shao, Hu
    Wang, Dongle
    Lam, William H.K.
    Tam, Mei Lam
    Knowledge-Based Systems, 2024, 283