The Use of Clinical Language Models Pretrained on Institutional EHR Data for Downstream Tasks

Cited by: 0
Authors
Suvirat, Kerdkiat [1]
Chairat, Sawrawit [1]
Horsiritham, Kanakorn [2]
Ingviya, Thammasin [3]
Kongkamol, Chanon [3]
Chaichulee, Sitthichok [1]
Affiliations
[1] Prince Songkla Univ, Dept Biomed Sci & Biomed Engn, Fac Med, Hat Yai, Thailand
[2] Prince Songkla Univ, Coll Digital Sci, Hat Yai, Thailand
[3] Prince Songkla Univ, Fac Med, Dept Family & Prevent Med, Hat Yai, Thailand
Keywords
natural language processing; language modelling; clinical note; electronic health records; text classification
DOI
10.1109/JCSSE61278.2024.10613630
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Clinical language models have attracted considerable attention in recent years because of their potential to improve healthcare workflows and optimise patient care. While many pre-trained clinical language models have been published, there are no models specifically for the Thai clinical context, where English terminologies are used for diseases, procedures and medications, and Thai language is used for clinical notes. This study investigated the pretraining of different language model architectures, namely RoBERTa, GPT-2 and T5, on the EHR data of Songklanagarind Hospital in Thailand, which includes over 80 million documents. We also investigated the applications of the pretrained model to three downstream clinical tasks: tuberculosis case finding, BI-RADS category classification and intraocular pressure extraction. The results indicate that our domain-specific language models performed better than the general-purpose language model, mBERT, and required fewer training examples to achieve the same performance. The study encourages the use of clinical language models to streamline clinical workflows, support clinical research and assist hospital auditing.
Pages: 648-655
Page count: 8
Related papers
50 in total
  • [31] BEYOND BIOMARKERS: MINING CLINICAL LAB DATA FROM THE EHR FOR USE IN PSYCHIATRIC GENOMIC ANALYSIS
    Davis, Lea
    Dennis, Jessica
    EUROPEAN NEUROPSYCHOPHARMACOLOGY, 2019, 29 : 1052 - 1052
  • [32] Large language models are less effective at clinical prediction tasks than locally trained machine learning models
    Brown, Katherine E.
    Yan, Chao
    Li, Zhuohang
    Zhang, Xinmeng
    Collins, Benjamin X.
    Chen, You
    Clayton, Ellen Wright
    Kantarcioglu, Murat
    Vorobeychik, Yevgeniy
    Malin, Bradley A.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2025,
  • [33] The use of hidden semi-Markov models in clinical diagnosis maze tasks
    Marhasev, Einat
    Hadad, Meirav
    Kaminka, Gal A.
    Feintuch, Uri
    INTELLIGENT DATA ANALYSIS, 2009, 13 (06) : 943 - 967
  • [34] Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
    Wolfe, Robert
    Yang, Yiwei
    Howe, Bill
    Caliskan, Aylin
    PROCEEDINGS OF THE 6TH ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, FACCT 2023, 2023, : 1174 - 1185
  • [35] The clinical use of prediction models: one year data
    Wikland, Kerstin Albertsson
    Kristrom, Bent
    Dahlgren, Jovanna
    HORMONE RESEARCH, 2006, 65 : 160 - 160
  • [36] Use of natural language processed clinical notes in an electronic health record (EHR) to characterize adverse events of mood disorders
    Huang, Hsiao-Ching
    Huisingh, Carrie
    Missmer, Stacey
    Hinman-Mcllroy, Brenda
    Chiuve, Stephanie
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2022, 31 : 629 - 630
  • [37] Optimizing data integration in trials that use EHR data: lessons learned from a multi-center randomized clinical trial
    Sudha R. Raman
    Laura G. Qualls
    Bradley G. Hammill
    Adam J. Nelson
    Ester Kim Nilles
    Keith Marsolo
    Emily C. O’Brien
    Trials, 24
  • [38] An Examination of the Use of Large Language Models to Aid Analysis of Textual Data
    Tai, Robert H.
    Bentley, Lillian R.
    Xia, Xin
    Sitt, Jason M.
    Fankhauser, Sarah C.
    Chicas-Mosier, Ana M.
    Monteith, Barnas G.
    INTERNATIONAL JOURNAL OF QUALITATIVE METHODS, 2024, 23
  • [39] USE OF GEOGRAPHICAL META-DATA IN ASR LANGUAGE AND ACOUSTIC MODELS
    Bocchieri, Enrico
    Caseiro, Diamantino
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5118 - 5121
  • [40] Identifying Strategy Use in Category Learning Tasks: A Case for More Diagnostic Data and Models
    Donkin, Chris
    Newell, Ben R.
    Kalish, Mike
    Dunn, John C.
    Nosofsky, Robert M.
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-LEARNING MEMORY AND COGNITION, 2015, 41 (04) : 933 - 948