The Use of Clinical Language Models Pretrained on Institutional EHR Data for Downstream Tasks

Times cited: 0
Authors
Suvirat, Kerdkiat [1]
Chairat, Sawrawit [1]
Horsiritham, Kanakorn [2]
Ingviya, Thammasin [3]
Kongkamol, Chanon [3]
Chaichulee, Sitthichok [1]
Affiliations
[1] Prince Songkla Univ, Dept Biomed Sci & Biomed Engn, Fac Med, Hat Yai, Thailand
[2] Prince Songkla Univ, Coll Digital Sci, Hat Yai, Thailand
[3] Prince Songkla Univ, Fac Med, Dept Family & Prevent Med, Hat Yai, Thailand
Keywords
natural language processing; language modelling; clinical note; electronic health records; text classification
DOI
10.1109/JCSSE61278.2024.10613630
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Clinical language models have attracted considerable attention in recent years because of their potential to improve healthcare workflows and optimise patient care. While many pretrained clinical language models have been published, none is designed specifically for the Thai clinical context, in which English terminology is used for diseases, procedures and medications, while clinical notes are written in Thai. This study investigated the pretraining of three language model architectures, namely RoBERTa, GPT-2 and T5, on the electronic health record (EHR) data of Songklanagarind Hospital in Thailand, which comprise over 80 million documents. We also investigated the application of the pretrained models to three downstream clinical tasks: tuberculosis case finding, BI-RADS category classification and intraocular pressure extraction. The results indicate that our domain-specific language models outperformed the general-purpose language model mBERT and required fewer training examples to achieve the same performance. The study encourages the use of clinical language models to streamline clinical workflows, support clinical research and assist hospital auditing.
Pages: 648-655
Number of pages: 8