The Use of Clinical Language Models Pretrained on Institutional EHR Data for Downstream Tasks

Times Cited: 0
Authors
Suvirat, Kerdkiat [1 ]
Chairat, Sawrawit [1 ]
Horsiritham, Kanakorn [2 ]
Ingviya, Thammasin [3 ]
Kongkamol, Chanon [3 ]
Chaichulee, Sitthichok [1 ]
Affiliations
[1] Prince Songkla Univ, Dept Biomed Sci & Biomed Engn, Fac Med, Hat Yai, Thailand
[2] Prince Songkla Univ, Coll Digital Sci, Hat Yai, Thailand
[3] Prince Songkla Univ, Fac Med, Dept Family & Prevent Med, Hat Yai, Thailand
Keywords
natural language processing; language modelling; clinical note; electronic health records; text classification;
DOI
10.1109/JCSSE61278.2024.10613630
Chinese Library Classification
TP39 [Computer Applications];
Discipline Codes
081203; 0835;
Abstract
Clinical language models have attracted considerable attention in recent years because of their potential to improve healthcare workflows and optimise patient care. While many pretrained clinical language models have been published, there are no models specifically for the Thai clinical context, where English terminology is used for diseases, procedures and medications, while clinical notes are written in Thai. This study investigated the pretraining of different language model architectures, namely RoBERTa, GPT-2 and T5, on the EHR data of Songklanagarind Hospital in Thailand, which comprises over 80 million documents. We also investigated the application of the pretrained models to three downstream clinical tasks: tuberculosis case finding, BI-RADS category classification and intraocular pressure extraction. The results indicate that our domain-specific language models outperformed the general-purpose multilingual model, mBERT, and required fewer training examples to achieve the same performance. The study encourages the use of clinical language models to streamline clinical workflows, support clinical research and assist hospital auditing.
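As a rough illustration of the downstream fine-tuning described in the abstract, the sketch below adapts a domain-pretrained RoBERTa-style encoder to one of the classification tasks (tuberculosis case finding) with the HuggingFace transformers API. The checkpoint path and the toy notes are hypothetical placeholders, assuming the institutional model was saved in the standard transformers format; the paper's actual data and pipeline are institutional and not public.

    # Hedged sketch: fine-tune a domain-pretrained encoder for binary
    # tuberculosis case finding. Checkpoint path and notes are placeholders.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    CHECKPOINT = "path/to/institutional-roberta"  # hypothetical local checkpoint

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT, num_labels=2  # binary: TB case vs. non-case
    )

    # Toy examples standing in for de-identified clinical notes.
    notes = ["chronic cough 3 weeks, sputum AFB smear positive",
             "routine follow-up, no respiratory complaints"]
    labels = torch.tensor([1, 0])

    batch = tokenizer(notes, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(3):  # a few gradient steps on the toy batch
        out = model(**batch, labels=labels)  # loss computed from the labels
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

The extraction task would instead use a text-to-text interface (e.g. AutoModelForSeq2SeqLM for the T5 variant), but loading and training follow the same pattern.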
Pages: 648 - 655
Page count: 8
Related Papers
50 records in total
  • [1] Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning
    Wei, Colin
    Xie, Sang Michael
    Ma, Tengyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [2] Evaluation of Pretrained Large Language Models in Embodied Planning Tasks
    Sarkisyan, Christina
    Korchemnyi, Alexandr
    Kovalev, Alexey K.
    Panov, Aleksandr I.
    ARTIFICIAL GENERAL INTELLIGENCE, AGI 2023, 2023, 13921 : 222 - 232
  • [3] Pretrained Models and Evaluation Data for the Khmer Language
    Jiang, Shengyi
    Fu, Sihui
    Lin, Nankai
    Fu, Yingwen
    TSINGHUA SCIENCE AND TECHNOLOGY, 2022, 27 (04) : 709 - 718
  • [4] Do Pretrained Language Models Indeed Understand Software Engineering Tasks?
    Li, Yao
    Zhang, Tao
    Luo, Xiapu
    Cai, Haipeng
    Fang, Sen
    Yuan, Dawei
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (10) : 4639 - 4655
  • [5] Data Augmentation for Spoken Language Understanding via Pretrained Language Models
    Peng, Baolin
    Zhu, Chenguang
    Zeng, Michael
    Gao, Jianfeng
    INTERSPEECH 2021, 2021, : 1219 - 1223
  • [6] Visually-augmented Pretrained Language Models for NLP Tasks without Images
    Guo, Hangyu
    Zhou, Kun
    Zhao, Wayne Xin
    Zhang, Qinyu
    Wen, Ji-Rong
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14912 - 14929
  • [7] A comparative study of pretrained language models for long clinical text
    Li, Yikuan
    Wehbe, Ramsey M.
    Ahmad, Faraz S.
    Wang, Hanyin
    Luo, Yuan
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2022 : 340 - 347
  • [8] From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
    Feng, Shangbin
    Park, Chan Young
    Liu, Yuhan
    Tsvetkov, Yulia
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 11737 - 11762
  • [9] Are genomic language models all you need? Exploring genomic language models on protein downstream tasks
    Boshar, Sam
    Trop, Evan
    de Almeida, Bernardo P.
    Copoiu, Liviu
    Pierrot, Thomas
    BIOINFORMATICS, 2024, 40 (09)