Leveraging large language models for medical text classification: a hospital readmission prediction case

Cited by: 0
Authors
Nazyrova, Nodira [1 ]
Chahed, Salma [1 ]
Chausalet, Thierry [1 ]
Dwek, Miriam [2 ]
Affiliations
[1] Univ Westminster, Sch Comp Sci & Engn, London, England
[2] Univ Westminster, Sch Life Sci, London, England
Keywords
hospital readmission prediction; domain-specific transformer models; BERT; ClinicalBERT; SciBERT; BioBERT; large language models
DOI
10.1109/ICPRS62101.2024.10677826
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In recent years, the intersection of natural language processing (NLP) and healthcare informatics has witnessed a revolutionary transformation. One of the most groundbreaking developments in this realm is the advent of large language models (LLMs), which have demonstrated remarkable capabilities in analysing clinical data. This paper explores the potential of large language models in medical text classification, shedding light on their ability to discern subtle patterns, grasp domain-specific terminology, and adapt to the dynamic nature of medical information. The research focuses on applying transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), to hospital discharge summaries to predict 30-day readmissions among older adults. In particular, we explore the role of transfer learning in medical text classification and compare domain-specific transformer models such as SciBERT, BioBERT and ClinicalBERT. We also analyse how data preprocessing techniques affect the performance of language models. Our comparative analysis shows that removing parts of the text with a large proportion of out-of-vocabulary words improves the classification results. We also investigate how the input sequence length affects model performance, varying the sequence length from 128 to 512 for BERT-based models and using a sequence length of 4096 for Longformer. The results show that, among the compared models, SciBERT yields the best performance in the medical domain, improving current hospital readmission predictions using clinical notes on MIMIC data from 0.714 to 0.735 AUROC. Our next step is pretraining a model on a large corpus of clinical notes to potentially improve the adaptability of a language model in the medical domain and achieve better results in downstream tasks.
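The preprocessing step the abstract highlights (dropping portions of a discharge summary dominated by out-of-vocabulary words before classification) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names (`oov_ratio`, `filter_oov_sections`), the toy vocabulary, and the 0.5 threshold are all assumptions; in practice the vocabulary would come from the chosen model's tokenizer (e.g. the word-piece vocabulary of SciBERT or ClinicalBERT).

```python
import re

def oov_ratio(section: str, vocab: set) -> float:
    """Fraction of alphabetic tokens in a text section not found in vocab."""
    tokens = re.findall(r"[a-z]+", section.lower())
    if not tokens:
        return 1.0  # treat empty/non-alphabetic sections as fully OOV
    return sum(t not in vocab for t in tokens) / len(tokens)

def filter_oov_sections(note: str, vocab: set, threshold: float = 0.5) -> str:
    """Keep only note sections whose OOV ratio is below the threshold."""
    sections = [s.strip() for s in note.split("\n\n") if s.strip()]
    kept = [s for s in sections if oov_ratio(s, vocab) < threshold]
    return "\n\n".join(kept)

# Toy vocabulary standing in for a real tokenizer's word list.
vocab = {"patient", "was", "admitted", "with", "pneumonia", "and",
         "discharged", "on", "antibiotics"}

note = ("Patient was admitted with pneumonia and discharged on antibiotics.\n\n"
        "q8h PRN x3d NS IVF gtt titrate")  # abbreviation-heavy section

# The second, abbreviation-heavy section is dropped; the first is kept.
print(filter_oov_sections(note, vocab))
```

The filtered text would then be tokenized (truncated to the chosen maximum sequence length, 128-512 for BERT-based models) and passed to the fine-tuned classifier.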
Pages: 7
Related papers (50 in total)
  • [1] Text Classification via Large Language Models
    Sun, Xiaofei
    Li, Xiaoya
    Li, Jiwei
    Wu, Fei
    Guo, Shangwei
    Zhang, Tianwei
    Wang, Guoyin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 8990 - 9005
  • [2] Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation
    Messina, Pablo
    Vidal, Rene
    Parra, Denis
    Soto, Alvaro
    Araujo, Vladimir
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 3955 - 3986
  • [3] Leveraging Large Language Models for Enhanced Classification and Analysis: Fire Incidents Case Study
    Alkhammash, Eman H.
    FIRE-SWITZERLAND, 2025, 8 (01):
  • [4] Explanation sensitivity to the randomness of large language models: the case of journalistic text classification
    Bogaert, Jérémie
    de Marneffe, Marie-Catherine
    Descampe, Antonin
    Escouflaire, Louis
    Fairon, Cédrick
    Standaert, François-Xavier
    arXiv
  • [5] Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study
    Gao, Yanjun
    Li, Ruizhe
    Croxford, Emma
    Caskey, John
    Patterson, Brian W.
    Churpek, Matthew
    Miller, Timothy
    Dligach, Dmitriy
    Afshar, Majid
    JMIR AI, 2025, 4
  • [6] Leveraging foundation and large language models in medical artificial intelligence
    Wong, Io Nam
    Monteiro, Olivia
    Baptista-Hon, Daniel T.
    Wang, Kai
    Lu, Wenyang
    Sun, Zhuo
    Nie, Sheng
    Yin, Yun
    CHINESE MEDICAL JOURNAL (English Edition), 2024, 137 (21)
  • [7] Leveraging foundation and large language models in medical artificial intelligence
    Wong, Io Nam
    Monteiro, Olivia
    Baptista-Hon, Daniel T.
    Wang, Kai
    Lu, Wenyang
    Sun, Zhuo
    Nie, Sheng
    Yin, Yun
    CHINESE MEDICAL JOURNAL, 2024, 137 (21) : 2529 - 2539
  • [8] Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data
    Xu, Xuhai
    Yao, Bingsheng
    Dong, Yuanzhe
    Gabriel, Saadia
    Yu, Hong
    Hendler, James
    Ghassemi, Marzyeh
    Dey, Anind K.
    Wang, Dakuo
    PROCEEDINGS OF THE ACM ON INTERACTIVE MOBILE WEARABLE AND UBIQUITOUS TECHNOLOGIES-IMWUT, 2024, 8 (01):
  • [9] Leveraging Large Language Models for Flexible and Robust Table-to-Text Generation
    Oro, Ermelinda
    De Grandis, Luca
    Granata, Francesco Maria
    Ruffolo, Massimo
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT I, DEXA 2024, 2024, 14910 : 222 - 227
  • [10] Conformal Prediction and Large Language Models for Medical Coding
    Snyder, Christopher
    Brodsky, Victor
    AMERICAN JOURNAL OF CLINICAL PATHOLOGY, 2024, 162